Chapter 4: Dockerfile Build Technology
Haiyue
Learning Objectives
- Master Dockerfile syntax and instruction usage
- Learn to write efficient multi-layer image build files
- Understand image layer caching mechanisms and optimization strategies
- Become proficient in using multi-stage builds to reduce image size
Key Concepts
Dockerfile Basics
A Dockerfile is a text file containing a series of instructions used to automate Docker image building. Instructions that change the filesystem (RUN, COPY, ADD) each create a new layer, while the other instructions only record metadata; the layers stack to form the final image.
Advantages of Dockerfile:
- Version Control: Dockerfiles can be managed with Git and other tools
- Reproducible Builds: The same Dockerfile and inputs produce consistent results, provided base images and dependencies are pinned
- Automation: Can be integrated into CI/CD pipelines for automatic builds
- Transparency: Build process is fully transparent, easy to debug and optimize
Dockerfile Instruction Categories
| Category | Instructions | Purpose |
|---|---|---|
| Base Instructions | FROM, MAINTAINER | Define base image and maintainer information |
| File Operations | COPY, ADD, VOLUME | Copy files into the image and declare mount points |
| Environment Configuration | ENV, ARG, WORKDIR | Set environment variables and working directory |
| Execution Instructions | RUN, CMD, ENTRYPOINT | Execute commands and define startup instructions |
| Network Configuration | EXPOSE | Declare ports |
| User Management | USER | Set running user |
| Metadata | LABEL, ONBUILD | Add labels and trigger instructions |
Build Context
Build context refers to the files and directories sent to the Docker daemon when executing the docker build command.
Build Context Example:
project/
├── Dockerfile # Build file
├── .dockerignore # Ignore file list
├── app/ # Application code
│ ├── main.py
│ └── requirements.txt
├── config/ # Configuration files
│ └── app.conf
└── static/ # Static resources
    ├── css/
    └── js/
Basic Dockerfile Instructions
FROM - Base Image
# Specify base image
FROM ubuntu:20.04
# Use specific version
FROM node:16-alpine
# Naming in multi-stage builds
FROM golang:1.19 AS builder
FROM alpine:latest AS runtime
# Use official images
FROM nginx:latest
FROM mysql:8.0
FROM python:3.9-slim
# Empty image (minimal image)
FROM scratch
MAINTAINER/LABEL - Metadata Information
# Maintainer information (deprecated, use LABEL instead)
MAINTAINER John Doe <john@example.com>
# Recommended to use LABEL
LABEL maintainer="john@example.com"
LABEL version="1.0.0"
LABEL description="This is a demo Docker image"
LABEL vendor="My Company"
# Multiple labels
LABEL version="1.0.0" \
description="Multi-line label example" \
maintainer="john@example.com"
RUN - Execute Commands
# Basic usage
RUN apt-get update
RUN apt-get install -y nginx
# Recommended: Combine commands to reduce layers
RUN apt-get update && \
apt-get install -y \
nginx \
curl \
vim && \
rm -rf /var/lib/apt/lists/* && \
apt-get clean
# Use exec form
RUN ["apt-get", "update"]
RUN ["apt-get", "install", "-y", "nginx"]
# Multi-line complex commands
RUN set -eux; \
apt-get update; \
apt-get install -y --no-install-recommends \
ca-certificates \
curl \
wget; \
rm -rf /var/lib/apt/lists/*
# Install Python dependencies
RUN pip install --no-cache-dir -r requirements.txt
# Create user
RUN groupadd -r appuser && useradd -r -g appuser appuser
COPY and ADD - File Copying
# COPY: Simple file copying (recommended)
COPY app.py /opt/app/
COPY requirements.txt /opt/app/
COPY . /opt/app/
# Copy and set permissions
COPY --chown=appuser:appuser app.py /opt/app/
# ADD: Supports URLs and automatic extraction of local archives (prefer COPY unless you need these features)
ADD https://example.com/file.tar.gz /opt/
# Local tar archives are extracted automatically; archives fetched from a URL are not
ADD archive.tar.gz /opt/
# Copy multiple files
COPY ["file1.txt", "file2.txt", "/opt/app/"]
# Use wildcards
COPY *.py /opt/app/
COPY config/*.conf /etc/myapp/
# Copy build artifacts from another build stage
COPY --from=builder /app/build /usr/share/nginx/html
ENV and ARG - Environment Variables
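# ENV: Environment variables, available during the build and in running containers
ENV NODE_ENV=production
ENV APP_PORT=3000
# Placeholder only: don't bake real credentials into ENV, they stay visible in the image metadata
ENV DATABASE_URL="postgresql://user:pass@localhost:5432/db"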
# Multiple environment variables
ENV NODE_ENV=production \
APP_PORT=3000 \
LOG_LEVEL=info
# ARG: Build-time arguments
ARG VERSION=latest
ARG BUILD_DATE
ARG VCS_REF
# Combined use
ARG APP_VERSION=1.0.0
ENV APP_VERSION=${APP_VERSION}
# ARG with default value
ARG PYTHON_VERSION=3.9
FROM python:${PYTHON_VERSION}-slim
# Using ARG example
ARG TARGETPLATFORM
ARG BUILDPLATFORM
RUN echo "Building on $BUILDPLATFORM, targeting $TARGETPLATFORM"
WORKDIR - Working Directory
# Set working directory
WORKDIR /opt/app
# Automatically create directory
WORKDIR /path/to/nonexistent/directory
# Relative path (resolved against the previous WORKDIR, so this ends up at /opt/app)
WORKDIR /opt
WORKDIR app
# Use environment variables
ENV APP_HOME=/opt/myapp
WORKDIR ${APP_HOME}
# Practical usage example
FROM node:16-alpine
WORKDIR /usr/src/app
COPY package*.json ./
RUN npm ci --only=production
COPY . .
EXPOSE - Port Declaration
# Declare ports (documentation only; ports are actually published with docker run -p)
EXPOSE 80
EXPOSE 443
EXPOSE 8080/tcp
EXPOSE 53/udp
# Use variables
ARG PORT=8080
EXPOSE ${PORT}
# Multiple ports
EXPOSE 80 443 8080
USER - Running User
# Create and use non-root user
RUN groupadd -r appuser && useradd -r -g appuser appuser
USER appuser
# Use UID:GID
USER 1000:1000
# Temporarily switch users
USER root
RUN apt-get update && apt-get install -y some-package
USER appuser
# Practical usage example
FROM node:16-alpine
RUN addgroup -g 1001 -S nodejs
RUN adduser -S nextjs -u 1001
USER nextjs
CMD and ENTRYPOINT - Startup Commands
# CMD: Default startup command (can be overridden)
# Exec form
CMD ["nginx", "-g", "daemon off;"]
# Shell form
CMD nginx -g "daemon off;"
# Interactive shell
CMD ["/bin/bash"]
# ENTRYPOINT: Fixed entry point (docker run arguments are appended to it; it is replaced only with --entrypoint)
ENTRYPOINT ["docker-entrypoint.sh"]
ENTRYPOINT ["/usr/local/bin/docker-entrypoint.sh"]
# Combined use: ENTRYPOINT + CMD (CMD supplies the default parameters)
ENTRYPOINT ["python", "app.py"]
CMD ["--port", "8080"]
# Practical combined example
FROM python:3.9-slim
WORKDIR /app
COPY . .
RUN pip install -r requirements.txt
ENTRYPOINT ["python", "app.py"]
CMD ["--host", "0.0.0.0", "--port", "8080"]
# Use script as entry point
COPY docker-entrypoint.sh /usr/local/bin/
RUN chmod +x /usr/local/bin/docker-entrypoint.sh
ENTRYPOINT ["docker-entrypoint.sh"]
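The way ENTRYPOINT and CMD combine can be easier to see as a small model. The following is a minimal Python sketch, not Docker's actual implementation; the function name `effective_command` and its parameters are hypothetical, and it covers only the exec (JSON array) form.

```python
from typing import List, Optional


def effective_command(
    entrypoint: Optional[List[str]],
    cmd: Optional[List[str]],
    run_args: Optional[List[str]] = None,
    entrypoint_override: Optional[List[str]] = None,
) -> List[str]:
    """Return the argv a container would start with (exec form only)."""
    # `docker run --entrypoint ...` replaces the image ENTRYPOINT and
    # also discards the image's default CMD.
    if entrypoint_override is not None:
        return entrypoint_override + (run_args or [])
    # Arguments given after the image name replace CMD entirely.
    args = run_args if run_args else (cmd or [])
    return (entrypoint or []) + args


image_entrypoint = ["python", "app.py"]
image_cmd = ["--port", "8080"]

# docker run image
print(effective_command(image_entrypoint, image_cmd))
# docker run image --port 9090
print(effective_command(image_entrypoint, image_cmd, ["--port", "9090"]))
# docker run --entrypoint /bin/sh image -c "echo hi"
print(effective_command(image_entrypoint, image_cmd, ["-c", "echo hi"], ["/bin/sh"]))
```

The second call shows why CMD is a good place for default parameters: anything passed after the image name replaces it while the ENTRYPOINT stays in place.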
Practical Application Examples
Python Web Application Dockerfile
# Single-stage build: Python Flask application
FROM python:3.9-slim AS base
# Set environment variables
ENV PYTHONUNBUFFERED=1 \
PYTHONDONTWRITEBYTECODE=1 \
PIP_NO_CACHE_DIR=1 \
PIP_DISABLE_PIP_VERSION_CHECK=1
# Install system dependencies (curl is needed by the HEALTHCHECK below)
RUN apt-get update && \
apt-get install -y --no-install-recommends \
gcc \
libc6-dev \
curl && \
rm -rf /var/lib/apt/lists/*
# Create non-root user
RUN groupadd -r appuser && useradd -r -g appuser appuser
# Set working directory
WORKDIR /app
# Copy and install Python dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Copy application code
COPY --chown=appuser:appuser . .
# Switch to non-root user
USER appuser
# Expose port
EXPOSE 5000
# Health check
HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
CMD curl -f http://localhost:5000/health || exit 1
# Startup command
CMD ["gunicorn", "--bind", "0.0.0.0:5000", "--workers", "4", "app:app"]
Node.js Application Dockerfile
# Node.js application multi-stage build
FROM node:16-alpine AS dependencies
# Set working directory
WORKDIR /usr/src/app
# Copy package files
COPY package*.json ./
# Install dependencies
RUN npm ci --only=production && npm cache clean --force
# Build stage
FROM node:16-alpine AS builder
WORKDIR /usr/src/app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build
# Production stage
FROM node:16-alpine AS production
# Create user
RUN addgroup -g 1001 -S nodejs
RUN adduser -S nextjs -u 1001
WORKDIR /usr/src/app
# Copy node_modules from dependencies stage
COPY --from=dependencies --chown=nextjs:nodejs /usr/src/app/node_modules ./node_modules
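# Copy build results from builder stage
COPY --from=builder --chown=nextjs:nodejs /usr/src/app/dist ./dist
COPY --from=builder --chown=nextjs:nodejs /usr/src/app/package*.json ./
# The HEALTHCHECK below runs healthcheck.js, so copy it too (assumes the file sits in the project root)
COPY --from=builder --chown=nextjs:nodejs /usr/src/app/healthcheck.js ./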
# Switch user
USER nextjs
EXPOSE 3000
# Health check
HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
CMD node healthcheck.js
CMD ["npm", "start"]
Nginx Static Website Dockerfile
# Multi-stage build: Frontend application
FROM node:16-alpine AS builder
WORKDIR /app
# Install dependencies
COPY package*.json ./
RUN npm ci
# Build application
COPY . .
RUN npm run build
# Production image
FROM nginx:alpine
# Copy custom nginx configuration
COPY nginx.conf /etc/nginx/nginx.conf
# Copy build artifacts
COPY --from=builder /app/dist /usr/share/nginx/html
# Copy startup script
COPY docker-entrypoint.sh /usr/local/bin/
RUN chmod +x /usr/local/bin/docker-entrypoint.sh
# Expose port
EXPOSE 80
# Health check
HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
CMD curl -f http://localhost/ || exit 1
ENTRYPOINT ["docker-entrypoint.sh"]
CMD ["nginx", "-g", "daemon off;"]
Multi-Stage Builds
Basic Multi-Stage Build
# Build stage
FROM golang:1.19 AS builder
WORKDIR /app
COPY . .
RUN CGO_ENABLED=0 GOOS=linux go build -o main .
# Runtime stage
FROM alpine:latest
# Install CA certificates
RUN apk --no-cache add ca-certificates
# Use a directory the non-root user can access (/root is normally not readable by other users)
WORKDIR /app
# Copy binary from build stage
COPY --from=builder /app/main .
# Create non-root user
RUN adduser -D -s /bin/sh appuser
USER appuser
EXPOSE 8080
CMD ["./main"]
Complex Multi-Stage Build Example
# Base image
FROM node:16-alpine AS base
WORKDIR /app
COPY package*.json ./
# Dependency installation stage
FROM base AS dependencies
RUN npm ci --only=production
# Development dependency stage
FROM base AS dev-dependencies
RUN npm ci
# Build stage
FROM dev-dependencies AS builder
COPY . .
RUN npm run build
# Test stage (not part of the production image; build it explicitly with docker build --target tester)
FROM dev-dependencies AS tester
COPY . .
RUN npm test
# Production image
FROM node:16-alpine AS production
# Security configuration
RUN addgroup -g 1001 -S nodejs && \
adduser -S appuser -u 1001
WORKDIR /app
USER appuser
# Copy production dependencies
COPY --from=dependencies --chown=appuser:nodejs /app/node_modules ./node_modules
# Copy build artifacts
COPY --from=builder --chown=appuser:nodejs /app/dist ./dist
COPY --chown=appuser:nodejs package*.json ./
EXPOSE 3000
CMD ["npm", "start"]
Build Optimization Techniques
Utilizing Cache Mechanisms
# Bad practice: Reinstall dependencies every time
COPY . /app
WORKDIR /app
RUN npm install
# Good practice: Copy dependency files first, use cache
WORKDIR /app
COPY package*.json ./
RUN npm ci --only=production
COPY . .
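To see why the ordering matters, here is a simplified Python model of layer caching. It assumes each step's cache key is derived from the parent key, the instruction text, and the content of any files the step copies. The real builder is more involved, and the names `layer_keys` and `rebuilt_steps` are hypothetical.

```python
import hashlib
from typing import Dict, List, Tuple

# Each step is (instruction text, names of the context files it copies).
Step = Tuple[str, List[str]]


def layer_keys(steps: List[Step], files: Dict[str, str]) -> List[str]:
    """Chain one cache key per step: parent key + instruction + copied content."""
    keys, parent = [], ""
    for instruction, copied in steps:
        h = hashlib.sha256()
        h.update(parent.encode())
        h.update(instruction.encode())
        for name in sorted(copied):
            h.update(name.encode())
            h.update(files[name].encode())
        parent = h.hexdigest()
        keys.append(parent)
    return keys


def rebuilt_steps(steps: List[Step], before: Dict[str, str], after: Dict[str, str]) -> List[str]:
    """Return the instructions whose key differs between two context snapshots."""
    old, new = layer_keys(steps, before), layer_keys(steps, after)
    return [step[0] for step, a, b in zip(steps, old, new) if a != b]


# Only the application source changes between the two builds.
before = {"package.json": '{"deps": 1}', "src/index.js": "v1"}
after = {"package.json": '{"deps": 1}', "src/index.js": "v2"}

bad = [
    ("COPY . /app", ["package.json", "src/index.js"]),
    ("RUN npm install", []),
]
good = [
    ("COPY package*.json ./", ["package.json"]),
    ("RUN npm ci --only=production", []),
    ("COPY . .", ["package.json", "src/index.js"]),
]

print(rebuilt_steps(bad, before, after))   # dependency install is rerun
print(rebuilt_steps(good, before, after))  # only the final COPY is rerun
```

Because every key includes its parent's key, a change early in the Dockerfile invalidates everything after it, which is why rarely changing steps belong near the top.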
Reducing Image Layers
# Bad: Create multiple layers
RUN apt-get update
RUN apt-get install -y curl
RUN apt-get install -y wget
RUN rm -rf /var/lib/apt/lists/*
# Good: Combine into one layer
RUN apt-get update && \
apt-get install -y \
curl \
wget && \
rm -rf /var/lib/apt/lists/* && \
apt-get clean
Using .dockerignore
# .dockerignore file content
.git
.gitignore
README.md
Dockerfile
.dockerignore
node_modules
npm-debug.log
.nyc_output
.coverage
.env.local
.env.development.local
.env.test.local
.env.production.local
.next
.vscode
__pycache__
*.pyc
*.pyo
*.pyd
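One detail that is easy to miss: .dockerignore patterns are matched against paths relative to the context root, so a pattern such as `*.pyc` or `__pycache__` covers only root-level entries, and nested ones need a `**/` prefix. The Python sketch below is a rough model of that rule, not Docker's matcher. It deliberately ignores `**` and `!` exception lines, and the helper names `is_ignored` and `context_after_ignore` are hypothetical.

```python
from fnmatch import fnmatchcase
from typing import List


def is_ignored(pattern: str, path: str) -> bool:
    """Match a pattern segment by segment, starting at the context root."""
    pattern_parts = pattern.strip("/").split("/")
    path_parts = path.split("/")
    if len(pattern_parts) > len(path_parts):
        return False
    # Matching a leading directory excludes everything beneath it.
    return all(fnmatchcase(part, pat) for part, pat in zip(path_parts, pattern_parts))


def context_after_ignore(paths: List[str], patterns: List[str]) -> List[str]:
    """Return the paths that would still be sent as build context."""
    return [p for p in paths if not any(is_ignored(pat, p) for pat in patterns)]


paths = [
    "Dockerfile",
    "app/main.py",
    "app/__pycache__/main.cpython-39.pyc",
    "main.pyc",
    "node_modules/left-pad/index.js",
    ".env.local",
]
patterns = [".git", "node_modules", "__pycache__", "*.pyc", ".env.local"]

for kept in context_after_ignore(paths, patterns):
    print(kept)
```

In this model the nested `app/__pycache__/` file survives, which suggests adding patterns like `**/__pycache__` and `**/*.pyc` when the bytecode lives in subdirectories.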
Choosing the Right Base Image
# Size comparison (approximate; actual sizes vary by tag and architecture)
# ~72MB
FROM ubuntu:20.04
# ~45MB
FROM python:3.9-slim
# ~15MB
FROM python:3.9-alpine
# 0MB
FROM scratch
# Choose based on requirements
# Minimal production image
FROM alpine:latest
# Development image needing more tools
FROM ubuntu:20.04
# Balanced choice for Python apps
FROM python:3.9-alpine
Build Arguments and Variables
Using Build Arguments
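# Define build arguments (ARGs declared before FROM are only usable in FROM lines)
ARG NODE_VERSION=16
ARG APP_ENV=production
FROM node:${NODE_VERSION}-alpine
# Redeclare APP_ENV inside the stage so its value is visible after FROM
ARG APP_ENV
# Use build arguments
ARG BUILD_DATE
ARG VCS_REF
ARG VERSION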
# Set labels
LABEL build_date=${BUILD_DATE}
LABEL vcs_ref=${VCS_REF}
LABEL version=${VERSION}
# Convert to environment variables
ENV APP_ENV=${APP_ENV}
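ARG scoping is a common source of empty values: an ARG declared before the first FROM is usable in FROM lines only, and a stage sees it only after redeclaring it with a bare `ARG NAME`. The Python sketch below models just that rule under simplifying assumptions. It does not handle stages inheriting from other stages, and the function name `visible_args` is hypothetical.

```python
from typing import Dict, List


def visible_args(lines: List[str], build_args: Dict[str, str]) -> List[Dict[str, str]]:
    """Return, per stage, the ARG values that instructions in that stage can see."""
    global_args: Dict[str, str] = {}    # declared before the first FROM
    stages: List[Dict[str, str]] = []   # one dict per FROM
    for line in lines:
        parts = line.split(None, 1)
        if len(parts) < 2:
            continue
        keyword, rest = parts[0].upper(), parts[1].strip()
        if keyword == "FROM":
            stages.append({})
        elif keyword == "ARG":
            name, _, default = rest.partition("=")
            if not stages:
                global_args[name] = build_args.get(name, default)
            else:
                # A bare redeclaration picks up the global default.
                fallback = default or global_args.get(name, "")
                stages[-1][name] = build_args.get(name, fallback)
    return stages


dockerfile = [
    "ARG NODE_VERSION=16",
    "ARG APP_ENV=production",
    "FROM node:16-alpine",
    "ARG BUILD_DATE",
    "FROM node:16-alpine",
    "ARG APP_ENV",
]

# The first stage never redeclares APP_ENV, so it cannot see it.
print(visible_args(dockerfile, {}))
# A --build-arg value wins over the default once the stage redeclares the ARG.
print(visible_args(dockerfile, {"APP_ENV": "staging"}))
```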
Pass parameters during build:
# Pass build arguments
docker build \
--build-arg NODE_VERSION=18 \
--build-arg BUILD_DATE=$(date -u +'%Y-%m-%dT%H:%M:%SZ') \
--build-arg VCS_REF=$(git rev-parse HEAD) \
--build-arg VERSION=1.2.0 \
-t myapp:1.2.0 .
# View the labels that were set from the build arguments
docker inspect myapp:1.2.0 | jq '.[0].Config.Labels'
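When the build is driven from a script rather than typed by hand, assembling the command as an argument list avoids shell quoting problems with values such as timestamps. This is a small Python sketch; the helper name `docker_build_command` is hypothetical, and the only docker CLI flags it uses are the `--build-arg` and `-t` options shown above.

```python
import shlex
from typing import Dict, List


def docker_build_command(tag: str, build_args: Dict[str, str], context: str = ".") -> List[str]:
    """Assemble a `docker build` argv with one --build-arg pair per entry."""
    cmd = ["docker", "build"]
    for name, value in build_args.items():
        cmd += ["--build-arg", f"{name}={value}"]
    cmd += ["-t", tag, context]
    return cmd


command = docker_build_command(
    "myapp:1.2.0",
    {"NODE_VERSION": "18", "VERSION": "1.2.0"},
)
# Print a copy-pasteable form; to actually run it you could pass the list
# to subprocess.run(command, check=True).
print(shlex.join(command))
```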
Conditional Builds
ARG INSTALL_DEV_TOOLS=false
# Conditionally install development tools
RUN if [ "$INSTALL_DEV_TOOLS" = "true" ]; then \
apt-get update && \
apt-get install -y vim curl wget; \
fi
Practical Project Build
Complete Web Application Build
Create project structure:
mkdir docker-webapp-build
cd docker-webapp-build
# Create application code
cat > app.py << 'EOF'
from flask import Flask, jsonify
import os
import sys
import logging

app = Flask(__name__)
logging.basicConfig(level=logging.INFO)

@app.route('/')
def home():
    return jsonify({
        "message": "Hello from Dockerized Flask App!",
        "version": os.getenv("APP_VERSION", "unknown"),
        "environment": os.getenv("APP_ENV", "development")
    })

@app.route('/health')
def health():
    return jsonify({"status": "healthy"})

@app.route('/info')
def info():
    return jsonify({
        "python_version": sys.version,
        "environment": dict(os.environ)
    })

if __name__ == '__main__':
    port = int(os.getenv("PORT", 5000))
    app.run(host='0.0.0.0', port=port, debug=False)
EOF
# Create dependencies file
cat > requirements.txt << 'EOF'
Flask==2.3.2
gunicorn==21.2.0
EOF
# Create configuration file
cat > config.py << 'EOF'
import os

class Config:
    SECRET_KEY = os.environ.get('SECRET_KEY') or 'dev-secret-key'
    DEBUG = os.environ.get('FLASK_DEBUG', 'False').lower() == 'true'
    PORT = int(os.environ.get('PORT', 5000))
EOF
# Create startup script
cat > docker-entrypoint.sh << 'EOF'
#!/bin/sh
set -e
echo "Starting Flask application..."
echo "Environment: ${APP_ENV:-development}"
echo "Version: ${APP_VERSION:-unknown}"
# Database migrations and other initialization can be done here
# python -c "from app import db; db.create_all()"
exec "$@"
EOF
chmod +x docker-entrypoint.sh
Create optimized Dockerfile:
# Multi-stage build complete Flask application
ARG PYTHON_VERSION=3.9
FROM python:${PYTHON_VERSION}-slim AS base
# Build arguments
ARG BUILD_DATE
ARG VCS_REF
ARG VERSION=dev
# Label information
LABEL maintainer="developer@example.com" \
build_date="${BUILD_DATE}" \
vcs_ref="${VCS_REF}" \
version="${VERSION}" \
description="Flask web application with Docker"
# Python environment configuration
ENV PYTHONUNBUFFERED=1 \
PYTHONDONTWRITEBYTECODE=1 \
PIP_NO_CACHE_DIR=1 \
PIP_DISABLE_PIP_VERSION_CHECK=1 \
PIP_DEFAULT_TIMEOUT=100
# Dependency installation stage
FROM base AS dependencies
# Install system dependencies
RUN apt-get update && \
apt-get install -y --no-install-recommends \
gcc \
libc6-dev \
curl && \
rm -rf /var/lib/apt/lists/* && \
apt-get clean
# Create application user
RUN groupadd -r appuser && \
useradd -r -g appuser -d /app -s /bin/bash appuser
# Set working directory
WORKDIR /app
# Install Python dependencies
COPY requirements.txt .
RUN pip install --user --no-warn-script-location -r requirements.txt
# Production image
FROM base AS production
# Recreate the application user (users created in the dependencies stage don't carry over)
RUN groupadd -r appuser && \
useradd -r -g appuser -d /app -s /bin/bash appuser
# Install runtime dependencies
RUN apt-get update && \
apt-get install -y --no-install-recommends \
curl \
ca-certificates && \
rm -rf /var/lib/apt/lists/* && \
apt-get clean
WORKDIR /app
# Copy Python dependencies
COPY --from=dependencies --chown=appuser:appuser /root/.local /home/appuser/.local
# Copy application code
COPY --chown=appuser:appuser . .
# Ensure script is executable
RUN chmod +x docker-entrypoint.sh
# Switch to application user
USER appuser
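# Point Python's user base and PATH at the copied packages
# (appuser's home is /app, so ~/.local would not resolve to /home/appuser/.local)
ENV PYTHONUSERBASE=/home/appuser/.local \
PATH=/home/appuser/.local/bin:$PATH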
# Set application environment variables
ENV APP_VERSION=${VERSION} \
APP_ENV=production \
PORT=5000
# Expose port
EXPOSE 5000
# Health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
CMD curl -f http://localhost:5000/health || exit 1
# Startup configuration
ENTRYPOINT ["./docker-entrypoint.sh"]
CMD ["gunicorn", "--bind", "0.0.0.0:5000", "--workers", "4", "--timeout", "120", "app:app"]
Create .dockerignore file:
.git
.gitignore
README.md
Dockerfile
.dockerignore
__pycache__
*.pyc
*.pyo
*.pyd
.Python
.pytest_cache
.coverage
.venv
venv/
.env
.env.local
.env.development.local
.env.test.local
.env.production.local
Build and test:
# Build image (VCS_REF assumes the directory is a Git repository with at least one commit)
docker build \
--build-arg BUILD_DATE=$(date -u +'%Y-%m-%dT%H:%M:%SZ') \
--build-arg VCS_REF=$(git rev-parse --short HEAD) \
--build-arg VERSION=1.0.0 \
-t flask-app:1.0.0 \
-t flask-app:latest .
# View image information
docker images flask-app
docker inspect flask-app:latest | jq '.[0].Config.Labels'
# Run container
docker run -d --name flask-app \
-p 5000:5000 \
-e APP_ENV=production \
flask-app:latest
# Test application
curl http://localhost:5000/
curl http://localhost:5000/health
curl http://localhost:5000/info
# View logs
docker logs -f flask-app
# Cleanup
docker stop flask-app
docker rm flask-app
Dockerfile Best Practices
- Single Responsibility: Each container should run only one service
- Minimize Layers: Combine related RUN instructions to reduce layers
- Utilize Caching: Place less frequently changing instructions earlier
- Security Configuration: Use non-root users to run applications
- Health Checks: Add health checks to ensure service availability
- Tag Management: Add appropriate tags and metadata to images
Important Notes
- Avoid including sensitive information (passwords, keys, etc.) in images
- Use multi-stage builds to reduce final image size
- Regularly update base images for security patches
- Use .dockerignore to exclude unnecessary files
- Pin base image versions in production environments
Summary
Through this chapter, you should have mastered:
- Dockerfile Syntax: Proficiency in using various Dockerfile instructions
- Build Techniques: Mastering multi-stage builds and image optimization methods
- Best Practices: Understanding security, performance, and maintainability best practices
- Practical Skills: Ability to write high-quality Dockerfiles for different types of applications
In the next chapter, we’ll learn about Docker volumes and network configuration, addressing container data persistence and network communication issues.