Chapter 4: Dockerfile Build Techniques
Learning Objectives
- Master the syntax and instruction usage of Dockerfile
- Learn to write efficient multi-layer image build files
- Understand the image layer caching mechanism and optimization strategies
- Become proficient in using multi-stage builds to reduce image size
Knowledge Points
Dockerfile Basic Concepts
A Dockerfile is a text file that contains a series of instructions for automatically building a Docker image. Instructions that change the filesystem (RUN, COPY, and ADD) each add a new layer to the image, while the other instructions only record configuration metadata. The final image is formed by stacking these layers.
Advantages of Dockerfile:
- Version Control: Dockerfiles can be managed with tools like Git.
- Reproducible Builds: The same Dockerfile and build context produce consistent images, especially when base image and dependency versions are pinned.
- Automation: Can be integrated into CI/CD pipelines for automatic builds.
- Transparency: The build process is completely transparent, facilitating debugging and optimization.
Dockerfile Instruction Categories
Category | Instruction | Role |
---|---|---|
Base | FROM, MAINTAINER (deprecated) | Defines the base image and maintainer information |
File Operations | COPY, ADD, VOLUME | Copies files and directories into the image; VOLUME declares mount points |
Environment | ENV, ARG, WORKDIR | Sets environment variables and working directory |
Execution | RUN, CMD, ENTRYPOINT, HEALTHCHECK | Runs build commands, defines the startup command, and checks container health |
Network | EXPOSE | Declares ports |
User | USER | Sets the running user |
Metadata | LABEL, ONBUILD | Adds labels and trigger instructions |
Build Context
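The build context is the set of files and directories sent to the Docker daemon (or BuildKit) when the docker build command is executed. It is usually the directory given as the last argument (for example, the current directory in docker build .), minus anything excluded by .dockerignore.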
Build Context Example:
project/
├── Dockerfile # Build file
├── .dockerignore # List of files to ignore
├── app/ # Application code
│ ├── main.py
│ └── requirements.txt
├── config/ # Configuration files
│ └── app.conf
└── static/ # Static resources
├── css/
└── js/
Basic Dockerfile Instructions
FROM - Base Image
# Specify the base image
FROM ubuntu:20.04
# Use a specific version
FROM node:16-alpine
# Naming in multi-stage builds
FROM golang:1.19 AS builder
FROM alpine:latest AS runtime
# Use an official image
FROM nginx:latest
FROM mysql:8.0
FROM python:3.9-slim
# Empty image (for minimal images)
FROM scratch
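When FROM names an image without a registry or tag, Docker fills in defaults: the Docker Hub registry (docker.io), the library/ namespace for official images, and the latest tag. The sketch below models that normalization under simplified rules (the real reference parser handles more edge cases); the function name normalize_image_ref is hypothetical, and scratch is a reserved name that is never pulled, so it falls outside this model.

```python
def normalize_image_ref(ref: str) -> str:
    """Expand a short image reference the way FROM resolves it (simplified model)."""
    name, digest = ref, ""
    if "@" in ref:
        name, digest = ref.split("@", 1)
        digest = "@" + digest

    # A tag is a ':' that appears after the last '/', so "localhost:5000/app" has no tag.
    tag = ""
    last_slash = name.rfind("/")
    colon = name.rfind(":")
    if colon > last_slash:
        name, tag = name[:colon], name[colon + 1:]

    # The first path component is a registry host if it looks like a hostname.
    parts = name.split("/")
    first = parts[0]
    if len(parts) > 1 and ("." in first or ":" in first or first == "localhost"):
        registry, path = first, "/".join(parts[1:])
    else:
        registry, path = "docker.io", name
        if "/" not in path:
            path = "library/" + path  # official images live under library/

    if not tag and not digest:
        tag = "latest"  # implicit default tag
    return f"{registry}/{path}" + (f":{tag}" if tag else "") + digest
```

This is why an unqualified FROM nginx means nginx:latest, which can point to a different image on the next build; pinning a tag or digest avoids that drift.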
MAINTAINER/LABEL - Metadata Information
# Maintainer information (deprecated, LABEL is recommended)
MAINTAINER John Doe <john@example.com>
# Recommended to use LABEL
LABEL maintainer="john@example.com"
LABEL version="1.0.0"
LABEL description="This is a demo Docker image"
LABEL vendor="My Company"
# Multiple labels
LABEL version="1.0.0" \
description="Multi-line label example" \
maintainer="john@example.com"
RUN - Execute Commands
# Basic usage
RUN apt-get update
RUN apt-get install -y nginx
# Recommended: combine commands to reduce layers
RUN apt-get update && \
apt-get install -y \
nginx \
curl \
vim && \
rm -rf /var/lib/apt/lists/* && \
apt-get clean
# Use exec form
RUN ["apt-get", "update"]
RUN ["apt-get", "install", "-y", "nginx"]
# Multi-line complex command
RUN set -eux; \
apt-get update; \
apt-get install -y --no-install-recommends \
ca-certificates \
curl \
wget; \
rm -rf /var/lib/apt/lists/*
# Install Python dependencies
RUN pip install --no-cache-dir -r requirements.txt
# Create a user
RUN groupadd -r appuser && useradd -r -g appuser appuser
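RUN (like CMD and ENTRYPOINT) has two forms. If the arguments parse as a JSON array of strings, Docker runs them directly (exec form); otherwise the whole string is passed to the shell, which on Linux defaults to /bin/sh -c (shell form). A minimal model of that decision, assuming the default SHELL; run_argv is a hypothetical name.

```python
import json
from typing import List

DEFAULT_SHELL = ["/bin/sh", "-c"]  # Linux default; the SHELL instruction can change it


def run_argv(args: str, shell: List[str] = DEFAULT_SHELL) -> List[str]:
    """Return the argv Docker would execute for RUN/CMD/ENTRYPOINT arguments (simplified)."""
    stripped = args.strip()
    if stripped.startswith("["):
        try:
            parsed = json.loads(stripped)
        except json.JSONDecodeError:
            parsed = None
        # Exec form: a JSON array of strings, executed directly without a shell.
        if isinstance(parsed, list) and all(isinstance(x, str) for x in parsed):
            return parsed
    # Shell form (including invalid JSON): the whole string is handed to the shell.
    return [*shell, stripped]
```

Two consequences follow: exec form does no variable expansion or && chaining because no shell is involved, and JSON requires double quotes, so ['apt-get', 'update'] silently falls back to shell form.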
COPY and ADD - File Copying
# COPY: Simple file copying (recommended)
COPY app.py /opt/app/
COPY requirements.txt /opt/app/
COPY . /opt/app/
# Copy and set permissions
COPY --chown=appuser:appuser app.py /opt/app/
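# ADD: Also supports URLs and automatic extraction of local tar archives (more powerful, but COPY is preferred for everyday use)
# Remote URLs are downloaded but not extracted
ADD https://example.com/file.tar.gz /opt/
# A local tar archive is extracted automatically
ADD archive.tar.gz /opt/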
# Copy multiple files
COPY ["file1.txt", "file2.txt", "/opt/app/"]
# Use wildcards
COPY *.py /opt/app/
COPY config/*.conf /etc/myapp/
# Copy files from another build stage (multi-stage builds)
COPY --from=builder /app/build /usr/share/nginx/html
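Where COPY puts a file depends on the destination: a relative destination is resolved against the current WORKDIR, a destination ending in / is treated as a directory, and several sources (directly or through a wildcard) require such a directory destination. The sketch below models only those rules for plain files, assuming the destination does not already exist as a directory; copy_targets is a hypothetical helper.

```python
import posixpath
from typing import Dict, List


def copy_targets(sources: List[str], dest: str, workdir: str = "/") -> Dict[str, str]:
    """Map each source file to its path inside the image (files only, simplified)."""
    is_dir = dest.endswith("/") or dest == "."
    if len(sources) > 1 and not is_dir:
        raise ValueError("with multiple sources, <dest> must be a directory ending in '/'")
    # posixpath.join keeps an absolute dest as-is and resolves a relative one under WORKDIR.
    base = posixpath.normpath(posixpath.join(workdir, dest))
    if is_dir:
        # Directory destination: each file keeps its base name.
        return {src: posixpath.join(base, posixpath.basename(src)) for src in sources}
    # File destination: the single source is written at exactly that path.
    return {sources[0]: base}
```

For example, COPY package*.json ./ under WORKDIR /usr/src/app places both manifests in /usr/src/app, while COPY app.py /opt/app (no trailing slash) writes a file named /opt/app.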
ENV and ARG - Environment Variables
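# ENV: Environment variables available to later build steps and to the running container
ENV NODE_ENV=production
ENV APP_PORT=3000
# Placeholder only: ENV values are stored in the image and visible via docker inspect,
# so supply real credentials at runtime (for example, with docker run -e) instead
ENV DATABASE_URL="postgresql://user:pass@localhost:5432/db"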
# Multiple environment variables
ENV NODE_ENV=production \
APP_PORT=3000 \
LOG_LEVEL=info
# ARG: Build-time arguments (not set as environment variables in the running container)
ARG VERSION=latest
ARG BUILD_DATE
ARG VCS_REF
# Combined usage
ARG APP_VERSION=1.0.0
ENV APP_VERSION=${APP_VERSION}
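# ARG declared before FROM (global scope) can be used in the FROM line
ARG PYTHON_VERSION=3.9
FROM python:${PYTHON_VERSION}-slim
# Automatic platform ARGs set by BuildKit; declare them inside the stage to use them
ARG TARGETPLATFORM
ARG BUILDPLATFORM
RUN echo "Building on $BUILDPLATFORM, targeting $TARGETPLATFORM"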
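Dockerfile variable references support $NAME, ${NAME}, ${NAME:-word} (use word when NAME is unset or empty) and ${NAME:+word} (use word only when NAME is set and non-empty). The sketch below models that substitution with a regular expression; expand is a hypothetical helper, and escaping with \$ and nested references are deliberately left out.

```python
import re
from typing import Dict

# Matches ${NAME}, ${NAME:-word}, ${NAME:+word}, or a bare $NAME.
_VAR = re.compile(r"\$\{(\w+)(?::([-+])([^}]*))?\}|\$(\w+)")


def expand(text: str, env: Dict[str, str]) -> str:
    """Expand Dockerfile-style variables using ARG/ENV values from `env` (simplified)."""
    def repl(m):
        name = m.group(1) or m.group(4)
        value = env.get(name, "")
        op, word = m.group(2), m.group(3)
        if op == "-":
            # ${NAME:-word}: fall back to word when NAME is unset or empty.
            return value if value else word
        if op == "+":
            # ${NAME:+word}: substitute word only when NAME has a non-empty value.
            return word if value else ""
        return value  # plain reference; unset names expand to an empty string

    return _VAR.sub(repl, text)
```

With PYTHON_VERSION=3.9, the FROM line above expands to python:3.9-slim.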
WORKDIR - Working Directory
# Set the working directory
WORKDIR /opt/app
# Automatically create the directory
WORKDIR /path/to/nonexistent/directory
# Relative path (resolved against the previous WORKDIR)
WORKDIR /opt
# Equivalent to /opt/app (Dockerfile comments must be on their own line)
WORKDIR app
# Use an environment variable
ENV APP_HOME=/opt/myapp
WORKDIR ${APP_HOME}
# Practical usage example
FROM node:16-alpine
WORKDIR /usr/src/app
COPY package*.json ./
RUN npm ci --omit=dev
COPY . .
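Each WORKDIR is resolved against the previous one: an absolute path replaces it, a relative path is appended. A minimal model, assuming the starting directory is / (a base image may set its own WORKDIR); resolve_workdir and final_workdir are hypothetical names.

```python
import posixpath
from typing import List


def resolve_workdir(current: str, new: str) -> str:
    """Resolve one WORKDIR argument against the current working directory."""
    # posixpath.join discards `current` when `new` is absolute.
    return posixpath.normpath(posixpath.join(current, new))


def final_workdir(steps: List[str], start: str = "/") -> str:
    """Apply a sequence of WORKDIR arguments in order."""
    cwd = start
    for step in steps:
        cwd = resolve_workdir(cwd, step)
    return cwd
```

So WORKDIR /a, then WORKDIR b, then WORKDIR c leaves the working directory at /a/b/c.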
EXPOSE - Port Declaration
# Declare a port (documentation only; it does not publish the port, use docker run -p or -P for that)
EXPOSE 80
EXPOSE 443
EXPOSE 8080/tcp
EXPOSE 53/udp
# Use a variable
ARG PORT=8080
EXPOSE ${PORT}
# Multiple ports
EXPOSE 80 443 8080
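EXPOSE arguments are port numbers with an optional /tcp, /udp, or /sctp suffix, and TCP is the default. Because EXPOSE only records metadata, publishing still needs -p at run time (or -P, which maps every exposed port to a random host port). The sketch below parses EXPOSE arguments and builds matching -p flags; parse_expose and publish_args are hypothetical helpers, and port ranges are not handled.

```python
from typing import List, Tuple


def parse_expose(spec: str) -> List[Tuple[int, str]]:
    """Parse EXPOSE arguments such as '80 443 53/udp' into (port, protocol) pairs."""
    ports = []
    for token in spec.split():
        number, _, proto = token.partition("/")
        proto = proto.lower() or "tcp"  # TCP is the default protocol
        if proto not in ("tcp", "udp", "sctp"):
            raise ValueError(f"unsupported protocol in {token!r}")
        ports.append((int(number), proto))
    return ports


def publish_args(ports: List[Tuple[int, str]]) -> List[str]:
    """Build docker run arguments that publish each port on the same host port."""
    args: List[str] = []
    for number, proto in ports:
        args += ["-p", f"{number}:{number}/{proto}"]
    return args
```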
USER - Running User
# Create and use a non-root user
RUN groupadd -r appuser && useradd -r -g appuser appuser
USER appuser
# Use UID:GID
USER 1000:1000
# Temporarily switch user
USER root
RUN apt-get update && apt-get install -y some-package
USER appuser
# Practical usage example
FROM node:16-alpine
RUN addgroup -g 1001 -S nodejs && \
adduser -S nextjs -u 1001 -G nodejs
USER nextjs
CMD and ENTRYPOINT - Startup Commands
# CMD: Default startup command (replaced by any arguments given to docker run after the image name)
# Only the last CMD in a Dockerfile takes effect
CMD ["nginx", "-g", "daemon off;"]
# Shell form
CMD nginx -g "daemon off;"
CMD ["/bin/bash"]
# ENTRYPOINT: Fixed entry point (can only be replaced with docker run --entrypoint)
ENTRYPOINT ["docker-entrypoint.sh"]
ENTRYPOINT ["/usr/local/bin/docker-entrypoint.sh"]
# Combined usage: ENTRYPOINT + CMD, where CMD supplies default arguments
ENTRYPOINT ["python", "app.py"]
CMD ["--port", "8080"]
# Practical combined example
FROM python:3.9-slim
WORKDIR /app
COPY . .
RUN pip install -r requirements.txt
ENTRYPOINT ["python", "app.py"]
CMD ["--host", "0.0.0.0", "--port", "8080"]
# Use a script as the entry point
COPY docker-entrypoint.sh /usr/local/bin/
RUN chmod +x /usr/local/bin/docker-entrypoint.sh
ENTRYPOINT ["docker-entrypoint.sh"]
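The way ENTRYPOINT, CMD, and docker run arguments combine follows a fixed table: exec-form ENTRYPOINT gets CMD appended as arguments, shell-form ENTRYPOINT ignores CMD, run arguments replace CMD, and without an ENTRYPOINT the CMD runs on its own. A sketch of those rules, assuming exec form is represented as a Python list and shell form as a string; container_argv is a hypothetical helper.

```python
from typing import List, Optional, Union

Form = Union[List[str], str, None]  # list = exec form, str = shell form, None = not set
SHELL = ["/bin/sh", "-c"]


def _argv(value: Form) -> List[str]:
    if value is None:
        return []
    return list(value) if isinstance(value, list) else [*SHELL, value]


def container_argv(entrypoint: Form, cmd: Form, run_args: Optional[List[str]] = None) -> List[str]:
    """Combine ENTRYPOINT, CMD, and docker run arguments into the process argv."""
    if run_args:
        cmd = list(run_args)  # arguments after the image name replace CMD
    if isinstance(entrypoint, str):
        return _argv(entrypoint)  # shell-form ENTRYPOINT ignores CMD and run arguments
    argv = _argv(entrypoint) + _argv(cmd)
    if not argv:
        raise ValueError("no command specified")
    return argv
```

With ENTRYPOINT ["python", "app.py"] and CMD ["--port", "8080"], docker run image --port 9000 starts python app.py --port 9000. Passing --entrypoint to docker run also clears the image's default CMD.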
Practical Application Examples
Python Web App Dockerfile
# Single-stage build: Python Flask app
FROM python:3.9-slim AS base
# Set environment variables
ENV PYTHONUNBUFFERED=1 \
PYTHONDONTWRITEBYTECODE=1 \
PIP_NO_CACHE_DIR=1 \
PIP_DISABLE_PIP_VERSION_CHECK=1
# Install system dependencies (curl is required by the HEALTHCHECK; slim images do not include it)
RUN apt-get update && \
apt-get install -y --no-install-recommends \
gcc \
libc6-dev \
curl && \
rm -rf /var/lib/apt/lists/*
# Create a non-root user
RUN groupadd -r appuser && useradd -r -g appuser appuser
# Set the working directory
WORKDIR /app
# Copy and install Python dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Copy application code
COPY --chown=appuser:appuser . .
# Switch to the non-root user
USER appuser
# Expose the port
EXPOSE 5000
# Health check
HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
CMD curl -f http://localhost:5000/health || exit 1
# Startup command
CMD ["gunicorn", "--bind", "0.0.0.0:5000", "--workers", "4", "app:app"]
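The HEALTHCHECK options map onto a small state machine. Based on Docker's documented behavior, the status starts as starting; a successful probe sets healthy and resets the failure streak; failures during --start-period are not counted while the container is still starting; and once --retries consecutive failures accumulate, the status becomes unhealthy. The HealthState class below is a hypothetical model of that logic, with probe scheduling (--interval, --timeout) left out.

```python
from dataclasses import dataclass


@dataclass
class HealthState:
    retries: int = 3
    start_period: float = 5.0  # seconds after start during which failures are not counted
    status: str = "starting"
    failing_streak: int = 0

    def record(self, elapsed: float, ok: bool) -> str:
        """Update the status after one probe run `elapsed` seconds after container start."""
        if ok:
            self.status, self.failing_streak = "healthy", 0
        elif self.status == "starting" and elapsed < self.start_period:
            pass  # failures inside the start period do not count toward retries
        else:
            self.failing_streak += 1
            if self.failing_streak >= self.retries:
                self.status = "unhealthy"
        return self.status
```

With --retries=3, a healthy container stays healthy through two failed probes and turns unhealthy on the third consecutive failure. Standalone Docker only reports this status; acting on it is left to an orchestrator or your own tooling.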
Node.js App Dockerfile
# Node.js app multi-stage build
FROM node:16-alpine AS dependencies
# Set the working directory
WORKDIR /usr/src/app
# Copy package files
COPY package*.json ./
# Install dependencies
RUN npm ci --omit=dev && npm cache clean --force
# Build stage
FROM node:16-alpine AS builder
WORKDIR /usr/src/app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build
# Production stage
FROM node:16-alpine AS production
# Create a user
RUN addgroup -g 1001 -S nodejs && \
adduser -S nextjs -u 1001 -G nodejs
WORKDIR /usr/src/app
# Copy node_modules from the dependencies stage
COPY --from=dependencies --chown=nextjs:nodejs /usr/src/app/node_modules ./node_modules
# Copy build artifacts from the builder stage
COPY --from=builder --chown=nextjs:nodejs /usr/src/app/dist ./dist
COPY --from=builder --chown=nextjs:nodejs /usr/src/app/package*.json ./
# Switch user
USER nextjs
EXPOSE 3000
# Health check
HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
CMD node healthcheck.js
CMD ["npm", "start"]
Nginx Static Site Dockerfile
# Multi-stage build: Frontend app
FROM node:16-alpine AS builder
WORKDIR /app
# Install dependencies
COPY package*.json ./
RUN npm ci
# Build the app
COPY . .
RUN npm run build
# Production image
FROM nginx:alpine
# Copy custom nginx configuration
COPY nginx.conf /etc/nginx/nginx.conf
# Copy build artifacts
COPY --from=builder /app/dist /usr/share/nginx/html
# Copy startup script
COPY docker-entrypoint.sh /usr/local/bin/
RUN chmod +x /usr/local/bin/docker-entrypoint.sh
# Expose the port
EXPOSE 80
# Health check
HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
CMD curl -f http://localhost/ || exit 1
ENTRYPOINT ["docker-entrypoint.sh"]
CMD ["nginx", "-g", "daemon off;"]
Multi-stage Builds
Basic Multi-stage Build
# Build stage
FROM golang:1.19 AS builder
WORKDIR /app
COPY . .
RUN CGO_ENABLED=0 GOOS=linux go build -o main .
# Runtime stage
FROM alpine:latest
# Install CA certificates
RUN apk --no-cache add ca-certificates
# Use a directory the non-root user can access (/root is private to root)
WORKDIR /app
# Copy the binary from the build stage
COPY --from=builder /app/main .
# Create a non-root user
RUN adduser -D -s /bin/sh appuser
USER appuser
EXPOSE 8080
CMD ["./main"]
Complex Multi-stage Build Example
# Base image
FROM node:16-alpine AS base
WORKDIR /app
COPY package*.json ./
# Dependency installation stage
FROM base AS dependencies
RUN npm ci --omit=dev
# Dev dependency stage
FROM base AS dev-dependencies
RUN npm ci
# Build stage
FROM dev-dependencies AS builder
COPY . .
RUN npm run build
# Test stage (BuildKit builds it only when targeted, e.g. docker build --target tester .)
FROM dev-dependencies AS tester
COPY . .
RUN npm test
# Production image
FROM node:16-alpine AS production
# Security configuration
RUN addgroup -g 1001 -S nodejs && \
adduser -S appuser -u 1001 -G nodejs
WORKDIR /app
USER appuser
# Copy production dependencies
COPY --from=dependencies --chown=appuser:nodejs /app/node_modules ./node_modules
# Copy build artifacts
COPY --from=builder --chown=appuser:nodejs /app/dist ./dist
COPY --chown=appuser:nodejs package*.json ./
EXPOSE 3000
CMD ["npm", "start"]
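With BuildKit (the default builder in current Docker releases), only the stages the target depends on are built, through FROM <stage> and COPY --from=<stage> edges; the legacy builder instead processes every stage up to the target. The sketch below walks that dependency graph for the Dockerfile above. The stages_to_build function and the DEPS table are illustrative, with the edges read off the FROM and COPY --from lines.

```python
from typing import Dict, List

# Stage -> stages it uses via FROM <stage> or COPY --from=<stage>.
DEPS: Dict[str, List[str]] = {
    "base": [],
    "dependencies": ["base"],
    "dev-dependencies": ["base"],
    "builder": ["dev-dependencies"],
    "tester": ["dev-dependencies"],
    "production": ["dependencies", "builder"],
}


def stages_to_build(deps: Dict[str, List[str]], target: str) -> List[str]:
    """Return the stages needed for `target`, dependencies before dependents."""
    order: List[str] = []
    seen = set()

    def visit(stage: str) -> None:
        if stage in seen:
            return
        seen.add(stage)
        for parent in deps.get(stage, []):
            visit(parent)  # depth-first: build what this stage needs first
        order.append(stage)

    visit(target)
    return order
```

Building the default (last) stage never runs tester, so tests run only when requested explicitly, for example with docker build --target tester .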
Build Optimization Techniques
Leveraging the Cache Mechanism
# ❌ Bad practice: Any source code change invalidates the cache, so dependencies are reinstalled
COPY . /app
WORKDIR /app
RUN npm install
# ✅ Good practice: Copy the dependency manifests first so the install layer is reused until they change
WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev
COPY . .
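The cache behaves like a chain of keys. Each step's key depends on its parent's key and its instruction, and for COPY also on the checksums of the copied files (modification times are ignored). A miss therefore invalidates every later step. The model below is a simplified sketch, not Docker's actual key format; the file contents and the helpers step_keys and cache_hits are hypothetical.

```python
import hashlib
from typing import Dict, List, Sequence, Tuple

Step = Tuple[str, Sequence[str]]  # (instruction text, context files the step copies)


def step_keys(steps: List[Step], files: Dict[str, bytes]) -> List[str]:
    """Chain a key through each step: parent key + instruction + copied file contents."""
    keys: List[str] = []
    parent = ""
    for instruction, sources in steps:
        h = hashlib.sha256(parent.encode())
        h.update(instruction.encode())
        for path in sorted(sources):
            h.update(path.encode())
            h.update(files[path])  # contents only; timestamps do not matter
        parent = h.hexdigest()
        keys.append(parent)
    return keys


def cache_hits(old: List[str], new: List[str]) -> int:
    """Count leading steps whose keys are unchanged between two builds."""
    hits = 0
    for a, b in zip(old, new):
        if a != b:
            break
        hits += 1
    return hits


# Hypothetical project where only index.js changes between two builds.
V1 = {"package.json": b'{"name": "app"}', "package-lock.json": b"{}", "index.js": b"console.log(1)"}
V2 = {**V1, "index.js": b"console.log(2)"}

BAD = [("COPY . /app", list(V1)), ("WORKDIR /app", []), ("RUN npm install", [])]
GOOD = [
    ("WORKDIR /app", []),
    ("COPY package*.json ./", ["package.json", "package-lock.json"]),
    ("RUN npm ci", []),
    ("COPY . .", list(V1)),
]
```

In the BAD ordering, editing index.js misses the cache at the first step, so the install reruns. In the GOOD ordering, the first three steps, including the dependency install, are reused.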
Reducing the Number of Image Layers
# ❌ Creates multiple layers
RUN apt-get update
RUN apt-get install -y curl
RUN apt-get install -y wget
RUN rm -rf /var/lib/apt/lists/*
# ✅ Combines into one layer
RUN apt-get update && \
apt-get install -y \
curl \
wget && \
rm -rf /var/lib/apt/lists/* && \
apt-get clean
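Combining commands matters for size, not just layer count. Layers are stacked, and a file deleted in a later layer is only hidden by a whiteout entry; its bytes remain in the earlier layer. The model below uses made-up sizes in MB to compare separate RUN steps against one combined RUN; flatten, stored_size, and the layer contents are hypothetical.

```python
from typing import Dict, List, Optional

Layer = Dict[str, Optional[int]]  # path -> size in MB, or WHITEOUT for a deletion
WHITEOUT = None


def flatten(layers: List[Layer]) -> Dict[str, int]:
    """Apply layers in order to get the filesystem a container sees."""
    fs: Dict[str, int] = {}
    for layer in layers:
        for path, size in layer.items():
            if size is WHITEOUT:
                fs.pop(path, None)  # a whiteout hides the file from the merged view
            else:
                fs[path] = size  # later layers shadow earlier ones
    return fs


def stored_size(layers: List[Layer]) -> int:
    """Total bytes stored across layers; files hidden by whiteouts still count."""
    return sum(size for layer in layers for size in layer.values() if size is not WHITEOUT)


SEPARATE = [
    {"/var/lib/apt/lists/index": 40},        # RUN apt-get update
    {"/usr/bin/curl": 1},                    # RUN apt-get install -y curl
    {"/usr/bin/wget": 1},                    # RUN apt-get install -y wget
    {"/var/lib/apt/lists/index": WHITEOUT},  # RUN rm -rf /var/lib/apt/lists/*
]
COMBINED = [{"/usr/bin/curl": 1, "/usr/bin/wget": 1}]  # one RUN chained with &&
```

Both images contain the same files, but the separate version still ships the package lists inside the first layer.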
Using .dockerignore
# .dockerignore file content
.git
.gitignore
README.md
Dockerfile
.dockerignore
node_modules
npm-debug.log
.nyc_output
.coverage
.env.local
.env.development.local
.env.test.local
.env.production.local
.next
.vscode
**/__pycache__
**/*.pyc
**/*.pyo
**/*.pyd
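.dockerignore patterns are matched against paths relative to the context root. A * or ? never crosses a /, ** matches any number of directories, a pattern that matches a directory excludes everything under it, lines starting with ! re-include paths, and the last matching line wins. As a result, a bare pattern such as *.pyc only matches at the root of the context; prefix it with **/ to match at any depth. The sketch below models these rules, loosely following how Docker translates patterns to regular expressions; character classes are not handled, and the helper names are hypothetical.

```python
import posixpath
import re
from typing import List, Tuple


def _to_regex(pattern: str):
    """Translate one pattern: * and ? stay within a path segment, ** spans directories."""
    out, i = "", 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "*" and pattern[i + 1:i + 2] == "*":
            i += 2
            if pattern[i:i + 1] == "/":
                i += 1  # treat "**/" like "**"
            out += ".*" if i >= len(pattern) else "(.*/)?"
            continue
        if ch == "*":
            out += "[^/]*"
        elif ch == "?":
            out += "[^/]"
        else:
            out += re.escape(ch)
        i += 1
    return re.compile("^" + out + "$")


def parse_dockerignore(text: str) -> List[Tuple[object, bool]]:
    """Return (compiled pattern, is_exception) rules; blank lines and # comments are skipped."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        negate = line.startswith("!")
        if negate:
            line = line[1:].strip()
        line = posixpath.normpath(line).lstrip("/")
        rules.append((_to_regex(line), negate))
    return rules


def is_excluded(path: str, rules) -> bool:
    """True if `path` (relative to the context root) is left out of the build context."""
    parts = path.split("/")
    # A path is also excluded when one of its parent directories matches.
    candidates = ["/".join(parts[:n]) for n in range(1, len(parts) + 1)]
    excluded = False
    for regex, negate in rules:
        if any(regex.match(c) for c in candidates):
            excluded = not negate  # later rules override earlier ones
    return excluded
```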
Choosing the Right Base Image
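# Size comparison: approximate uncompressed sizes, which vary by tag and architecture (check with docker images)
# Full Ubuntu base, roughly 70MB
FROM ubuntu:20.04
# Debian slim plus Python, over 100MB
FROM python:3.9-slim
# Alpine plus Python, around 50MB
FROM python:3.9-alpine
# Empty image, 0MB
FROM scratch
# Choose based on needs
# Minimal production image
FROM alpine:latest
# Development image needing more tools
FROM ubuntu:20.04
# A balanced choice for Python apps (Alpine is smaller but uses musl libc, so some packages may need compiling)
FROM python:3.9-slim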
Build Arguments and Variables
Using Build Arguments
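# Global build arguments (declared before FROM) can only be used in FROM lines
ARG NODE_VERSION=16
ARG APP_ENV=production
FROM node:${NODE_VERSION}-alpine
# Redeclare APP_ENV inside the stage so it keeps its global value
ARG APP_ENV
# Use build arguments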
ARG BUILD_DATE
ARG VCS_REF
ARG VERSION
# Set labels
LABEL build_date=${BUILD_DATE}
LABEL vcs_ref=${VCS_REF}
LABEL version=${VERSION}
# Convert to environment variables
ENV APP_ENV=${APP_ENV}
Passing arguments during build:
# Pass build arguments
docker build \
--build-arg NODE_VERSION=18 \
--build-arg BUILD_DATE=$(date -u +'%Y-%m-%dT%H:%M:%SZ') \
--build-arg VCS_REF=$(git rev-parse HEAD) \
--build-arg VERSION=1.2.0 \
-t myapp:1.2.0 .
# View the labels set from the build arguments
docker inspect myapp:1.2.0 | jq '.[0].Config.Labels'
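ARG scoping explains why a global ARG can look empty inside a stage. ARGs declared before the first FROM are visible only to FROM lines; inside a stage, only ARGs declared in that stage are visible, a --build-arg value overrides any default, and a bare ARG NAME picks up the global default. A global ARG such as APP_ENV is therefore empty in ENV APP_ENV=${APP_ENV} unless the stage redeclares it. The resolver below is a simplified model of these rules; stage_args is hypothetical, and predefined proxy ARGs and stage inheritance are not modeled.

```python
from typing import Dict, Optional


def stage_args(
    global_args: Dict[str, Optional[str]],
    stage_decls: Dict[str, Optional[str]],
    build_args: Dict[str, str],
) -> Dict[str, str]:
    """Resolve the ARG values visible inside one build stage (simplified scoping model).

    global_args: ARGs declared before the first FROM (name -> default or None).
    stage_decls: ARGs declared in the stage (None means a bare `ARG NAME`).
    build_args:  values passed with --build-arg.
    """
    visible: Dict[str, str] = {}
    for name, default in stage_decls.items():  # undeclared names stay invisible
        if name in build_args:
            visible[name] = build_args[name]  # --build-arg wins
        elif default is not None:
            visible[name] = default  # the stage's own default
        elif global_args.get(name) is not None:
            visible[name] = global_args[name]  # bare ARG inherits the global default
        else:
            visible[name] = ""  # declared but never given a value
    return visible
```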
Conditional Builds
ARG INSTALL_DEV_TOOLS=false
# Conditionally install development tools
RUN if [ "$INSTALL_DEV_TOOLS" = "true" ]; then \
apt-get update && \
apt-get install -y vim curl wget; \
fi
Practical Project Build
Complete Web App Build
Create the project structure:
mkdir docker-webapp-build
cd docker-webapp-build
# Create application code
cat > app.py << 'EOF'
from flask import Flask, jsonify
import os
import sys
import logging
app = Flask(__name__)
logging.basicConfig(level=logging.INFO)
@app.route('/')
def home():
    return jsonify({
        "message": "Hello from Dockerized Flask App!",
        "version": os.getenv("APP_VERSION", "unknown"),
        "environment": os.getenv("APP_ENV", "development")
    })
@app.route('/health')
def health():
    return jsonify({"status": "healthy"})
@app.route('/info')
def info():
    # Demo only: this returns every environment variable, including secrets such as SECRET_KEY
    return jsonify({
        "python_version": sys.version,
        "environment": dict(os.environ)
    })
if __name__ == '__main__':
    port = int(os.getenv("PORT", 5000))
    app.run(host='0.0.0.0', port=port, debug=False)
EOF
# Create dependency file
cat > requirements.txt << 'EOF'
Flask==2.3.2
gunicorn==21.2.0
EOF
# Create configuration file
cat > config.py << 'EOF'
import os
class Config:
    SECRET_KEY = os.environ.get('SECRET_KEY') or 'dev-secret-key'
    DEBUG = os.environ.get('FLASK_DEBUG', 'False').lower() == 'true'
    PORT = int(os.environ.get('PORT', 5000))
EOF
# Create startup script
cat > docker-entrypoint.sh << 'EOF'
#!/bin/sh
set -e
echo "Starting Flask application..."
echo "Environment: ${APP_ENV:-development}"
echo "Version: ${APP_VERSION:-unknown}"
# Database migrations and other initialization can be done here
# python -c "from app import db; db.create_all()"
exec "$@"
EOF
chmod +x docker-entrypoint.sh
Create an optimized Dockerfile:
# Multi-stage build for a complete Flask app
ARG PYTHON_VERSION=3.9
FROM python:${PYTHON_VERSION}-slim AS base
# Build arguments
ARG BUILD_DATE
ARG VCS_REF
ARG VERSION=dev
# Label information
LABEL maintainer="developer@example.com" \
build_date="${BUILD_DATE}" \
vcs_ref="${VCS_REF}" \
version="${VERSION}" \
description="Flask web application with Docker"
# Python environment configuration
ENV PYTHONUNBUFFERED=1 \
PYTHONDONTWRITEBYTECODE=1 \
PIP_NO_CACHE_DIR=1 \
PIP_DISABLE_PIP_VERSION_CHECK=1 \
PIP_DEFAULT_TIMEOUT=100
# Dependency installation stage
FROM base AS dependencies
# Install system dependencies
RUN apt-get update && \
apt-get install -y --no-install-recommends \
gcc \
libc6-dev \
curl && \
rm -rf /var/lib/apt/lists/* && \
apt-get clean
# Create an application user
RUN groupadd -r appuser && \
useradd -r -g appuser -d /app -s /bin/bash appuser
# Set the working directory
WORKDIR /app
# Install Python dependencies
COPY requirements.txt .
RUN pip install --user --no-warn-script-location -r requirements.txt
# Production image
FROM base AS production
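# Recreate the application user in this stage, with /home/appuser as its home directory
# so Python finds the user-installed packages in /home/appuser/.local
RUN groupadd -r appuser && \
useradd -r -g appuser -d /home/appuser -m -s /bin/bash appuser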
# Install runtime dependencies
RUN apt-get update && \
apt-get install -y --no-install-recommends \
curl \
ca-certificates && \
rm -rf /var/lib/apt/lists/* && \
apt-get clean
WORKDIR /app
# Copy Python dependencies
COPY --from=dependencies --chown=appuser:appuser /root/.local /home/appuser/.local
# Copy application code
COPY --chown=appuser:appuser . .
# Make the script executable
RUN chmod +x docker-entrypoint.sh
# Switch to the application user
USER appuser
# Set PATH to include user-installed packages
ENV PATH=/home/appuser/.local/bin:$PATH
# Set application environment variables (redeclare VERSION so it is available in this stage)
ARG VERSION=dev
ENV APP_VERSION=${VERSION} \
APP_ENV=production \
PORT=5000
# Expose the port
EXPOSE 5000
# Health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
CMD curl -f http://localhost:5000/health || exit 1
# Startup configuration
ENTRYPOINT ["./docker-entrypoint.sh"]
CMD ["gunicorn", "--bind", "0.0.0.0:5000", "--workers", "4", "--timeout", "120", "app:app"]
Create a .dockerignore file:
.git
.gitignore
README.md
Dockerfile
.dockerignore
**/__pycache__
**/*.pyc
**/*.pyo
**/*.pyd
.Python
.pytest_cache
.coverage
.venv
virtualenv/
.env
.env.local
.env.development.local
.env.test.local
.env.production.local
Build and test:
# Build the image (VCS_REF uses git, so run this inside a Git repository)
docker build \
--build-arg BUILD_DATE=$(date -u +'%Y-%m-%dT%H:%M:%SZ') \
--build-arg VCS_REF=$(git rev-parse --short HEAD) \
--build-arg VERSION=1.0.0 \
-t flask-app:1.0.0 \
-t flask-app:latest .
# View image information
docker images flask-app
docker inspect flask-app:latest | jq '.[0].Config.Labels'
# Run the container
docker run -d --name flask-app \
-p 5000:5000 \
-e APP_ENV=production \
flask-app:latest
# Test the application
curl http://localhost:5000/
curl http://localhost:5000/health
curl http://localhost:5000/info
# View logs
docker logs -f flask-app
# Clean up
docker stop flask-app
docker rm flask-app
Dockerfile Best Practices
- Single Responsibility: Each container should run only one service.
- Minimize Layers: Combine related RUN instructions to reduce the number of layers.
- Leverage Cache: Place less frequently changing instructions at the top.
- Security Configuration: Run applications as a non-root user.
- Health Checks: Add health checks so Docker can report when a service stops responding.
- Tags and Labels: Tag images with explicit versions and add labels with build metadata.
Important Notes
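- Avoid including sensitive information (passwords, keys, etc.) in the image, including in ENV or ARG values, which can be recovered from the image configuration or build history.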
- Use multi-stage builds to reduce the final image size.
- Regularly update base images to get security patches.
- Use .dockerignore to exclude unnecessary files.
- Pin base image versions in production environments.
Summary
By completing this chapter, you should have mastered:
- Dockerfile Syntax: Proficient use of various Dockerfile instructions.
- Build Techniques: Mastered multi-stage builds and image optimization methods.
- Best Practices: Understood best practices for security, performance, and maintainability.
- Practical Skills: Able to write high-quality Dockerfiles for different types of applications.
In the next chapter, we will learn about Docker volumes and network configuration to solve container data persistence and network communication issues.