Chapter 4: Dockerfile Build Technology

Haiyue
20min

Learning Objectives
  • Master Dockerfile syntax and instruction usage
  • Learn to write efficient multi-layer image build files
  • Understand image layer caching mechanisms and optimization strategies
  • Become proficient in using multi-stage builds to reduce image size

Key Concepts

Dockerfile Basics

A Dockerfile is a text file containing a series of instructions used to automate Docker image building. Instructions that change the filesystem (RUN, COPY, ADD) each create a new layer, the remaining instructions only record metadata, and the layers stack to form the final image.

Advantages of Dockerfile:

  • Version Control: Dockerfiles can be managed with Git and other tools
  • Reproducible Builds: Ensures consistent results with each build
  • Automation: Can be integrated into CI/CD pipelines for automatic builds
  • Transparency: Build process is fully transparent, easy to debug and optimize

Dockerfile Instruction Categories

Category                    Instructions            Purpose
Base Instructions           FROM, MAINTAINER        Define base image and maintainer information
File Operations             COPY, ADD, VOLUME       Copy files and directory operations
Environment Configuration   ENV, ARG, WORKDIR       Set environment variables and working directory
Execution Instructions      RUN, CMD, ENTRYPOINT    Execute commands and define startup instructions
Network Configuration       EXPOSE                  Declare ports
User Management             USER                    Set running user
Metadata                    LABEL, ONBUILD          Add labels and trigger instructions

Build Context

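Build context refers to the files and directories under the path passed to the docker build command (the trailing . in docker build -t myapp .). The Docker client sends them to the Docker daemon, and COPY and ADD can only reference files inside this context. Entries matched by .dockerignore are left out.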

Build Context Example:
project/
├── Dockerfile          # Build file
├── .dockerignore       # Ignore file list
├── app/                # Application code
│   ├── main.py
│   └── requirements.txt
├── config/             # Configuration files
│   └── app.conf
└── static/             # Static resources
    ├── css/
    └── js/
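
Because the whole context is transferred before the build starts, it can help to know how large it is. The following is a minimal sketch, not part of Docker itself: the function name context_size and its excluded_top_level parameter are hypothetical, and the exclusion handling only skips top-level entries by exact name rather than implementing the full .dockerignore pattern syntax.

```python
import os
from typing import Iterable


def context_size(root: str, excluded_top_level: Iterable[str] = ()) -> int:
    """Return the total size in bytes of the files under root.

    Top-level entries whose names appear in excluded_top_level are skipped.
    This is a simplification of .dockerignore, not a full implementation.
    """
    excluded = set(excluded_top_level)
    total = 0
    for dirpath, dirnames, filenames in os.walk(root):
        if dirpath == root:
            # Prune excluded directories so os.walk does not descend into them.
            dirnames[:] = [d for d in dirnames if d not in excluded]
            filenames = [f for f in filenames if f not in excluded]
        for name in filenames:
            # lstat avoids errors on broken symbolic links.
            total += os.lstat(os.path.join(dirpath, name)).st_size
    return total
```

Calling context_size(".", {".git", "node_modules"}) from the project directory gives a rough idea of how much data a build would send with those two entries ignored.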

Basic Dockerfile Instructions

FROM - Base Image

# Specify base image
FROM ubuntu:20.04

# Use specific version
FROM node:16-alpine

# Naming in multi-stage builds
FROM golang:1.19 AS builder
FROM alpine:latest AS runtime

# Use official images
FROM nginx:latest
FROM mysql:8.0
FROM python:3.9-slim

# Empty image (minimal image)
FROM scratch

MAINTAINER/LABEL - Metadata Information

# Maintainer information (deprecated, use LABEL instead)
MAINTAINER John Doe <john@example.com>

# Recommended to use LABEL
LABEL maintainer="john@example.com"
LABEL version="1.0.0"
LABEL description="This is a demo Docker image"
LABEL vendor="My Company"

# Multiple labels
LABEL version="1.0.0" \
      description="Multi-line label example" \
      maintainer="john@example.com"

RUN - Execute Commands

# Basic usage
RUN apt-get update
RUN apt-get install -y nginx

# Recommended: Combine commands to reduce layers
RUN apt-get update && \
    apt-get install -y \
        nginx \
        curl \
        vim && \
    rm -rf /var/lib/apt/lists/* && \
    apt-get clean

# Use exec form
RUN ["apt-get", "update"]
RUN ["apt-get", "install", "-y", "nginx"]

# Multi-line complex commands
RUN set -eux; \
    apt-get update; \
    apt-get install -y --no-install-recommends \
        ca-certificates \
        curl \
        wget; \
    rm -rf /var/lib/apt/lists/*

# Install Python dependencies
RUN pip install --no-cache-dir -r requirements.txt

# Create user
RUN groupadd -r appuser && useradd -r -g appuser appuser

COPY and ADD - File Copying

# COPY: Simple file copying (recommended)
COPY app.py /opt/app/
COPY requirements.txt /opt/app/
COPY . /opt/app/

# Copy and set permissions
COPY --chown=appuser:appuser app.py /opt/app/

# ADD: Supports URLs and automatic extraction of local archives (prefer COPY unless these features are needed)
ADD https://example.com/file.tar.gz /opt/
# Local tar archives are auto-extracted (archives fetched from a URL are not)
ADD archive.tar.gz /opt/

# Copy multiple files
COPY ["file1.txt", "file2.txt", "/opt/app/"]

# Use wildcards
COPY *.py /opt/app/
COPY config/*.conf /etc/myapp/

# Copy from another build stage
COPY --from=builder /app/build /usr/share/nginx/html

ENV and ARG - Environment Variables

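# ENV: Environment variables (available during the build and at runtime)
ENV NODE_ENV=production
ENV APP_PORT=3000
# Placeholder credentials only: ENV values are stored in the image, so supply real secrets at runtime
ENV DATABASE_URL="postgresql://user:pass@localhost:5432/db"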

# Multiple environment variables
ENV NODE_ENV=production \
    APP_PORT=3000 \
    LOG_LEVEL=info

# ARG: Build-time arguments
ARG VERSION=latest
ARG BUILD_DATE
ARG VCS_REF

# Combined use
ARG APP_VERSION=1.0.0
ENV APP_VERSION=${APP_VERSION}

# ARG with default value
ARG PYTHON_VERSION=3.9
FROM python:${PYTHON_VERSION}-slim

# Using ARG example
ARG TARGETPLATFORM
ARG BUILDPLATFORM
RUN echo "Building on $BUILDPLATFORM, targeting $TARGETPLATFORM"

WORKDIR - Working Directory

# Set working directory
WORKDIR /opt/app

# Automatically create directory
WORKDIR /path/to/nonexistent/directory

# Relative path (based on previous WORKDIR)
WORKDIR /opt
# Resolves to /opt/app
WORKDIR app

# Use environment variables
ENV APP_HOME=/opt/myapp
WORKDIR ${APP_HOME}

# Practical usage example
FROM node:16-alpine
WORKDIR /usr/src/app
COPY package*.json ./
RUN npm ci --only=production
COPY . .

EXPOSE - Port Declaration

# Declare ports (documentation only, doesn't actually open ports)
EXPOSE 80
EXPOSE 443
EXPOSE 8080/tcp
EXPOSE 53/udp

# Use variables
ARG PORT=8080
EXPOSE ${PORT}

# Multiple ports
EXPOSE 80 443 8080

USER - Running User

# Create and use non-root user
RUN groupadd -r appuser && useradd -r -g appuser appuser
USER appuser

# Use UID:GID
USER 1000:1000

# Temporarily switch users
USER root
RUN apt-get update && apt-get install -y some-package
USER appuser

# Practical usage example
FROM node:16-alpine
RUN addgroup -g 1001 -S nodejs
RUN adduser -S nextjs -u 1001
USER nextjs

CMD and ENTRYPOINT - Startup Commands

# CMD: Default startup command (replaced by any arguments passed to docker run)
CMD ["nginx", "-g", "daemon off;"]
# Shell form
CMD nginx -g "daemon off;"
# Interactive shell
CMD ["/bin/bash"]

# ENTRYPOINT: Fixed entry point (docker run arguments do not replace it; only --entrypoint does)
ENTRYPOINT ["docker-entrypoint.sh"]
ENTRYPOINT ["/usr/local/bin/docker-entrypoint.sh"]

# Combined use: ENTRYPOINT + CMD (CMD supplies the default parameters)
ENTRYPOINT ["python", "app.py"]
CMD ["--port", "8080"]

# Practical combined example
FROM python:3.9-slim
WORKDIR /app
COPY . .
RUN pip install -r requirements.txt
ENTRYPOINT ["python", "app.py"]
CMD ["--host", "0.0.0.0", "--port", "8080"]

# Use script as entry point
COPY docker-entrypoint.sh /usr/local/bin/
RUN chmod +x /usr/local/bin/docker-entrypoint.sh
ENTRYPOINT ["docker-entrypoint.sh"]
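
The way exec-form ENTRYPOINT and CMD combine can be sketched as a small function. This is a toy model for illustration, not Docker's implementation: the name effective_argv is hypothetical, shell-form instructions are not covered, and overriding the entry point with --entrypoint is left out.

```python
from typing import List, Optional


def effective_argv(
    entrypoint: Optional[List[str]],
    cmd: Optional[List[str]],
    run_args: Optional[List[str]] = None,
) -> List[str]:
    """Toy model of how exec-form ENTRYPOINT and CMD combine.

    run_args stands for the arguments placed after the image name in
    `docker run IMAGE ...`; when present they replace CMD.
    """
    args = run_args if run_args else (cmd or [])
    return (entrypoint or []) + args


# Image defined with ENTRYPOINT ["python", "app.py"] and CMD ["--port", "8080"]
entrypoint = ["python", "app.py"]
cmd = ["--port", "8080"]

print(effective_argv(entrypoint, cmd))
# ['python', 'app.py', '--port', '8080']
print(effective_argv(entrypoint, cmd, ["--port", "9000"]))
# ['python', 'app.py', '--port', '9000']
print(effective_argv(None, ["nginx", "-g", "daemon off;"], ["/bin/sh"]))
# ['/bin/sh']
```

The last call shows why an image that only defines CMD can be started with a completely different program, while an image with ENTRYPOINT keeps its program and only swaps the parameters.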

Practical Application Examples

Python Web Application Dockerfile

# Single-stage build: Python Flask application
FROM python:3.9-slim AS base

# Set environment variables
ENV PYTHONUNBUFFERED=1 \
    PYTHONDONTWRITEBYTECODE=1 \
    PIP_NO_CACHE_DIR=1 \
    PIP_DISABLE_PIP_VERSION_CHECK=1

# Install system dependencies (curl is needed by the health check below)
RUN apt-get update && \
    apt-get install -y --no-install-recommends \
        curl \
        gcc \
        libc6-dev && \
    rm -rf /var/lib/apt/lists/*

# Create non-root user
RUN groupadd -r appuser && useradd -r -g appuser appuser

# Set working directory
WORKDIR /app

# Copy and install Python dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy application code
COPY --chown=appuser:appuser . .

# Switch to non-root user
USER appuser

# Expose port
EXPOSE 5000

# Health check
HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
    CMD curl -f http://localhost:5000/health || exit 1

# Startup command
CMD ["gunicorn", "--bind", "0.0.0.0:5000", "--workers", "4", "app:app"]
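
The python:3.9-slim base image does not include curl unless it is installed explicitly, so one alternative is a health check written with the Python standard library. The sketch below is an assumption about how such a script might look: the names is_healthy, main, and DEFAULT_URL are hypothetical, and the default URL simply mirrors the /health route and port used in this chapter.

```python
import urllib.error
import urllib.request
from typing import List

# Hypothetical default that matches the port and route used in this chapter.
DEFAULT_URL = "http://localhost:5000/health"


def is_healthy(url: str, timeout: float = 3.0) -> bool:
    """Return True when the endpoint answers with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        # Covers refused connections, timeouts, and 4xx/5xx responses.
        return False


def main(argv: List[str]) -> int:
    """Return a process exit code: 0 for healthy, 1 for unhealthy."""
    url = argv[1] if len(argv) > 1 else DEFAULT_URL
    return 0 if is_healthy(url) else 1
```

Saved as a hypothetical healthcheck.py in the working directory, it could be wired in with HEALTHCHECK CMD python -c "import sys, healthcheck; sys.exit(healthcheck.main(sys.argv))", which keeps the exit code convention that HEALTHCHECK expects.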

Node.js Application Dockerfile

# Node.js application multi-stage build
FROM node:16-alpine AS dependencies

# Set working directory
WORKDIR /usr/src/app

# Copy package files
COPY package*.json ./

# Install dependencies
RUN npm ci --only=production && npm cache clean --force

# Build stage
FROM node:16-alpine AS builder

WORKDIR /usr/src/app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build

# Production stage
FROM node:16-alpine AS production

# Create user
RUN addgroup -g 1001 -S nodejs
RUN adduser -S nextjs -u 1001

WORKDIR /usr/src/app

# Copy node_modules from dependencies stage
COPY --from=dependencies --chown=nextjs:nodejs /usr/src/app/node_modules ./node_modules

# Copy build results from builder stage
COPY --from=builder --chown=nextjs:nodejs /usr/src/app/dist ./dist
COPY --from=builder --chown=nextjs:nodejs /usr/src/app/package*.json ./

# Switch user
USER nextjs

EXPOSE 3000

# Health check (healthcheck.js must be present in /usr/src/app; the COPY steps above do not add it)
HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
    CMD node healthcheck.js

CMD ["npm", "start"]

Nginx Static Website Dockerfile

# Multi-stage build: Frontend application
FROM node:16-alpine AS builder

WORKDIR /app

# Install dependencies
COPY package*.json ./
RUN npm ci

# Build application
COPY . .
RUN npm run build

# Production image
FROM nginx:alpine

# Copy custom nginx configuration
COPY nginx.conf /etc/nginx/nginx.conf

# Copy build artifacts
COPY --from=builder /app/dist /usr/share/nginx/html

# Copy startup script
COPY docker-entrypoint.sh /usr/local/bin/
RUN chmod +x /usr/local/bin/docker-entrypoint.sh

# Expose port
EXPOSE 80

# Health check
HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
    CMD curl -f http://localhost/ || exit 1

ENTRYPOINT ["docker-entrypoint.sh"]
CMD ["nginx", "-g", "daemon off;"]

Multi-Stage Builds

Basic Multi-Stage Build

# Build stage
FROM golang:1.19 AS builder

WORKDIR /app
COPY . .
RUN CGO_ENABLED=0 GOOS=linux go build -o main .

# Runtime stage
FROM alpine:latest

# Install CA certificates and create a non-root user
RUN apk --no-cache add ca-certificates && \
    adduser -D -s /bin/sh appuser

# Use a directory the non-root user can read (/root is not accessible to other users)
WORKDIR /app

# Copy binary from build stage
COPY --from=builder /app/main .

USER appuser

EXPOSE 8080
CMD ["./main"]

Complex Multi-Stage Build Example

# Base image
FROM node:16-alpine AS base
WORKDIR /app
COPY package*.json ./

# Dependency installation stage
FROM base AS dependencies
RUN npm ci --only=production

# Development dependency stage
FROM base AS dev-dependencies
RUN npm ci

# Build stage
FROM dev-dependencies AS builder
COPY . .
RUN npm run build

# Test stage (nothing later depends on it, so BuildKit skips it unless built with --target tester)
FROM dev-dependencies AS tester
COPY . .
RUN npm test

# Production image
FROM node:16-alpine AS production

# Security configuration
RUN addgroup -g 1001 -S nodejs && \
    adduser -S appuser -u 1001 -G nodejs

WORKDIR /app
USER appuser

# Copy production dependencies
COPY --from=dependencies --chown=appuser:nodejs /app/node_modules ./node_modules

# Copy build artifacts
COPY --from=builder --chown=appuser:nodejs /app/dist ./dist
COPY --chown=appuser:nodejs package*.json ./

EXPOSE 3000
CMD ["npm", "start"]

Build Optimization Techniques

Utilizing Cache Mechanisms

# Bad practice: Reinstall dependencies every time
COPY . /app
WORKDIR /app
RUN npm install

# Good practice: Copy dependency files first, use cache
WORKDIR /app
COPY package*.json ./
RUN npm ci --only=production
COPY . .
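
Why the second ordering helps can be illustrated with a toy model of the build cache. This sketch is an assumption-laden simplification, not Docker's actual algorithm: each step's cache key is derived from its parent's key, its instruction text, and the contents of the files it copies, and the names cache_keys and cached_steps are hypothetical.

```python
import hashlib
from typing import Dict, List, Tuple

# Each step is (instruction text, names of the files the step copies).
Step = Tuple[str, List[str]]


def cache_keys(steps: List[Step], files: Dict[str, str]) -> List[str]:
    """Toy model: a step's key depends on its parent's key, its own
    instruction text, and the contents of any files it copies."""
    keys: List[str] = []
    parent = ""
    for instruction, copied in steps:
        digest = hashlib.sha256(parent.encode())
        digest.update(instruction.encode())
        for name in sorted(copied):
            digest.update(files[name].encode())
        parent = digest.hexdigest()
        keys.append(parent)
    return keys


def cached_steps(steps: List[Step], before: Dict[str, str], after: Dict[str, str]) -> int:
    """Count the leading steps whose keys are unchanged between two builds."""
    count = 0
    for old, new in zip(cache_keys(steps, before), cache_keys(steps, after)):
        if old != new:
            break
        count += 1
    return count


bad = [("COPY . /app", ["package.json", "index.js"]), ("RUN npm install", [])]
good = [
    ("COPY package*.json ./", ["package.json"]),
    ("RUN npm ci", []),
    ("COPY . .", ["package.json", "index.js"]),
]

before = {"package.json": "{deps v1}", "index.js": "console.log(1)"}
after = {"package.json": "{deps v1}", "index.js": "console.log(2)"}  # source edit only

print(cached_steps(bad, before, after))   # 0: the dependency install reruns
print(cached_steps(good, before, after))  # 2: the dependency install is reused
```

In the model, a changed key invalidates every step after it, which is why a source-only edit forces a reinstall in the first ordering but not in the second.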

Reducing Image Layers

# Bad: Create multiple layers
RUN apt-get update
RUN apt-get install -y curl
RUN apt-get install -y wget
RUN rm -rf /var/lib/apt/lists/*

# Good: Combine into one layer
RUN apt-get update && \
    apt-get install -y \
        curl \
        wget && \
    rm -rf /var/lib/apt/lists/* && \
    apt-get clean
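
Combining the commands matters for size as well as layer count: a file removed in a later layer is hidden from the final filesystem but still stored in the earlier layer. The following toy model illustrates this under stated assumptions: layers are plain dictionaries, a value of None stands for a deletion marker, and the file sizes are made-up numbers chosen only for the example.

```python
from typing import Dict, List, Optional

# A layer maps a path to its size in bytes; None marks a deletion.
Layer = Dict[str, Optional[int]]


def stored_bytes(layers: List[Layer]) -> int:
    """Bytes kept in the image: every layer's files count, even if deleted later."""
    return sum(size for layer in layers for size in layer.values() if size is not None)


def visible_files(layers: List[Layer]) -> Dict[str, int]:
    """Files seen in the final filesystem after stacking the layers in order."""
    merged: Dict[str, int] = {}
    for layer in layers:
        for path, size in layer.items():
            if size is None:
                merged.pop(path, None)
            else:
                merged[path] = size
    return merged


# Separate RUN instructions: update, install, then clean up in its own layer.
separate: List[Layer] = [
    {"/var/lib/apt/lists/index": 40_000_000},  # hypothetical size
    {"/usr/bin/curl": 250_000},                # hypothetical size
    {"/var/lib/apt/lists/index": None},
]
# One combined RUN: the package index never reaches a committed layer.
combined: List[Layer] = [{"/usr/bin/curl": 250_000}]

print(visible_files(separate) == visible_files(combined))  # True
print(stored_bytes(separate))  # 40250000
print(stored_bytes(combined))  # 250000
```

Both variants end up with the same visible files, but only the combined one avoids shipping the package index.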

Using .dockerignore

# .dockerignore file content
.git
.gitignore
README.md
Dockerfile
.dockerignore
node_modules
npm-debug.log
.nyc_output
.coverage
.env.local
.env.development.local
.env.test.local
.env.production.local
.next
.vscode
__pycache__
*.pyc
*.pyo
*.pyd

Choosing the Right Base Image

# Size comparison (approximate; varies by tag, architecture, and whether compressed size is measured)
# ubuntu:20.04        ~72MB
# python:3.9-slim     ~45MB
# python:3.9-alpine   ~15MB
# scratch             0MB

# Choose based on requirements
# Minimal production image
FROM alpine:latest
# Development image needing more tools
FROM ubuntu:20.04
# Balanced choice for Python apps
FROM python:3.9-alpine

Build Arguments and Variables

Using Build Arguments

# Define build arguments (declared before FROM, so usable only in FROM lines)
ARG NODE_VERSION=16
ARG APP_ENV=production

FROM node:${NODE_VERSION}-alpine

# Use build arguments
ARG BUILD_DATE
ARG VCS_REF
ARG VERSION
# Redeclare to bring the pre-FROM default into this stage
ARG APP_ENV

# Set labels
LABEL build_date=${BUILD_DATE}
LABEL vcs_ref=${VCS_REF}
LABEL version=${VERSION}

# Convert to environment variables
ENV APP_ENV=${APP_ENV}

Pass parameters during build:

# Pass build arguments
docker build \
  --build-arg NODE_VERSION=18 \
  --build-arg BUILD_DATE=$(date -u +'%Y-%m-%dT%H:%M:%SZ') \
  --build-arg VCS_REF=$(git rev-parse HEAD) \
  --build-arg VERSION=1.2.0 \
  -t myapp:1.2.0 .

# View the labels set from the build arguments
docker inspect myapp:1.2.0 | jq '.[0].Config.Labels'
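
An ARG declared before the first FROM is only available to FROM lines. Inside a stage it must be redeclared with a bare ARG NAME before it can be used. The sketch below models just that rule. It is a simplification with a hypothetical function name (stage_values), and it does not cover stage-level defaults or predefined arguments.

```python
from typing import Dict, Iterable, Optional


def stage_values(
    global_defaults: Dict[str, str],
    redeclared: Iterable[str],
    build_args: Optional[Dict[str, str]] = None,
) -> Dict[str, str]:
    """Toy model: values visible inside a stage for bare `ARG NAME` lines.

    Only redeclared names are visible. A --build-arg value wins over the
    default declared before FROM; with neither, the value is empty.
    """
    overrides = build_args or {}
    return {
        name: overrides.get(name, global_defaults.get(name, ""))
        for name in redeclared
    }


global_defaults = {"NODE_VERSION": "16", "APP_ENV": "production"}

# APP_ENV is not redeclared in the stage, so ${APP_ENV} expands to nothing.
without = stage_values(global_defaults, ["BUILD_DATE", "VCS_REF", "VERSION"])
print(repr(without.get("APP_ENV", "")))  # ''

# After a bare `ARG APP_ENV` in the stage, the pre-FROM default applies.
print(stage_values(global_defaults, ["APP_ENV"])["APP_ENV"])  # production

# A --build-arg value overrides the default.
print(stage_values(global_defaults, ["APP_ENV"], {"APP_ENV": "staging"})["APP_ENV"])  # staging
```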

Conditional Builds

ARG INSTALL_DEV_TOOLS=false

# Conditionally install development tools
RUN if [ "$INSTALL_DEV_TOOLS" = "true" ]; then \
        apt-get update && \
        apt-get install -y vim curl wget; \
    fi

Practical Project Build

Complete Web Application Build

Create project structure:

mkdir docker-webapp-build
cd docker-webapp-build

# Create application code
cat > app.py << 'EOF'
from flask import Flask, jsonify
import os
import sys
import logging

app = Flask(__name__)
logging.basicConfig(level=logging.INFO)

@app.route('/')
def home():
    return jsonify({
        "message": "Hello from Dockerized Flask App!",
        "version": os.getenv("APP_VERSION", "unknown"),
        "environment": os.getenv("APP_ENV", "development")
    })

@app.route('/health')
def health():
    return jsonify({"status": "healthy"})

@app.route('/info')
def info():
    # Demo only: this exposes every environment variable, including any secrets
    return jsonify({
        "python_version": sys.version,
        "environment": dict(os.environ)
    })

if __name__ == '__main__':
    port = int(os.getenv("PORT", 5000))
    app.run(host='0.0.0.0', port=port, debug=False)
EOF

# Create dependencies file
cat > requirements.txt << 'EOF'
Flask==2.3.2
gunicorn==21.2.0
EOF

# Create configuration file
cat > config.py << 'EOF'
import os

class Config:
    SECRET_KEY = os.environ.get('SECRET_KEY') or 'dev-secret-key'
    DEBUG = os.environ.get('FLASK_DEBUG', 'False').lower() == 'true'
    PORT = int(os.environ.get('PORT', 5000))
EOF

# Create startup script
cat > docker-entrypoint.sh << 'EOF'
#!/bin/sh
set -e

echo "Starting Flask application..."
echo "Environment: ${APP_ENV:-development}"
echo "Version: ${APP_VERSION:-unknown}"

# Database migrations and other initialization can be done here
# python -c "from app import db; db.create_all()"

exec "$@"
EOF

chmod +x docker-entrypoint.sh

Create optimized Dockerfile:

# Multi-stage build: complete Flask application
ARG PYTHON_VERSION=3.9
FROM python:${PYTHON_VERSION}-slim AS base

# Build arguments
ARG BUILD_DATE
ARG VCS_REF
ARG VERSION=dev

# Label information
LABEL maintainer="developer@example.com" \
      build_date="${BUILD_DATE}" \
      vcs_ref="${VCS_REF}" \
      version="${VERSION}" \
      description="Flask web application with Docker"

# Python environment configuration
ENV PYTHONUNBUFFERED=1 \
    PYTHONDONTWRITEBYTECODE=1 \
    PIP_NO_CACHE_DIR=1 \
    PIP_DISABLE_PIP_VERSION_CHECK=1 \
    PIP_DEFAULT_TIMEOUT=100

# Dependency installation stage
FROM base AS dependencies

# Install system dependencies
RUN apt-get update && \
    apt-get install -y --no-install-recommends \
        gcc \
        libc6-dev \
        curl && \
    rm -rf /var/lib/apt/lists/* && \
    apt-get clean

# Create application user
RUN groupadd -r appuser && \
    useradd -r -g appuser -d /app -s /bin/bash appuser

# Set working directory
WORKDIR /app

# Install Python dependencies
COPY requirements.txt .
RUN pip install --user --no-warn-script-location -r requirements.txt

# Production image
FROM base AS production

# Create application user (home is /home/appuser so Python finds the packages copied to ~/.local)
RUN groupadd -r appuser && \
    useradd -r -m -g appuser -d /home/appuser -s /bin/bash appuser

# Install runtime dependencies
RUN apt-get update && \
    apt-get install -y --no-install-recommends \
        curl \
        ca-certificates && \
    rm -rf /var/lib/apt/lists/* && \
    apt-get clean

WORKDIR /app

# Copy Python dependencies
COPY --from=dependencies --chown=appuser:appuser /root/.local /home/appuser/.local

# Copy application code
COPY --chown=appuser:appuser . .

# Ensure script is executable
RUN chmod +x docker-entrypoint.sh

# Switch to application user
USER appuser

# Set PATH to include user-installed packages
ENV PATH=/home/appuser/.local/bin:$PATH

# Set application environment variables
# (VERSION is redeclared so it is in scope in this stage)
ARG VERSION=dev
ENV APP_VERSION=${VERSION} \
    APP_ENV=production \
    PORT=5000

# Expose port
EXPOSE 5000

# Health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
    CMD curl -f http://localhost:5000/health || exit 1

# Startup configuration
ENTRYPOINT ["./docker-entrypoint.sh"]
CMD ["gunicorn", "--bind", "0.0.0.0:5000", "--workers", "4", "--timeout", "120", "app:app"]

Create .dockerignore file:

.git
.gitignore
README.md
Dockerfile
.dockerignore
__pycache__
*.pyc
*.pyo
*.pyd
.Python
.pytest_cache
.coverage
.venv
venv/
.env
.env.local
.env.development.local
.env.test.local
.env.production.local

Build and test:

# Build image (VCS_REF stays empty unless the project directory is a git repository)
docker build \
  --build-arg BUILD_DATE=$(date -u +'%Y-%m-%dT%H:%M:%SZ') \
  --build-arg VCS_REF=$(git rev-parse --short HEAD) \
  --build-arg VERSION=1.0.0 \
  -t flask-app:1.0.0 \
  -t flask-app:latest .

# View image information
docker images flask-app
docker inspect flask-app:latest | jq '.[0].Config.Labels'

# Run container
docker run -d --name flask-app \
  -p 5000:5000 \
  -e APP_ENV=production \
  flask-app:latest

# Test application
curl http://localhost:5000/
curl http://localhost:5000/health
curl http://localhost:5000/info

# View logs
docker logs -f flask-app

# Cleanup
docker stop flask-app
docker rm flask-app

Dockerfile Best Practices
  1. Single Responsibility: Each container should run only one service
  2. Minimize Layers: Combine related RUN instructions to reduce layers
  3. Utilize Caching: Place less frequently changing instructions earlier
  4. Security Configuration: Use non-root users to run applications
  5. Health Checks: Add health checks to ensure service availability
  6. Tag Management: Add appropriate tags and metadata to images

Important Notes
  • Avoid including sensitive information (passwords, keys, etc.) in images
  • Use multi-stage builds to reduce final image size
  • Regularly update base images for security patches
  • Use .dockerignore to exclude unnecessary files
  • Pin base image versions in production environments

Summary

Through this chapter, you should have mastered:

  • Dockerfile Syntax: Proficiency in using various Dockerfile instructions
  • Build Techniques: Mastering multi-stage builds and image optimization methods
  • Best Practices: Understanding security, performance, and maintainability best practices
  • Practical Skills: Ability to write high-quality Dockerfiles for different types of applications

In the next chapter, we’ll learn about Docker volumes and network configuration, addressing container data persistence and network communication issues.