Chapter 10: Production Environment Best Practices

Learning Objectives
  • Master production-grade Kubernetes cluster planning
  • Learn high availability architecture design and disaster recovery solutions
  • Understand resource management and cost optimization strategies
  • Become proficient in cluster upgrades and failure recovery

Key Concepts

Production Environment Key Elements

(Mermaid diagram: production environment key elements — not rendered in this version)

Production Cluster Architecture

(Mermaid diagram: production cluster architecture — not rendered in this version)

Cluster Planning

Node Planning

| Role   | Quantity  | Recommended Configuration | Notes                           |
|--------|-----------|---------------------------|---------------------------------|
| Master | 3 or 5    | 4 CPU / 8 GB+             | Odd number for etcd election    |
| Worker | 3+        | As needed                 | At least 3 for Pod distribution |
| Infra  | 2-3       | 4 CPU / 8 GB+             | Run monitoring, logging, etc.   |
| Edge   | As needed | 2 CPU / 4 GB+             | Run Ingress Controller          |
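
For a multi-Master layout like this, the API server is normally fronted by a load balancer and the cluster is initialized against that endpoint. Below is a minimal kubeadm ClusterConfiguration sketch; the endpoint address and the CIDRs are placeholder assumptions to adapt to your environment.

# kubeadm-config.yaml — minimal HA sketch (stacked etcd)
# k8s-api.example.com:6443 is an assumed load-balancer address in front of the Masters
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
kubernetesVersion: v1.28.0
controlPlaneEndpoint: "k8s-api.example.com:6443"
networking:
  podSubnet: "10.244.0.0/16"       # must match the CNI plugin's configuration
  serviceSubnet: "10.96.0.0/12"

The remaining Masters then join with kubeadm join --control-plane, which keeps the etcd member count at the odd number recommended above.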

Node Labels and Taints

# Label nodes
kubectl label nodes node-1 node-type=worker
kubectl label nodes node-2 node-type=worker
kubectl label nodes node-3 node-type=infra
kubectl label nodes node-4 topology.kubernetes.io/zone=zone-a

# Add taints (dedicated nodes)
kubectl taint nodes node-3 dedicated=infra:NoSchedule
kubectl taint nodes node-4 dedicated=gpu:NoSchedule

# Remove taints
kubectl taint nodes node-3 dedicated=infra:NoSchedule-

# Use node selectors and tolerations to pin infra workloads to the infra nodes
apiVersion: apps/v1
kind: Deployment
metadata:
  name: monitoring-stack
spec:
  replicas: 2
  selector:
    matchLabels:
      app: monitoring
  template:
    metadata:
      labels:
        app: monitoring
    spec:
      nodeSelector:
        node-type: infra
      tolerations:
      - key: "dedicated"
        operator: "Equal"
        value: "infra"
        effect: "NoSchedule"
      affinity:
        # Keep replicas on different infra nodes
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                app: monitoring
            topologyKey: kubernetes.io/hostname
      containers:
      - name: monitoring
        image: prom/prometheus:v2.47.0   # placeholder image; the original snippet omitted the container spec

Resource Quotas

# Namespace resource quota
apiVersion: v1
kind: ResourceQuota
metadata:
  name: production-quota
  namespace: production
spec:
  hard:
    requests.cpu: "100"
    requests.memory: 200Gi
    limits.cpu: "200"
    limits.memory: 400Gi
    pods: "500"
    services: "50"
    secrets: "100"
    configmaps: "100"
    persistentvolumeclaims: "50"
    requests.storage: 500Gi

---
# Limit ranges
apiVersion: v1
kind: LimitRange
metadata:
  name: production-limits
  namespace: production
spec:
  limits:
  # Container default limits
  - default:
      cpu: "500m"
      memory: "512Mi"
    defaultRequest:
      cpu: "100m"
      memory: "128Mi"
    type: Container
  # Container maximum limits
  - max:
      cpu: "4"
      memory: "8Gi"
    min:
      cpu: "50m"
      memory: "64Mi"
    type: Container
  # Pod maximum limits
  - max:
      cpu: "16"
      memory: "32Gi"
    type: Pod
  # PVC limits
  - max:
      storage: "100Gi"
    min:
      storage: "1Gi"
    type: PersistentVolumeClaim
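
After applying the quota and limit range, it is worth checking how much of the budget is actually consumed and which defaults new containers receive:

# Quota consumption and effective defaults in the production namespace
kubectl describe resourcequota production-quota -n production
kubectl describe limitrange production-limits -n production

# Requests and limits already committed on each node
kubectl describe nodes | grep -A 8 "Allocated resources"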

High Availability Deployment

Pod Distribution Strategy

apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
spec:
  replicas: 6
  selector:
    matchLabels:
      app: web-app
  template:
    metadata:
      labels:
        app: web-app
    spec:
      affinity:
        # Pod anti-affinity: prefer spreading replicas across different nodes
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchLabels:
                  app: web-app
              topologyKey: kubernetes.io/hostname
      # Topology spread constraints: enforce even spread across zones, prefer even spread across nodes
      topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: DoNotSchedule
        labelSelector:
          matchLabels:
            app: web-app
      - maxSkew: 2
        topologyKey: kubernetes.io/hostname
        whenUnsatisfiable: ScheduleAnyway
        labelSelector:
          matchLabels:
            app: web-app
      containers:
      - name: web
        image: nginx:1.25   # placeholder image; the original snippet omitted the container spec

Pod Disruption Budget

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-app-pdb
spec:
  # Minimum number of available Pods
  minAvailable: 3
  # Or maximum number of unavailable Pods
  # maxUnavailable: 1
  selector:
    matchLabels:
      app: web-app
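
The budget can be inspected before maintenance, and kubectl drain honours it: an eviction that would drop web-app below minAvailable pauses and retries instead of proceeding.

# Check how many voluntary disruptions are currently allowed
kubectl get pdb web-app-pdb
kubectl describe pdb web-app-pdb

# A drain that would violate the budget blocks until enough replicas are healthy elsewhere
kubectl drain node-2 --ignore-daemonsets --delete-emptydir-data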

Priority and Preemption

# Define priority class
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority
value: 1000000
globalDefault: false
description: "High priority for critical business applications"
preemptionPolicy: PreemptLowerPriority

---
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: low-priority
value: 100
globalDefault: false
description: "Low priority for batch processing tasks"
preemptionPolicy: Never

---
# Use priority
apiVersion: apps/v1
kind: Deployment
metadata:
  name: critical-app
spec:
  template:
    spec:
      priorityClassName: high-priority
      containers:
      - name: app
        image: critical-app:v1
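
Besides custom classes, Kubernetes ships two built-in classes reserved for platform workloads; keeping application values well below them ensures user Pods can never preempt system components.

# List all priority classes, including the built-in ones
kubectl get priorityclass

# system-node-critical (2000001000) and system-cluster-critical (2000000000) are reserved
# for node daemons (e.g. kube-proxy) and cluster add-ons (e.g. CoreDNS) respectively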

Auto Scaling

Horizontal Pod Autoscaler (HPA)

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 3
  maxReplicas: 50
  metrics:
  # CPU utilization
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  # Memory utilization
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
  # Custom metric (requires Prometheus Adapter)
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second
      target:
        type: AverageValue
        averageValue: "1000"
  # External metric
  - type: External
    external:
      metric:
        name: queue_messages_ready
        selector:
          matchLabels:
            queue: myqueue
      target:
        type: AverageValue
        averageValue: "30"
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300  # Scale down stabilization window
      policies:
      - type: Percent
        value: 10
        periodSeconds: 60
      - type: Pods
        value: 4
        periodSeconds: 60
      selectPolicy: Min
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
      - type: Percent
        value: 100
        periodSeconds: 15
      - type: Pods
        value: 4
        periodSeconds: 15
      selectPolicy: Max
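
The http_requests_per_second metric above only exists if something serves it through the custom metrics API. Below is a sketch of a Prometheus Adapter rule that could expose it, assuming the adapter is deployed via its community Helm chart and the application exports a counter named http_requests_total labelled by namespace and pod; adjust the query to the metrics you actually scrape.

# prometheus-adapter rules sketch (values.yaml fragment for the Helm chart) —
# assumes a counter http_requests_total labelled by namespace and pod
rules:
  custom:
  - seriesQuery: 'http_requests_total{namespace!="",pod!=""}'
    resources:
      overrides:
        namespace: {resource: "namespace"}
        pod: {resource: "pod"}
    name:
      matches: "^(.*)_total$"
      as: "${1}_per_second"
    metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)'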

Vertical Pod Autoscaler (VPA)

# Install VPA (from the kubernetes/autoscaler repository)
git clone https://github.com/kubernetes/autoscaler.git
cd autoscaler/vertical-pod-autoscaler
./hack/vpa-up.sh

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  updatePolicy:
    updateMode: "Auto"  # Off, Initial, Recreate, Auto
  resourcePolicy:
    containerPolicies:
    - containerName: "*"
      minAllowed:
        cpu: 100m
        memory: 128Mi
      maxAllowed:
        cpu: 4
        memory: 8Gi
      controlledResources: ["cpu", "memory"]
      controlledValues: RequestsAndLimits

Cluster Autoscaler

# cluster-autoscaler configuration (cloud provider specific)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-autoscaler
  namespace: kube-system
spec:
  template:
    spec:
      containers:
      - name: cluster-autoscaler
        image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.26.0
        command:
        - ./cluster-autoscaler
        - --cloud-provider=aws
        - --namespace=kube-system
        - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/my-cluster
        - --balance-similar-node-groups
        - --skip-nodes-with-system-pods=false
        - --scale-down-enabled=true
        - --scale-down-delay-after-add=10m
        - --scale-down-unneeded-time=10m
        - --scale-down-utilization-threshold=0.5
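
Scale-down drains nodes by evicting Pods, so workloads that must not be interrupted (local caches, in-flight batch state) can opt out with the well-known safe-to-evict annotation on the Pod template. A minimal sketch:

# Prevent the Cluster Autoscaler from evicting these Pods during scale-down
apiVersion: apps/v1
kind: Deployment
metadata:
  name: stateful-worker
spec:
  replicas: 1
  selector:
    matchLabels:
      app: stateful-worker
  template:
    metadata:
      labels:
        app: stateful-worker
      annotations:
        cluster-autoscaler.kubernetes.io/safe-to-evict: "false"
    spec:
      containers:
      - name: worker
        image: busybox:1.36   # placeholder image
        command: ["sh", "-c", "while true; do sleep 3600; done"]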

Backup and Recovery

etcd Backup

# Manual etcd backup
ETCDCTL_API=3 etcdctl snapshot save /backup/etcd-snapshot.db \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key

# Verify backup
ETCDCTL_API=3 etcdctl snapshot status /backup/etcd-snapshot.db --write-out=table

# Restore etcd
ETCDCTL_API=3 etcdctl snapshot restore /backup/etcd-snapshot.db \
  --name=etcd-0 \
  --initial-cluster=etcd-0=https://etcd-0:2380 \
  --initial-cluster-token=etcd-cluster-1 \
  --initial-advertise-peer-urls=https://etcd-0:2380 \
  --data-dir=/var/lib/etcd-restored

# Scheduled backup CronJob
apiVersion: batch/v1
kind: CronJob
metadata:
  name: etcd-backup
  namespace: kube-system
spec:
  schedule: "0 */6 * * *"  # Every 6 hours
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: backup
            image: bitnami/etcd:latest
            command:
            - /bin/sh
            - -c
            - |
              ETCDCTL_API=3 etcdctl snapshot save /backup/etcd-$(date +%Y%m%d-%H%M%S).db \
                --endpoints=$ETCD_ENDPOINTS \
                --cacert=/etc/etcd/ca.crt \
                --cert=/etc/etcd/client.crt \
                --key=/etc/etcd/client.key
              # Clean up backups older than 7 days
              find /backup -name "etcd-*.db" -mtime +7 -delete
            env:
            - name: ETCD_ENDPOINTS
              value: "https://etcd-0:2379,https://etcd-1:2379,https://etcd-2:2379"
            volumeMounts:
            - name: backup
              mountPath: /backup
            - name: etcd-certs
              mountPath: /etc/etcd
          restartPolicy: OnFailure
          volumes:
          - name: backup
            persistentVolumeClaim:
              claimName: etcd-backup-pvc
          - name: etcd-certs
            secret:
              secretName: etcd-client-certs

Velero Backup

# Install Velero
velero install \
  --provider aws \
  --plugins velero/velero-plugin-for-aws:v1.7.0 \
  --bucket velero-backup \
  --backup-location-config region=us-east-1 \
  --snapshot-location-config region=us-east-1 \
  --secret-file ./credentials-velero

# Create backup
velero backup create production-backup \
  --include-namespaces production \
  --include-resources deployments,services,configmaps,secrets,pvc

# Scheduled backup
velero schedule create daily-backup \
  --schedule="0 2 * * *" \
  --include-namespaces production \
  --ttl 168h

# View backups
velero backup get
velero backup describe production-backup

# Restore
velero restore create --from-backup production-backup

# Restore to different namespace
velero restore create --from-backup production-backup \
  --namespace-mappings production:production-restored
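
For GitOps-style management, the same daily schedule can be declared as a Velero Schedule resource instead of being created with the CLI; the sketch below is the declarative equivalent of the daily-backup command above and assumes Velero was installed into the velero namespace.

apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: daily-backup
  namespace: velero
spec:
  schedule: "0 2 * * *"          # daily at 02:00
  template:
    includedNamespaces:
    - production
    ttl: 168h0m0s                # keep each backup for 7 days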

Application Backup Strategy

# Velero hook annotations go on the Pod template so they are picked up at backup time
apiVersion: apps/v1
kind: Deployment
metadata:
  name: database
spec:
  template:
    metadata:
      annotations:
        # Command executed in the container before the backup is taken
        pre.hook.backup.velero.io/container: database
        pre.hook.backup.velero.io/command: '["/bin/sh", "-c", "pg_dump -U postgres mydb > /backup/dump.sql"]'
        # Command executed in the container after the restore completes
        post.hook.restore.velero.io/container: database
        post.hook.restore.velero.io/command: '["/bin/sh", "-c", "psql -U postgres mydb < /backup/dump.sql"]'

Cluster Upgrades

Upgrade Strategy

(Mermaid diagram: cluster upgrade flow — not rendered in this version)

kubeadm Upgrade

# Upgrade control plane (first Master)
# 1. Check available versions
apt update
apt-cache madison kubeadm

# 2. Upgrade kubeadm
apt-get update && apt-get install -y kubeadm=1.28.0-00

# 3. Verify upgrade plan
kubeadm upgrade plan

# 4. Execute upgrade
kubeadm upgrade apply v1.28.0

# 5. Upgrade kubelet and kubectl
apt-get update && apt-get install -y kubelet=1.28.0-00 kubectl=1.28.0-00
systemctl daemon-reload
systemctl restart kubelet

# Upgrade other Master nodes
kubeadm upgrade node

# Upgrade Worker nodes
# 1. Drain Pods
kubectl drain node-1 --ignore-daemonsets --delete-emptydir-data

# 2. Upgrade kubeadm, kubelet, kubectl
apt-get update && apt-get install -y \
  kubeadm=1.28.0-00 \
  kubelet=1.28.0-00 \
  kubectl=1.28.0-00

# 3. Upgrade node configuration
kubeadm upgrade node

# 4. Restart kubelet
systemctl daemon-reload
systemctl restart kubelet

# 5. Resume scheduling
kubectl uncordon node-1
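
With more than a handful of Workers, the drain/upgrade/uncordon cycle is usually scripted. The sketch below assumes passwordless SSH to Debian-based nodes and a hypothetical node list; adapt and test it for your environment.

#!/bin/bash
# upgrade-workers.sh — drain, upgrade and uncordon Worker nodes one at a time (sketch)
set -euo pipefail

VERSION="1.28.0-00"
NODES="node-1 node-2 node-3"    # hypothetical node list

for node in ${NODES}; do
  echo ">>> Upgrading ${node}"
  kubectl drain "${node}" --ignore-daemonsets --delete-emptydir-data

  ssh "${node}" "sudo apt-get update && \
    sudo apt-get install -y kubeadm=${VERSION} kubelet=${VERSION} kubectl=${VERSION} && \
    sudo kubeadm upgrade node && \
    sudo systemctl daemon-reload && sudo systemctl restart kubelet"

  kubectl uncordon "${node}"
  # Wait until the node reports Ready before moving to the next one
  kubectl wait --for=condition=Ready "node/${node}" --timeout=5m
done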

Upgrade Checklist

# Pre-upgrade checks
# 1. Backup etcd
ETCDCTL_API=3 etcdctl snapshot save /backup/pre-upgrade-snapshot.db

# 2. Backup important resources
kubectl get all --all-namespaces -o yaml > all-resources-backup.yaml

# 3. Check cluster health
kubectl get nodes
kubectl get pods --all-namespaces | grep -v Running

# 4. Check API deprecations
kubent    # kube-no-trouble; pluto or the KubePug "deprecations" krew plugin also work

# 5. Check PDB
kubectl get pdb --all-namespaces

# 6. Check resource quotas
kubectl describe resourcequota --all-namespaces

# Post-upgrade verification
# 1. Check node status
kubectl get nodes -o wide

# 2. Check core components
kubectl get pods -n kube-system

# 3. Check application status
kubectl get pods --all-namespaces | grep -v Running

# 4. Run end-to-end tests
kubectl run test --image=busybox --rm -it --restart=Never -- nslookup kubernetes.default

Failure Recovery

Common Failure Handling

# Node NotReady
kubectl describe node <node-name>
# Check kubelet logs
journalctl -u kubelet -f

# Pod CrashLoopBackOff
kubectl logs <pod-name> --previous
kubectl describe pod <pod-name>

# Pod Pending
kubectl describe pod <pod-name>
# Check events
kubectl get events --sort-by='.lastTimestamp'

# Resource exhaustion
kubectl top nodes
kubectl top pods --all-namespaces --sort-by=memory

# etcd issues
ETCDCTL_API=3 etcdctl endpoint health
ETCDCTL_API=3 etcdctl endpoint status --write-out=table

Node Failure Recovery

# Handling when node cannot be recovered

# 1. Mark node as unschedulable
kubectl cordon <node-name>

# 2. Drain Pods from node
kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data --force

# 3. Delete node
kubectl delete node <node-name>

# 4. Handle PV/PVC
# Check if Pods cannot start due to node failure
kubectl get pods --all-namespaces -o wide | grep <node-name>

# Force delete Terminating Pods
kubectl delete pod <pod-name> --grace-period=0 --force

# 5. For local storage, manually handle PV
kubectl patch pv <pv-name> -p '{"spec":{"claimRef": null}}'

Disaster Recovery

# Complete cluster recovery steps

# 1. Prepare new infrastructure
# Deploy new nodes using Terraform/Ansible

# 2. Initialize first Master
kubeadm init --config kubeadm-config.yaml

# 3. Restore etcd data
ETCDCTL_API=3 etcdctl snapshot restore /backup/etcd-snapshot.db \
  --data-dir=/var/lib/etcd

# 4. Join other Master nodes
kubeadm join --config kubeadm-join-config.yaml

# 5. Join Worker nodes
kubeadm join --token <token> <master-ip>:6443

# 6. Restore applications
velero restore create --from-backup <backup-name>

# 7. Verify recovery
kubectl get all --all-namespaces

Cost Optimization

Resource Optimization

# Set reasonable resource requests and limits
apiVersion: apps/v1
kind: Deployment
metadata:
  name: optimized-app
spec:
  template:
    spec:
      containers:
      - name: app
        image: my-app:v1   # placeholder image; the original snippet focuses on the resources block
        resources:
          requests:
            cpu: 100m       # Set based on actual usage
            memory: 128Mi
          limits:
            cpu: 500m       # Limit burst usage
            memory: 512Mi

# View resource usage
kubectl top pods --all-namespaces | sort -k3 -n -r | head -20

# Use VPA recommendations
kubectl get vpa -o yaml | grep -A 10 recommendation

# Identify unused resources
kubectl get pvc --all-namespaces | grep -v Bound
kubectl get secrets --all-namespaces --field-selector type=kubernetes.io/service-account-token

Spot/Preemptible Instances

# Use preemptible instances for fault-tolerant tasks
apiVersion: apps/v1
kind: Deployment
metadata:
  name: batch-processor
spec:
  template:
    spec:
      # Spot node labels/taints are provider-specific; the names below are illustrative
      # (e.g. EKS: eks.amazonaws.com/capacityType=SPOT, GKE: cloud.google.com/gke-spot=true)
      nodeSelector:
        node-lifecycle: spot
      tolerations:
      - key: "node-lifecycle"
        operator: "Equal"
        value: "spot"
        effect: "NoSchedule"
      # Handle preemption gracefully
      terminationGracePeriodSeconds: 120
      containers:
      - name: processor
        lifecycle:
          preStop:
            exec:
              command:
              - /bin/sh
              - -c
              - |
                # Save checkpoint
                checkpoint-save.sh
                # Wait for ongoing tasks to complete
                sleep 60

Resource Analysis Tools

# Install kubecost for cost analysis
helm repo add kubecost https://kubecost.github.io/cost-analyzer/
helm install kubecost kubecost/cost-analyzer \
  --namespace kubecost \
  --create-namespace

# Use kubectl-cost plugin
kubectl cost namespace --show-all-resources
kubectl cost deployment -n production

Operations Checklist

Daily Checks

#!/bin/bash
# daily-check.sh

echo "=== Cluster Health Check ==="

# Node status
echo "--- Node Status ---"
kubectl get nodes -o wide

# Core component status
echo "--- Core Components ---"
kubectl get pods -n kube-system

# Abnormal Pods
echo "--- Abnormal Pods ---"
kubectl get pods --all-namespaces --field-selector status.phase!=Running,status.phase!=Succeeded

# Resource usage
echo "--- Resource Usage ---"
kubectl top nodes
kubectl top pods --all-namespaces | sort -k3 -n -r | head -10

# Certificate expiration check
echo "--- Certificate Status ---"
kubeadm certs check-expiration

# PVC status
echo "--- Storage Status ---"
kubectl get pvc --all-namespaces | grep -v Bound

# Recent events
echo "--- Recent Alert Events ---"
kubectl get events --all-namespaces --sort-by='.lastTimestamp' | grep -i warning | tail -20

Monitoring Alert Rules

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: production-alerts
  namespace: monitoring
spec:
  groups:
  - name: cluster-health
    rules:
    # Node memory shortage
    - alert: NodeMemoryPressure
      expr: |
        (1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) > 0.9
      for: 5m
      labels:
        severity: critical
      annotations:
        summary: "Node {{ $labels.instance }} memory shortage"

    # Node disk shortage
    - alert: NodeDiskPressure
      expr: |
        (1 - node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"}) > 0.85
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "Node {{ $labels.instance }} disk space shortage"

    # API Server latency
    - alert: APIServerHighLatency
      expr: |
        histogram_quantile(0.99, sum(rate(apiserver_request_duration_seconds_bucket{verb!="WATCH"}[5m])) by (le)) > 1
      for: 10m
      labels:
        severity: warning
      annotations:
        summary: "API Server response latency too high"

    # etcd latency
    - alert: EtcdHighLatency
      expr: |
        histogram_quantile(0.99, sum(rate(etcd_disk_backend_commit_duration_seconds_bucket[5m])) by (le)) > 0.25
      for: 10m
      labels:
        severity: warning
      annotations:
        summary: "etcd commit latency too high"

    # Certificate expiring soon
    - alert: CertificateExpiringSoon
      expr: |
        (apiserver_client_certificate_expiration_seconds_count > 0 and
         apiserver_client_certificate_expiration_seconds_bucket{le="604800"} > 0)
      for: 0m
      labels:
        severity: critical
      annotations:
        summary: "Client certificate will expire within 7 days"

Production Environment Best Practices Summary
  1. High availability: Multi-Master, cross-AZ, PDB configuration
  2. Resource management: Reasonable quotas, resource limits, auto scaling
  3. Security: RBAC, network policies, image scanning
  4. Observability: Monitoring alerts, log collection, distributed tracing
  5. Operations: Regular backups, upgrade plans, failure drills
  6. Cost: Resource optimization, Spot instances, regular cleanup

Summary

Through this chapter, you should have mastered:

  • Cluster planning: Node planning, labels and taints, resource quotas
  • High availability: Pod distribution, PDB, priority preemption
  • Auto scaling: HPA, VPA, Cluster Autoscaler
  • Backup recovery: etcd backup, Velero usage, disaster recovery
  • Cluster upgrades: Upgrade strategies, kubeadm upgrade steps
  • Cost optimization: Resource optimization, Spot instance usage

Congratulations on completing the entire Kubernetes container orchestration course! From basic introduction to production best practices, you have mastered Kubernetes core knowledge and practical skills. Next, it is recommended to continue practicing in real projects and stay updated with the latest developments in the cloud-native community.