Chapter 10: Production Environment Best Practices
Haiyue
Learning Objectives
- Master production-grade Kubernetes cluster planning
- Learn high availability architecture design and disaster recovery solutions
- Understand resource management and cost optimization strategies
- Become proficient in cluster upgrades and failure recovery
Key Concepts
Production Environment Key Elements
(Diagram: key elements of a production environment)
Production Cluster Architecture
(Diagram: production cluster architecture)
Cluster Planning
Node Planning
| Role | Quantity | Recommended Configuration | Notes |
|---|---|---|---|
| Master | 3 or 5 | 4 CPU / 8 GB+ | Odd number so etcd can elect a leader |
| Worker | 3+ | Sized to workload | At least 3 so Pods can be spread across nodes |
| Infra | 2-3 | 4 CPU / 8 GB+ | Runs monitoring, logging, and other platform services |
| Edge | As needed | 2 CPU / 4 GB+ | Runs Ingress Controllers |
Node Labels and Taints
```bash
# Label nodes
kubectl label nodes node-1 node-type=worker
kubectl label nodes node-2 node-type=worker
kubectl label nodes node-3 node-type=infra
kubectl label nodes node-4 topology.kubernetes.io/zone=zone-a

# Add taints (dedicated nodes)
kubectl taint nodes node-3 dedicated=infra:NoSchedule
kubectl taint nodes node-4 dedicated=gpu:NoSchedule

# Remove a taint
kubectl taint nodes node-3 dedicated=infra:NoSchedule-
```

```yaml
# Use node selectors, tolerations, and anti-affinity
apiVersion: apps/v1
kind: Deployment
metadata:
  name: monitoring-stack
spec:
  template:
    spec:
      nodeSelector:
        node-type: infra
      tolerations:
        - key: "dedicated"
          operator: "Equal"
          value: "infra"
          effect: "NoSchedule"
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app: monitoring
              topologyKey: kubernetes.io/hostname
```
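To confirm the scheduler honored these constraints, check where the Pods actually landed. This is a minimal sketch, assuming the Pod template carries the same `app: monitoring` label used in the anti-affinity rule:

```bash
# Show which node each monitoring Pod was scheduled onto
kubectl get pods -l app=monitoring -o wide
# List the infra nodes the nodeSelector points at
kubectl get nodes -l node-type=infra
```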
Resource Quotas
```yaml
# Namespace resource quota
apiVersion: v1
kind: ResourceQuota
metadata:
  name: production-quota
  namespace: production
spec:
  hard:
    requests.cpu: "100"
    requests.memory: 200Gi
    limits.cpu: "200"
    limits.memory: 400Gi
    pods: "500"
    services: "50"
    secrets: "100"
    configmaps: "100"
    persistentvolumeclaims: "50"
    requests.storage: 500Gi
---
# Limit ranges
apiVersion: v1
kind: LimitRange
metadata:
  name: production-limits
  namespace: production
spec:
  limits:
    # Container defaults plus minimum/maximum bounds
    - default:
        cpu: "500m"
        memory: "512Mi"
      defaultRequest:
        cpu: "100m"
        memory: "128Mi"
      max:
        cpu: "4"
        memory: "8Gi"
      min:
        cpu: "50m"
        memory: "64Mi"
      type: Container
    # Pod maximum limits
    - max:
        cpu: "16"
        memory: "32Gi"
      type: Pod
    # PVC size limits
    - max:
        storage: "100Gi"
      min:
        storage: "1Gi"
      type: PersistentVolumeClaim
```
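Once the quota is in place, it is worth checking consumption against the hard limits periodically; the object names below match the examples above:

```bash
# Show current usage versus the quota's hard limits
kubectl describe resourcequota production-quota -n production
# Review the default/min/max values applied to new containers
kubectl describe limitrange production-limits -n production
```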
High Availability Deployment
Pod Distribution Strategy
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
spec:
  replicas: 6
  template:
    spec:
      affinity:
        # Pod anti-affinity: prefer spreading replicas across different nodes
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchLabels:
                    app: web-app
                topologyKey: kubernetes.io/hostname
      # Topology spread constraints: spread across availability zones and nodes
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: DoNotSchedule
          labelSelector:
            matchLabels:
              app: web-app
        - maxSkew: 2
          topologyKey: kubernetes.io/hostname
          whenUnsatisfiable: ScheduleAnyway
          labelSelector:
            matchLabels:
              app: web-app
```
Pod Disruption Budget
```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-app-pdb
spec:
  # Minimum number of Pods that must stay available
  minAvailable: 3
  # Or, alternatively, the maximum number of unavailable Pods
  # maxUnavailable: 1
  selector:
    matchLabels:
      app: web-app
```
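During node drains, the `ALLOWED DISRUPTIONS` column shows how many Pods can still be evicted voluntarily; the object name matches the example above:

```bash
# Inspect the budget and the number of voluntary disruptions currently allowed
kubectl get pdb web-app-pdb
kubectl describe pdb web-app-pdb
```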
Priority and Preemption
```yaml
# Define priority classes
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority
value: 1000000
globalDefault: false
description: "High priority for critical business applications"
preemptionPolicy: PreemptLowerPriority
---
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: low-priority
value: 100
globalDefault: false
description: "Low priority for batch processing tasks"
preemptionPolicy: Never
---
# Use a priority class in a workload
apiVersion: apps/v1
kind: Deployment
metadata:
  name: critical-app
spec:
  template:
    spec:
      priorityClassName: high-priority
      containers:
        - name: app
          image: critical-app:v1
```
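A quick way to see which classes exist (including the built-in system classes) and the priority value assigned to the resulting Pods, assuming the Pods carry an `app: critical-app` label:

```bash
# List priority classes and their values
kubectl get priorityclass
# Show the numeric priority that was assigned to each Pod
kubectl get pods -l app=critical-app -o custom-columns=NAME:.metadata.name,PRIORITY:.spec.priority
```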
Auto Scaling
Horizontal Pod Autoscaler (HPA)
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 3
  maxReplicas: 50
  metrics:
    # CPU utilization
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    # Memory utilization
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
    # Custom metric (requires Prometheus Adapter)
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second
        target:
          type: AverageValue
          averageValue: "1000"
    # External metric
    - type: External
      external:
        metric:
          name: queue_messages_ready
          selector:
            matchLabels:
              queue: myqueue
        target:
          type: AverageValue
          averageValue: "30"
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300  # Scale-down stabilization window
      policies:
        - type: Percent
          value: 10
          periodSeconds: 60
        - type: Pods
          value: 4
          periodSeconds: 60
      selectPolicy: Min
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
        - type: Percent
          value: 100
          periodSeconds: 15
        - type: Pods
          value: 4
          periodSeconds: 15
      selectPolicy: Max
```
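After creating the HPA you can watch its decisions in real time; `kubectl describe` additionally shows the conditions and events emitted on each scaling step (object name from the example above):

```bash
# Watch current metric values, targets, and replica counts
kubectl get hpa web-app-hpa --watch
# Inspect scaling conditions and recent scaling events
kubectl describe hpa web-app-hpa
```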
Vertical Pod Autoscaler (VPA)
```bash
# Install VPA from the kubernetes/autoscaler repository
git clone https://github.com/kubernetes/autoscaler.git
cd autoscaler/vertical-pod-autoscaler
./hack/vpa-up.sh
```
```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  updatePolicy:
    updateMode: "Auto"  # Off, Initial, Recreate, Auto
  resourcePolicy:
    containerPolicies:
      - containerName: "*"
        minAllowed:
          cpu: 100m
          memory: 128Mi
        maxAllowed:
          cpu: 4
          memory: 8Gi
        controlledResources: ["cpu", "memory"]
        controlledValues: RequestsAndLimits
```
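In `Auto` mode the VPA evicts Pods to apply new requests, so many teams start with `updateMode: "Off"` and only consume the recommendations. Either way, the recommendations are visible on the object itself:

```bash
# Show the target, lower-bound, and upper-bound recommendations
kubectl describe vpa web-app-vpa
```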
Cluster Autoscaler
```yaml
# cluster-autoscaler configuration (cloud provider specific; AWS shown here)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-autoscaler
  namespace: kube-system
spec:
  template:
    spec:
      containers:
        - name: cluster-autoscaler
          image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.26.0
          command:
            - ./cluster-autoscaler
            - --cloud-provider=aws
            - --namespace=kube-system
            - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/my-cluster
            - --balance-similar-node-groups
            - --skip-nodes-with-system-pods=false
            - --scale-down-enabled=true
            - --scale-down-delay-after-add=10m
            - --scale-down-unneeded-time=10m
            - --scale-down-utilization-threshold=0.5
```
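The autoscaler writes a status summary to a ConfigMap in its namespace (named `cluster-autoscaler-status` by default), which is a useful first stop when nodes are not being added or removed as expected:

```bash
# Inspect scale-up/scale-down status per node group
kubectl -n kube-system get configmap cluster-autoscaler-status -o yaml
# Tail the autoscaler's own logs for its scaling decisions
kubectl -n kube-system logs deployment/cluster-autoscaler --tail=50
```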
Backup and Recovery
etcd Backup
```bash
# Manual etcd backup
ETCDCTL_API=3 etcdctl snapshot save /backup/etcd-snapshot.db \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key

# Verify the backup
ETCDCTL_API=3 etcdctl snapshot status /backup/etcd-snapshot.db --write-out=table

# Restore etcd
ETCDCTL_API=3 etcdctl snapshot restore /backup/etcd-snapshot.db \
  --name=etcd-0 \
  --initial-cluster=etcd-0=https://etcd-0:2380 \
  --initial-cluster-token=etcd-cluster-1 \
  --initial-advertise-peer-urls=https://etcd-0:2380 \
  --data-dir=/var/lib/etcd-restored
```
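On a kubeadm cluster where etcd runs as a static Pod, the same snapshot can be taken from inside the etcd Pod instead of installing etcdctl on the host. A sketch, assuming the kubeadm layout and etcd v3.4+ (where the v3 API is the default); the Pod name suffix matches the control-plane node name:

```bash
# Take a snapshot from inside the etcd static Pod
kubectl -n kube-system exec etcd-<control-plane-node> -- etcdctl \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  snapshot save /var/lib/etcd/snapshot.db
```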
```yaml
# Scheduled backup CronJob
apiVersion: batch/v1
kind: CronJob
metadata:
  name: etcd-backup
  namespace: kube-system
spec:
  schedule: "0 */6 * * *"  # Every 6 hours
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: backup
              image: bitnami/etcd:latest
              command:
                - /bin/sh
                - -c
                - |
                  ETCDCTL_API=3 etcdctl snapshot save /backup/etcd-$(date +%Y%m%d-%H%M%S).db \
                    --endpoints=$ETCD_ENDPOINTS \
                    --cacert=/etc/etcd/ca.crt \
                    --cert=/etc/etcd/client.crt \
                    --key=/etc/etcd/client.key
                  # Clean up backups older than 7 days
                  find /backup -name "etcd-*.db" -mtime +7 -delete
              env:
                - name: ETCD_ENDPOINTS
                  # snapshot save must target a single etcd member
                  value: "https://etcd-0:2379"
              volumeMounts:
                - name: backup
                  mountPath: /backup
                - name: etcd-certs
                  mountPath: /etc/etcd
          restartPolicy: OnFailure
          volumes:
            - name: backup
              persistentVolumeClaim:
                claimName: etcd-backup-pvc
            - name: etcd-certs
              secret:
                secretName: etcd-client-certs
```
Velero Backup
```bash
# Install Velero
velero install \
  --provider aws \
  --plugins velero/velero-plugin-for-aws:v1.7.0 \
  --bucket velero-backup \
  --backup-location-config region=us-east-1 \
  --snapshot-location-config region=us-east-1 \
  --secret-file ./credentials-velero

# Create a backup
velero backup create production-backup \
  --include-namespaces production \
  --include-resources deployments,services,configmaps,secrets,persistentvolumeclaims

# Scheduled backup (daily at 02:00, retained for 7 days)
velero schedule create daily-backup \
  --schedule="0 2 * * *" \
  --include-namespaces production \
  --ttl 168h

# View backups
velero backup get
velero backup describe production-backup

# Restore
velero restore create --from-backup production-backup

# Restore into a different namespace
velero restore create --from-backup production-backup \
  --namespace-mappings production:production-restored
```
Application Backup Strategy
```yaml
# Configure backup/restore hooks with Velero annotations
# (Velero reads hook annotations from the Pod, so they belong on the Pod template)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: database
spec:
  template:
    metadata:
      annotations:
        # Command to execute before the backup is taken
        pre.hook.backup.velero.io/container: database
        pre.hook.backup.velero.io/command: '["/bin/sh", "-c", "pg_dump -U postgres mydb > /backup/dump.sql"]'
        # Command to execute after the Pod is restored
        post.hook.restore.velero.io/container: database
        post.hook.restore.velero.io/command: '["/bin/sh", "-c", "psql -U postgres mydb < /backup/dump.sql"]'
```
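Hook failures show up in the backup's status and logs, so it is worth checking them after the first run; the backup name here comes from the earlier example:

```bash
# Check whether the pre-backup hook ran and whether it failed the backup
velero backup describe production-backup --details
velero backup logs production-backup
```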
Cluster Upgrades
Upgrade Strategy
(Diagram: cluster upgrade workflow)
kubeadm Upgrade
```bash
# Upgrade the control plane (first Master)
# 1. Check available versions
apt update
apt-cache madison kubeadm

# 2. Upgrade kubeadm
apt-get update && apt-get install -y kubeadm=1.28.0-00

# 3. Verify the upgrade plan
kubeadm upgrade plan

# 4. Apply the upgrade
kubeadm upgrade apply v1.28.0

# 5. Upgrade kubelet and kubectl
apt-get update && apt-get install -y kubelet=1.28.0-00 kubectl=1.28.0-00
systemctl daemon-reload
systemctl restart kubelet

# Upgrade the remaining Master nodes
kubeadm upgrade node

# Upgrade Worker nodes
# 1. Drain Pods from the node
kubectl drain node-1 --ignore-daemonsets --delete-emptydir-data

# 2. Upgrade kubeadm, kubelet, and kubectl
apt-get update && apt-get install -y \
  kubeadm=1.28.0-00 \
  kubelet=1.28.0-00 \
  kubectl=1.28.0-00

# 3. Upgrade the node configuration
kubeadm upgrade node

# 4. Restart kubelet
systemctl daemon-reload
systemctl restart kubelet

# 5. Resume scheduling
kubectl uncordon node-1
```
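If the kubeadm, kubelet, and kubectl packages are pinned with `apt-mark hold` (as the official install docs recommend), release the pin around each upgrade step and reapply it afterwards:

```bash
# Temporarily release the version pin, upgrade, then pin again
apt-mark unhold kubeadm kubelet kubectl
apt-get update && apt-get install -y kubeadm=1.28.0-00 kubelet=1.28.0-00 kubectl=1.28.0-00
apt-mark hold kubeadm kubelet kubectl
```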
Upgrade Checklist
```bash
# Pre-upgrade checks
# 1. Back up etcd (add the --endpoints and certificate flags shown earlier)
ETCDCTL_API=3 etcdctl snapshot save /backup/pre-upgrade-snapshot.db

# 2. Back up important resources
kubectl get all --all-namespaces -o yaml > all-resources-backup.yaml

# 3. Check cluster health
kubectl get nodes
kubectl get pods --all-namespaces | grep -v Running

# 4. Check for deprecated API usage (use a tool such as kubent or pluto)
kubent

# 5. Check PodDisruptionBudgets
kubectl get pdb --all-namespaces

# 6. Check resource quotas
kubectl describe resourcequota --all-namespaces

# Post-upgrade verification
# 1. Check node status
kubectl get nodes -o wide

# 2. Check core components
kubectl get pods -n kube-system

# 3. Check application status
kubectl get pods --all-namespaces | grep -v Running

# 4. Run a quick end-to-end smoke test (cluster DNS resolution)
kubectl run test --image=busybox --rm -it --restart=Never -- nslookup kubernetes.default
```
Failure Recovery
Common Failure Handling
```bash
# Node NotReady
kubectl describe node <node-name>
# Check kubelet logs on the node
journalctl -u kubelet -f

# Pod CrashLoopBackOff
kubectl logs <pod-name> --previous
kubectl describe pod <pod-name>

# Pod Pending
kubectl describe pod <pod-name>
# Check recent events
kubectl get events --sort-by='.lastTimestamp'

# Resource exhaustion
kubectl top nodes
kubectl top pods --all-namespaces --sort-by=memory

# etcd issues
ETCDCTL_API=3 etcdctl endpoint health
ETCDCTL_API=3 etcdctl endpoint status --write-out=table
```
Node Failure Recovery
```bash
# Handling a node that cannot be recovered
# 1. Mark the node as unschedulable
kubectl cordon <node-name>

# 2. Drain Pods from the node
kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data --force

# 3. Delete the node
kubectl delete node <node-name>

# 4. Handle PV/PVC
# Check for Pods stuck because of the failed node
kubectl get pods --all-namespaces -o wide | grep <node-name>
# Force-delete Pods stuck in Terminating
kubectl delete pod <pod-name> --grace-period=0 --force

# 5. For local storage, release the PV manually
kubectl patch pv <pv-name> -p '{"spec":{"claimRef": null}}'
```
Disaster Recovery
```bash
# Complete cluster recovery steps
# 1. Prepare new infrastructure
#    (provision new nodes with Terraform/Ansible)

# 2. Initialize the first Master
kubeadm init --config kubeadm-config.yaml

# 3. Restore etcd data
ETCDCTL_API=3 etcdctl snapshot restore /backup/etcd-snapshot.db \
  --data-dir=/var/lib/etcd

# 4. Join the remaining Master nodes
kubeadm join --config kubeadm-join-config.yaml

# 5. Join Worker nodes
kubeadm join --token <token> <master-ip>:6443

# 6. Restore applications
velero restore create --from-backup <backup-name>

# 7. Verify the recovery
kubectl get all --all-namespaces
```
Cost Optimization
Resource Optimization
```yaml
# Set realistic resource requests and limits
apiVersion: apps/v1
kind: Deployment
metadata:
  name: optimized-app
spec:
  template:
    spec:
      containers:
        - name: app
          resources:
            requests:
              cpu: 100m      # Set based on observed usage
              memory: 128Mi
            limits:
              cpu: 500m      # Cap burst usage
              memory: 512Mi
```

```bash
# View resource usage (heaviest consumers first)
kubectl top pods --all-namespaces | sort -k3 -n -r | head -20

# Use VPA recommendations
kubectl get vpa -o yaml | grep -A 10 recommendation

# Identify unused resources
kubectl get pvc --all-namespaces | grep -v Bound
kubectl get secrets --all-namespaces --field-selector type=kubernetes.io/service-account-token
```
Spot/Preemptible Instances
```yaml
# Run fault-tolerant workloads on spot/preemptible instances
# Note: the label and taint keys below are illustrative placeholders;
# each cloud uses its own (e.g. eks.amazonaws.com/capacityType=SPOT,
# cloud.google.com/gke-spot=true, kubernetes.azure.com/scalesetpriority=spot)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: batch-processor
spec:
  template:
    spec:
      nodeSelector:
        node.kubernetes.io/instance-type: spot
      tolerations:
        - key: "kubernetes.io/spot"
          operator: "Equal"
          value: "true"
          effect: "NoSchedule"
      # Handle preemption gracefully
      terminationGracePeriodSeconds: 120
      containers:
        - name: processor
          lifecycle:
            preStop:
              exec:
                command:
                  - /bin/sh
                  - -c
                  - |
                    # Save a checkpoint
                    checkpoint-save.sh
                    # Wait for in-flight tasks to finish
                    sleep 60
```
Resource Analysis Tools
```bash
# Install Kubecost for cost analysis
helm repo add kubecost https://kubecost.github.io/cost-analyzer/
helm install kubecost kubecost/cost-analyzer \
  --namespace kubecost \
  --create-namespace

# Use the kubectl-cost plugin
kubectl cost namespace --show-all-resources
kubectl cost deployment -n production
```
Operations Checklist
Daily Checks
```bash
#!/bin/bash
# daily-check.sh
echo "=== Cluster Health Check ==="

# Node status
echo "--- Node Status ---"
kubectl get nodes -o wide

# Core component status
echo "--- Core Components ---"
kubectl get pods -n kube-system

# Abnormal Pods
echo "--- Abnormal Pods ---"
kubectl get pods --all-namespaces --field-selector status.phase!=Running,status.phase!=Succeeded

# Resource usage
echo "--- Resource Usage ---"
kubectl top nodes
kubectl top pods --all-namespaces | sort -k3 -n -r | head -10

# Certificate expiration check
echo "--- Certificate Status ---"
kubeadm certs check-expiration

# PVC status
echo "--- Storage Status ---"
kubectl get pvc --all-namespaces | grep -v Bound

# Recent warning events
echo "--- Recent Warning Events ---"
kubectl get events --all-namespaces --sort-by='.lastTimestamp' | grep -i warning | tail -20
```
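One way to run this check automatically is a plain cron entry on an admin host that has kubectl access; the path, schedule, and log file below are illustrative assumptions:

```bash
# Register the daily check in the admin host's crontab (runs every morning at 08:00)
(crontab -l 2>/dev/null; echo "0 8 * * * /usr/local/bin/daily-check.sh >> /var/log/k8s-daily-check.log 2>&1") | crontab -
```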
Monitoring Alert Rules
```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: production-alerts
  namespace: monitoring
spec:
  groups:
    - name: cluster-health
      rules:
        # Node memory pressure
        - alert: NodeMemoryPressure
          expr: |
            (1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) > 0.9
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: "Node {{ $labels.instance }} is low on memory"
        # Node disk pressure
        - alert: NodeDiskPressure
          expr: |
            (1 - node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"}) > 0.85
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "Node {{ $labels.instance }} is low on disk space"
        # API server latency
        - alert: APIServerHighLatency
          expr: |
            histogram_quantile(0.99, sum(rate(apiserver_request_duration_seconds_bucket{verb!="WATCH"}[5m])) by (le)) > 1
          for: 10m
          labels:
            severity: warning
          annotations:
            summary: "API server p99 request latency is above 1s"
        # etcd latency
        - alert: EtcdHighLatency
          expr: |
            histogram_quantile(0.99, sum(rate(etcd_disk_backend_commit_duration_seconds_bucket[5m])) by (le)) > 0.25
          for: 10m
          labels:
            severity: warning
          annotations:
            summary: "etcd backend commit latency is too high"
        # Certificate expiring soon
        - alert: CertificateExpiringSoon
          expr: |
            (apiserver_client_certificate_expiration_seconds_count > 0 and
             apiserver_client_certificate_expiration_seconds_bucket{le="604800"} > 0)
          for: 0m
          labels:
            severity: critical
          annotations:
            summary: "A client certificate will expire within 7 days"
```
Production Environment Best Practices Summary
- High availability: Multi-Master, cross-AZ, PDB configuration
- Resource management: Reasonable quotas, resource limits, auto scaling
- Security: RBAC, network policies, image scanning
- Observability: Monitoring alerts, log collection, distributed tracing
- Operations: Regular backups, upgrade plans, failure drills
- Cost: Resource optimization, Spot instances, regular cleanup
Summary
Through this chapter, you should have mastered:
- Cluster planning: Node planning, labels and taints, resource quotas
- High availability: Pod distribution, PDB, priority preemption
- Auto scaling: HPA, VPA, Cluster Autoscaler
- Backup recovery: etcd backup, Velero usage, disaster recovery
- Cluster upgrades: Upgrade strategies, kubeadm upgrade steps
- Cost optimization: Resource optimization, Spot instance usage
Congratulations on completing the entire Kubernetes container orchestration course! From the basics through production best practices, you have built up both core knowledge and practical skills. The best next step is to keep practicing in real projects and to follow the latest developments in the cloud-native community.