Chapter 6: Argo Rollouts Progressive Delivery
Deep dive into Argo Rollouts canary deployments, blue-green deployments, A/B testing and other progressive delivery strategies
Chapter 6: Implementing Safe Application Releases with Argo Rollouts
Argo Rollouts is a Kubernetes controller that provides advanced deployment capabilities such as blue-green deployments, canary deployments, canary analysis, experimentation, and progressive delivery features.
6.1 Argo Rollouts Overview
6.1.1 Why Progressive Delivery is Needed
6.1.2 Argo Rollouts Core Concepts
6.1.3 Installing Argo Rollouts
# Install Argo Rollouts
kubectl create namespace argo-rollouts
kubectl apply -n argo-rollouts -f https://github.com/argoproj/argo-rollouts/releases/latest/download/install.yaml
# Install kubectl plugin (optional but recommended)
# macOS
brew install argoproj/tap/kubectl-argo-rollouts
# Linux
curl -LO https://github.com/argoproj/argo-rollouts/releases/latest/download/kubectl-argo-rollouts-linux-amd64
chmod +x kubectl-argo-rollouts-linux-amd64
sudo mv kubectl-argo-rollouts-linux-amd64 /usr/local/bin/kubectl-argo-rollouts
# Verify installation
kubectl argo rollouts version
6.2 Blue-Green Deployment
6.2.1 Blue-Green Deployment Principles
6.2.2 Blue-Green Deployment Configuration
# blue-green-rollout.yaml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: blue-green-demo
spec:
  replicas: 3
  revisionHistoryLimit: 2
  selector:
    matchLabels:
      app: blue-green-demo
  template:
    metadata:
      labels:
        app: blue-green-demo
    spec:
      containers:
        - name: app
          image: nginx:1.24
          ports:
            - containerPort: 80
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
            limits:
              cpu: 200m
              memory: 256Mi
  strategy:
    blueGreen:
      # Active service (production traffic)
      activeService: blue-green-active
      # Preview service (for testing new version)
      previewService: blue-green-preview
      # Auto promotion (set to false for manual confirmation)
      autoPromotionEnabled: false
      # Seconds to keep the old version running after the switch
      scaleDownDelaySeconds: 30
      # Number of replicas the new version runs while in preview
      previewReplicaCount: 1
---
# Active service
apiVersion: v1
kind: Service
metadata:
  name: blue-green-active
spec:
  selector:
    app: blue-green-demo
  ports:
    - port: 80
      targetPort: 80
---
# Preview service
apiVersion: v1
kind: Service
metadata:
  name: blue-green-preview
spec:
  selector:
    app: blue-green-demo
  ports:
    - port: 80
      targetPort: 80
6.2.3 Blue-Green Deployment Operations
# Deploy application
kubectl apply -f blue-green-rollout.yaml
# View Rollout status
kubectl argo rollouts get rollout blue-green-demo
# Update image to trigger deployment
kubectl argo rollouts set image blue-green-demo app=nginx:1.25
# Watch deployment progress (real-time monitoring)
kubectl argo rollouts get rollout blue-green-demo --watch
# Manual promotion
kubectl argo rollouts promote blue-green-demo
# Rollback to previous version
kubectl argo rollouts undo blue-green-demo
# Abort deployment
kubectl argo rollouts abort blue-green-demo
6.2.4 Advanced Blue-Green Configuration
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: blue-green-advanced
spec:
  replicas: 5
  selector:
    matchLabels:
      app: blue-green-advanced
  template:
    metadata:
      labels:
        app: blue-green-advanced
    spec:
      containers:
        - name: app
          image: myapp:v1
          readinessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 5
  strategy:
    blueGreen:
      activeService: bg-active-svc
      previewService: bg-preview-svc
      # Auto promotion configuration
      autoPromotionEnabled: true
      autoPromotionSeconds: 60 # Auto promote after 60 seconds
      # Preview replica count
      previewReplicaCount: 2
      # Scale down delay
      scaleDownDelaySeconds: 60
      # Scale down revision limit
      scaleDownDelayRevisionLimit: 2
      # Anti-affinity configuration
      antiAffinity:
        requiredDuringSchedulingIgnoredDuringExecution: {}
      # Pre-promotion analysis
      prePromotionAnalysis:
        templates:
          - templateName: success-rate-check
        args:
          - name: service-name
            value: bg-preview-svc
      # Post-promotion analysis
      postPromotionAnalysis:
        templates:
          - templateName: error-rate-check
        args:
          - name: service-name
            value: bg-active-svc
6.3 Canary Deployment
6.3.1 Canary Deployment Principles
6.3.2 Basic Canary Configuration
# canary-rollout.yaml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: canary-demo
spec:
  replicas: 10
  selector:
    matchLabels:
      app: canary-demo
  template:
    metadata:
      labels:
        app: canary-demo
    spec:
      containers:
        - name: app
          image: nginx:1.24
          ports:
            - containerPort: 80
  strategy:
    canary:
      # Canary service (optional, for direct access to canary version)
      canaryService: canary-demo-canary
      # Stable service
      stableService: canary-demo-stable
      # Deployment steps
      steps:
        # Step 1: 5% traffic
        - setWeight: 5
        # Wait 1 minute, then continue (use pause: {} to wait for manual promotion)
        - pause: {duration: 1m}
        # Step 2: 20% traffic
        - setWeight: 20
        - pause: {duration: 2m}
        # Step 3: 50% traffic
        - setWeight: 50
        - pause: {duration: 5m}
        # Step 4: 80% traffic
        - setWeight: 80
        - pause: {duration: 5m}
        # Auto switch to 100% after completion
---
apiVersion: v1
kind: Service
metadata:
  name: canary-demo-stable
spec:
  selector:
    app: canary-demo
  ports:
    - port: 80
---
apiVersion: v1
kind: Service
metadata:
  name: canary-demo-canary
spec:
  selector:
    app: canary-demo
  ports:
    - port: 80
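The rollout above configures no trafficRouting, so Argo Rollouts can only approximate each setWeight through the ratio of canary pods to total pods. The sketch below is a simplified model that assumes the canary pod count is the weight's share of spec.replicas, rounded up; the controller's real calculation also takes maxSurge and maxUnavailable into account, and the function name is hypothetical.

```python
def approximate_canary_pods(replicas: int, weight: int) -> int:
    """Simplified model: canary pods = ceil(replicas * weight / 100)."""
    if not 0 <= weight <= 100:
        raise ValueError("weight must be between 0 and 100")
    # Integer ceiling division avoids floating-point rounding surprises.
    return (replicas * weight + 99) // 100


if __name__ == "__main__":
    replicas = 10  # spec.replicas of canary-demo
    for weight in (5, 20, 50, 80):  # setWeight values from the steps above
        pods = approximate_canary_pods(replicas, weight)
        share = 100 * pods // replicas
        print(f"setWeight {weight}% -> {pods} canary pod(s), about {share}% of pods")
```

Under this model the 5% step still needs one whole pod out of ten, so roughly 10% of requests reach the canary. Weights finer than one pod's share need a traffic router, as in the next section.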
6.3.3 Canary Deployment with Traffic Management
# canary-with-traffic-management.yaml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: canary-nginx
spec:
  replicas: 5
  selector:
    matchLabels:
      app: canary-nginx
  template:
    metadata:
      labels:
        app: canary-nginx
    spec:
      containers:
        - name: nginx
          image: nginx:1.24
          ports:
            - containerPort: 80
  strategy:
    canary:
      canaryService: canary-nginx-canary
      stableService: canary-nginx-stable
      # Nginx Ingress traffic management
      trafficRouting:
        nginx:
          stableIngress: canary-nginx-ingress
          annotationPrefix: nginx.ingress.kubernetes.io
          additionalIngressAnnotations:
            canary-by-header: X-Canary
            canary-by-header-value: "true"
      steps:
        - setWeight: 5
        - pause: {duration: 30s}
        - setWeight: 20
        - pause: {duration: 1m}
        - setWeight: 50
        - pause: {duration: 2m}
        - setWeight: 80
        - pause: {duration: 2m}
---
# Stable Ingress
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: canary-nginx-ingress
  annotations:
    kubernetes.io/ingress.class: nginx
spec:
  rules:
    - host: canary-demo.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: canary-nginx-stable
                port:
                  number: 80
6.3.4 Istio Traffic Management Integration
# canary-with-istio.yaml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: canary-istio-demo
spec:
  replicas: 5
  selector:
    matchLabels:
      app: canary-istio-demo
  template:
    metadata:
      labels:
        app: canary-istio-demo
    spec:
      containers:
        - name: app
          image: myapp:v1
          ports:
            - containerPort: 8080
  strategy:
    canary:
      canaryService: canary-istio-demo-canary
      stableService: canary-istio-demo-stable
      trafficRouting:
        istio:
          virtualService:
            name: canary-istio-demo-vsvc
            routes:
              - primary
          destinationRule:
            name: canary-istio-demo-destrule
            canarySubsetName: canary
            stableSubsetName: stable
      steps:
        - setWeight: 10
        - pause: {duration: 1m}
        - setWeight: 30
        - pause: {duration: 2m}
        - setWeight: 60
        - pause: {duration: 3m}
---
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: canary-istio-demo-vsvc
spec:
  hosts:
    - canary-istio-demo.example.com
  http:
    - name: primary
      route:
        - destination:
            host: canary-istio-demo-stable
          weight: 100
        - destination:
            host: canary-istio-demo-canary
          weight: 0
---
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: canary-istio-demo-destrule
spec:
  host: canary-istio-demo
  subsets:
    - name: stable
      labels:
        app: canary-istio-demo
    - name: canary
      labels:
        app: canary-istio-demo
6.4 Analysis and Automated Rollback
6.4.1 AnalysisTemplate Concept
6.4.2 Prometheus Analysis Template
# analysis-template-prometheus.yaml
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: success-rate-prometheus
spec:
  args:
    - name: service-name
    - name: namespace
      value: default
  metrics:
    - name: success-rate
      # Check every 30 seconds
      interval: 30s
      # Take 5 measurements in total
      count: 5
      # Success threshold
      successCondition: result[0] >= 0.95
      # Failure threshold
      failureCondition: result[0] < 0.90
      # The metric fails once more than 3 measurements have failed
      failureLimit: 3
      provider:
        prometheus:
          address: http://prometheus.monitoring:9090
          query: |
            sum(rate(http_requests_total{
              service="{{args.service-name}}",
              namespace="{{args.namespace}}",
              status=~"2.."
            }[5m])) /
            sum(rate(http_requests_total{
              service="{{args.service-name}}",
              namespace="{{args.namespace}}"
            }[5m]))
---
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: latency-check
spec:
  args:
    - name: service-name
  metrics:
    - name: p99-latency
      interval: 1m
      count: 3
      # The query converts seconds to milliseconds, so this threshold is 500 ms
      successCondition: result[0] < 500
      failureLimit: 2
      provider:
        prometheus:
          address: http://prometheus.monitoring:9090
          query: |
            histogram_quantile(0.99,
              sum(rate(http_request_duration_seconds_bucket{
                service="{{args.service-name}}"
              }[5m])) by (le)
            ) * 1000
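How successCondition, failureCondition, and failureLimit in the success-rate metric combine can be modeled in a few lines. This is a simplified sketch of the documented behavior, not the controller's code: a measurement that meets neither condition counts as inconclusive, and the metric fails once the number of failed measurements exceeds failureLimit. The function names and the sample values are hypothetical.

```python
from typing import List


def classify(success_rate: float) -> str:
    """Mirror the success-rate metric: < 0.90 fails, >= 0.95 succeeds."""
    if success_rate < 0.90:
        return "Failed"
    if success_rate >= 0.95:
        return "Successful"
    return "Inconclusive"  # neither condition matched


def metric_failed(measurements: List[float], failure_limit: int) -> bool:
    """The metric fails when failed measurements exceed failure_limit."""
    failures = sum(1 for value in measurements if classify(value) == "Failed")
    return failures > failure_limit


if __name__ == "__main__":
    samples = [0.97, 0.85, 0.93, 0.88, 0.99]  # hypothetical measurements
    for value in samples:
        print(f"{value:.2f} -> {classify(value)}")
    print("metric failed:", metric_failed(samples, failure_limit=3))
```

With failureLimit: 3 and count: 5, four of the five measurements would have to fail before the metric itself fails. Inconclusive measurements are tracked separately in the controller (see inconclusiveLimit), which this sketch does not model.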
6.4.3 Job Analysis Template
# analysis-template-job.yaml
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: integration-test
spec:
  args:
    - name: test-endpoint
  metrics:
    - name: integration-tests
      provider:
        job:
          spec:
            backoffLimit: 0
            template:
              spec:
                restartPolicy: Never
                containers:
                  - name: test
                    image: curlimages/curl:latest
                    command:
                      - /bin/sh
                      - -c
                      - |
                        set -e
                        echo "Running integration tests..."
                        # Test health check endpoint
                        curl -f {{args.test-endpoint}}/health || exit 1
                        # Test API endpoint
                        curl -f {{args.test-endpoint}}/api/v1/status || exit 1
                        # Test response time
                        RESPONSE_TIME=$(curl -o /dev/null -s -w '%{time_total}' {{args.test-endpoint}}/api/v1/ping)
                        if [ $(echo "$RESPONSE_TIME > 0.5" | bc -l) -eq 1 ]; then
                          echo "Response time too slow: ${RESPONSE_TIME}s"
                          exit 1
                        fi
                        echo "All tests passed!"
---
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: smoke-test
spec:
  metrics:
    - name: smoke-test
      provider:
        job:
          spec:
            backoffLimit: 1
            template:
              spec:
                restartPolicy: Never
                containers:
                  - name: smoke
                    image: python:3.11-slim
                    command:
                      - python
                      - -c
                      - |
                        import urllib.request
                        import json
                        import sys
                        endpoints = [
                            'http://canary-service/health',
                            'http://canary-service/ready',
                        ]
                        for url in endpoints:
                            try:
                                response = urllib.request.urlopen(url, timeout=10)
                                if response.status != 200:
                                    print(f"Failed: {url} returned {response.status}")
                                    sys.exit(1)
                                print(f"OK: {url}")
                            except Exception as e:
                                print(f"Error: {url} - {e}")
                                sys.exit(1)
                        print("All smoke tests passed!")
6.4.4 Web Analysis Template
# analysis-template-web.yaml
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: web-check
spec:
  args:
    - name: url
  metrics:
    - name: web-health
      interval: 30s
      count: 5
      # jsonPath already extracts $.status, so result is that field's value
      # (assumes the endpoint returns a JSON body such as {"status": 200})
      successCondition: result == 200
      failureLimit: 2
      provider:
        web:
          url: "{{args.url}}/health"
          headers:
            - key: Content-Type
              value: application/json
          jsonPath: "{$.status}"
---
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: api-response-check
spec:
  args:
    - name: api-url
    - name: expected-version
  metrics:
    - name: version-check
      # successCondition belongs to the metric, not to the web provider
      successCondition: result == "{{args.expected-version}}"
      provider:
        web:
          url: "{{args.api-url}}/version"
          jsonPath: "{$.version}"
6.4.5 Canary Deployment with Analysis Integration
# canary-with-analysis.yaml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: canary-with-analysis
spec:
  replicas: 10
  selector:
    matchLabels:
      app: canary-analysis
  template:
    metadata:
      labels:
        app: canary-analysis
    spec:
      containers:
        - name: app
          image: myapp:v1
          ports:
            - containerPort: 8080
  strategy:
    canary:
      canaryService: canary-analysis-canary
      stableService: canary-analysis-stable
      # Analysis configuration
      analysis:
        templates:
          - templateName: success-rate-prometheus
          - templateName: latency-check
        args:
          - name: service-name
            value: canary-analysis-canary
        # Analysis start step (zero-based index into steps)
        startingStep: 1 # Start analysis from second step
      steps:
        - setWeight: 5
        # Analysis starts here (step index 1)
        - pause: {duration: 30s}
        - setWeight: 20
        - pause: {duration: 2m}
        - setWeight: 50
        - pause: {duration: 5m}
        - setWeight: 80
        - pause: {duration: 5m}
      # Pod availability bounds while the update is in progress
      maxUnavailable: 1
      maxSurge: 1
---
# Using multiple analysis templates
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: canary-multi-analysis
spec:
  replicas: 5
  selector:
    matchLabels:
      app: canary-multi
  template:
    metadata:
      labels:
        app: canary-multi
    spec:
      containers:
        - name: app
          image: myapp:v1
  strategy:
    canary:
      steps:
        - setWeight: 10
        - pause: {duration: 1m}
        # Inline analysis step
        - analysis:
            templates:
              - templateName: success-rate-prometheus
            args:
              - name: service-name
                value: canary-multi-canary
        - setWeight: 50
        - pause: {duration: 2m}
        - analysis:
            templates:
              - templateName: latency-check
              - templateName: integration-test
            args:
              - name: service-name
                value: canary-multi-canary
              - name: test-endpoint
                value: http://canary-multi-canary:8080
        - setWeight: 100
6.5 Experimentation
6.5.1 Experiment Concept
6.5.2 Basic Experiment Configuration
# experiment.yaml
apiVersion: argoproj.io/v1alpha1
kind: Experiment
metadata:
  name: experiment-demo
spec:
  # Experiment duration
  duration: 10m
  # Progress deadline
  progressDeadlineSeconds: 300
  templates:
    # Baseline template
    - name: baseline
      replicas: 1
      selector:
        matchLabels:
          app: experiment-demo
          version: baseline
      template:
        metadata:
          labels:
            app: experiment-demo
            version: baseline
        spec:
          containers:
            - name: app
              image: myapp:v1
              ports:
                - containerPort: 8080
      service:
        name: experiment-baseline
    # Candidate template
    - name: candidate
      replicas: 1
      selector:
        matchLabels:
          app: experiment-demo
          version: candidate
      template:
        metadata:
          labels:
            app: experiment-demo
            version: candidate
        spec:
          containers:
            - name: app
              image: myapp:v2
              ports:
                - containerPort: 8080
      service:
        name: experiment-candidate
  # Analysis configuration
  analyses:
    - name: compare-metrics
      templateName: compare-analysis
      args:
        - name: baseline-service
          value: experiment-baseline
        - name: candidate-service
          value: experiment-candidate
6.5.3 Comparison Analysis Template
# compare-analysis-template.yaml
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: compare-analysis
spec:
  args:
    - name: baseline-service
    - name: candidate-service
  metrics:
    # Compare latency
    - name: latency-comparison
      interval: 1m
      count: 5
      # A PromQL query cannot return a bracketed list of two expressions,
      # so the candidate/baseline ratio is computed inside the query.
      # Candidate can't be more than 10% slower than baseline
      successCondition: result[0] <= 1.1
      provider:
        prometheus:
          address: http://prometheus:9090
          query: |
            histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket{service="{{args.candidate-service}}"}[5m])) by (le))
            /
            histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket{service="{{args.baseline-service}}"}[5m])) by (le))
    # Compare error rate
    - name: error-rate-comparison
      interval: 1m
      count: 5
      # Candidate error rate can't be higher than baseline (difference <= 0)
      successCondition: result[0] <= 0
      provider:
        prometheus:
          address: http://prometheus:9090
          query: |
            (
              sum(rate(http_requests_total{service="{{args.candidate-service}}", status=~"5.."}[5m])) / sum(rate(http_requests_total{service="{{args.candidate-service}}"}[5m]))
            )
            -
            (
              sum(rate(http_requests_total{service="{{args.baseline-service}}", status=~"5.."}[5m])) / sum(rate(http_requests_total{service="{{args.baseline-service}}"}[5m]))
            )
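The two comparison rules in the template above (candidate latency at most 10% above baseline, candidate error rate no higher than baseline) can be written out as plain arithmetic. This is an illustrative sketch only; the function names and sample numbers are hypothetical and no Prometheus access is involved.

```python
def latency_ok(candidate_p95: float, baseline_p95: float, tolerance: float = 0.10) -> bool:
    """Candidate may be at most `tolerance` (10%) slower than baseline."""
    return candidate_p95 <= baseline_p95 * (1 + tolerance)


def error_rate_ok(candidate_rate: float, baseline_rate: float) -> bool:
    """Candidate error rate must not exceed the baseline error rate."""
    return candidate_rate <= baseline_rate


if __name__ == "__main__":
    # Hypothetical p95 latencies in seconds and error rates as fractions
    print("latency ok:", latency_ok(candidate_p95=0.21, baseline_p95=0.20))
    print("error rate ok:", error_rate_ok(candidate_rate=0.03, baseline_rate=0.02))
```

A relative tolerance for latency allows for normal noise between two small deployments, while the error-rate rule here allows none; whether that is too strict depends on traffic volume.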
6.5.4 Using Experiments in Rollout
# rollout-with-experiment.yaml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: rollout-experiment
spec:
  replicas: 4
  selector:
    matchLabels:
      app: rollout-experiment
  template:
    metadata:
      labels:
        app: rollout-experiment
    spec:
      containers:
        - name: app
          image: myapp:v1
  strategy:
    canary:
      steps:
        - setWeight: 20
        - pause: {duration: 1m}
        # Run experiment
        - experiment:
            duration: 5m
            templates:
              - name: baseline
                specRef: stable
                replicas: 2
              - name: canary
                specRef: canary
                replicas: 2
            analyses:
              - name: performance-comparison
                templateName: compare-analysis
                args:
                  - name: baseline-service
                    value: rollout-experiment-stable
                  - name: candidate-service
                    value: rollout-experiment-canary
        - setWeight: 50
        - pause: {duration: 2m}
        - setWeight: 100
6.6 A/B Testing
6.6.1 Header-Based A/B Testing
# ab-testing-header.yaml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: ab-header-test
spec:
  replicas: 4
  selector:
    matchLabels:
      app: ab-header-test
  template:
    metadata:
      labels:
        app: ab-header-test
    spec:
      containers:
        - name: app
          image: myapp:v1
  strategy:
    canary:
      stableService: ab-stable
      canaryService: ab-canary
      trafficRouting:
        nginx:
          stableIngress: ab-ingress
          annotationPrefix: nginx.ingress.kubernetes.io
          additionalIngressAnnotations:
            # Route to canary based on header
            canary-by-header: X-AB-Test
            canary-by-header-value: "variant-b"
      steps:
        # With NGINX, the canary-by-header annotations above perform the header
        # routing. setHeaderRoute needs a router with managed routes (for
        # example Istio), so this step only starts canary pods without
        # shifting any weighted traffic.
        - setCanaryScale:
            replicas: 1
        - pause: {} # Pause indefinitely, wait for manual confirmation
6.6.2 Cookie-Based A/B Testing
# ab-testing-cookie.yaml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: ab-cookie-test
spec:
  replicas: 4
  selector:
    matchLabels:
      app: ab-cookie-test
  template:
    metadata:
      labels:
        app: ab-cookie-test
    spec:
      containers:
        - name: app
          image: myapp:v1
  strategy:
    canary:
      stableService: ab-cookie-stable
      canaryService: ab-cookie-canary
      trafficRouting:
        # Routes created by setHeaderRoute must be declared as managed routes
        managedRoutes:
          - name: cookie-based-route
        istio:
          virtualService:
            name: ab-cookie-vsvc
            # Weighted route managed by the Rollout (stable + canary destinations)
            routes:
              - default
      steps:
        # Start canary pods without shifting weighted traffic
        - setCanaryScale:
            replicas: 1
        - setHeaderRoute:
            name: cookie-based-route
            match:
              - headerName: Cookie
                headerValue:
                  regex: ".*ab_test=variant_b.*"
        - pause: {}
---
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: ab-cookie-vsvc
spec:
  hosts:
    - ab-test.example.com
  http:
    # Hand-written cookie route; it also refreshes the sticky cookie
    - name: primary
      match:
        - headers:
            cookie:
              regex: ".*ab_test=variant_b.*"
      route:
        - destination:
            host: ab-cookie-canary
      # Set sticky cookie
      headers:
        response:
          set:
            Set-Cookie: "ab_test=variant_b; Path=/; Max-Age=86400"
    # The Rollout adjusts these weights, so both services must be listed
    - name: default
      route:
        - destination:
            host: ab-cookie-stable
          weight: 100
        - destination:
            host: ab-cookie-canary
          weight: 0
6.6.3 Geographic Routing
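# geo-routing.yaml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: geo-routing
spec:
  replicas: 4
  selector:
    matchLabels:
      app: geo-routing
  template:
    metadata:
      labels:
        app: geo-routing
    spec:
      containers:
        - name: app
          image: myapp:v1
  strategy:
    canary:
      stableService: geo-stable
      canaryService: geo-canary
      trafficRouting:
        # Routes created by setHeaderRoute must be declared as managed routes
        managedRoutes:
          - name: geo-route
        istio:
          virtualService:
            name: geo-vsvc
            # Weighted route managed by the Rollout (stable + canary destinations)
            routes:
              - default
      steps:
        # Start canary pods without shifting weighted traffic
        - setCanaryScale:
            replicas: 1
        - setHeaderRoute:
            name: geo-route
            match:
              - headerName: X-Geo-Country
                headerValue:
                  exact: US
        - pause: {duration: 24h} # Test in US region for 24 hours
        - setWeight: 50
        - pause: {duration: 12h}
---
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: geo-vsvc
spec:
  hosts:
    - myapp.example.com
  http:
    # Hand-written route that sends US traffic to the canary
    - name: primary
      match:
        - headers:
            x-geo-country:
              exact: US
      route:
        - destination:
            host: geo-canary
    # The Rollout adjusts these weights, so both services must be listed
    - name: default
      route:
        - destination:
            host: geo-stable
          weight: 100
        - destination:
            host: geo-canary
          weight: 0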
6.7 Practical Examples
6.7.1 Complete Production-Grade Canary Deployment
# production-canary.yaml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: production-app
  namespace: production
spec:
  replicas: 10
  revisionHistoryLimit: 5
  selector:
    matchLabels:
      app: production-app
  template:
    metadata:
      labels:
        app: production-app
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "8080"
        prometheus.io/path: "/metrics"
    spec:
      containers:
        - name: app
          image: myregistry/production-app:v1.0.0
          ports:
            - name: http
              containerPort: 8080
            - name: metrics
              containerPort: 9090
          env:
            - name: VERSION
              value: "v1.0.0"
          resources:
            requests:
              cpu: 200m
              memory: 256Mi
            limits:
              cpu: 500m
              memory: 512Mi
          readinessProbe:
            httpGet:
              path: /ready
              port: http
            initialDelaySeconds: 5
            periodSeconds: 5
          livenessProbe:
            httpGet:
              path: /health
              port: http
            initialDelaySeconds: 15
            periodSeconds: 10
  strategy:
    canary:
      stableService: production-app-stable
      canaryService: production-app-canary
      trafficRouting:
        nginx:
          stableIngress: production-app-ingress
      # Analysis configuration
      analysis:
        templates:
          - templateName: production-analysis
        startingStep: 2
        args:
          - name: service-name
            value: production-app-canary
          - name: prometheus-url
            value: http://prometheus.monitoring:9090
      # Deployment steps
      steps:
        # Canary smoke test
        - setWeight: 1
        - pause: {duration: 30s}
        # Start analysis, small traffic test
        - setWeight: 5
        - pause: {duration: 2m}
        # Increase traffic
        - setWeight: 10
        - pause: {duration: 5m}
        # Medium traffic
        - setWeight: 25
        - pause: {duration: 5m}
        # Larger traffic
        - setWeight: 50
        - pause: {duration: 10m}
        # Most traffic
        - setWeight: 75
        - pause: {duration: 10m}
        # Near full
        - setWeight: 90
        - pause: {duration: 5m}
        # Full traffic
        - setWeight: 100
      # Anti-affinity
      antiAffinity:
        preferredDuringSchedulingIgnoredDuringExecution:
          weight: 100
---
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: production-analysis
  namespace: production
spec:
  args:
    - name: service-name
    - name: prometheus-url
  metrics:
    # Success rate monitoring
    - name: success-rate
      interval: 1m
      successCondition: result[0] >= 0.99
      failureCondition: result[0] < 0.95
      failureLimit: 3
      provider:
        prometheus:
          address: "{{args.prometheus-url}}"
          query: |
            sum(rate(http_requests_total{
              service="{{args.service-name}}",
              status=~"2.."
            }[2m])) /
            sum(rate(http_requests_total{
              service="{{args.service-name}}"
            }[2m]))
    # P95 latency monitoring
    - name: p95-latency
      interval: 1m
      successCondition: result[0] < 200
      failureCondition: result[0] > 500
      failureLimit: 3
      provider:
        prometheus:
          address: "{{args.prometheus-url}}"
          query: |
            histogram_quantile(0.95,
              sum(rate(http_request_duration_milliseconds_bucket{
                service="{{args.service-name}}"
              }[2m])) by (le)
            )
    # P99 latency monitoring
    - name: p99-latency
      interval: 1m
      successCondition: result[0] < 500
      failureCondition: result[0] > 1000
      failureLimit: 3
      provider:
        prometheus:
          address: "{{args.prometheus-url}}"
          query: |
            histogram_quantile(0.99,
              sum(rate(http_request_duration_milliseconds_bucket{
                service="{{args.service-name}}"
              }[2m])) by (le)
            )
    # Error rate monitoring
    - name: error-rate
      interval: 1m
      successCondition: result[0] < 0.01
      failureCondition: result[0] > 0.05
      failureLimit: 2
      provider:
        prometheus:
          address: "{{args.prometheus-url}}"
          query: |
            sum(rate(http_requests_total{
              service="{{args.service-name}}",
              status=~"5.."
            }[2m])) /
            sum(rate(http_requests_total{
              service="{{args.service-name}}"
            }[2m]))
---
apiVersion: v1
kind: Service
metadata:
  name: production-app-stable
  namespace: production
spec:
  selector:
    app: production-app
  ports:
    - port: 80
      targetPort: http
---
apiVersion: v1
kind: Service
metadata:
  name: production-app-canary
  namespace: production
spec:
  selector:
    app: production-app
  ports:
    - port: 80
      targetPort: http
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: production-app-ingress
  namespace: production
  annotations:
    kubernetes.io/ingress.class: nginx
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
spec:
  tls:
    - hosts:
        - app.example.com
      secretName: app-tls
  rules:
    - host: app.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: production-app-stable
                port:
                  number: 80
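The pause durations in the production-canary.yaml steps set a lower bound on how long a healthy release takes, which is worth knowing before choosing them. The calculation below is a small sketch that assumes only the s/m/h suffixes used in this chapter; the helper name is hypothetical.

```python
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}


def parse_duration(text: str) -> int:
    """Convert a duration such as '30s', '5m' or '24h' to seconds."""
    value, unit = int(text[:-1]), text[-1]
    return value * UNIT_SECONDS[unit]


# Pause durations copied from the steps of production-canary.yaml
PAUSES = ["30s", "2m", "5m", "5m", "10m", "10m", "5m"]

if __name__ == "__main__":
    total = sum(parse_duration(p) for p in PAUSES)
    print(f"Minimum rollout time: {total // 60} min {total % 60} s")
```

The seven pauses add up to 2250 seconds, so a release that passes every analysis still takes at least 37 minutes 30 seconds, plus the time pods need to become ready at each step.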
6.7.2 Monitoring and Alerting Configuration
# rollout-alerts.yaml
# Metric and label names below follow the Argo Rollouts controller metrics
# (rollout_info, analysis_run_info); verify them against the /metrics output
# of your controller version before relying on these alerts.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: rollout-alerts
  namespace: monitoring
spec:
  groups:
    - name: argo-rollouts
      rules:
        # Rollout failure alert (a failed Rollout reports the Degraded phase)
        - alert: RolloutFailed
          expr: |
            rollout_info{phase="Degraded"} == 1
          for: 1m
          labels:
            severity: critical
          annotations:
            summary: "Rollout {{ $labels.name }} failed"
            description: "Rollout {{ $labels.name }} in namespace {{ $labels.namespace }} has failed."
        # Analysis run failure alert
        - alert: AnalysisRunFailed
          expr: |
            analysis_run_info{phase="Failed"} == 1
          for: 1m
          labels:
            severity: warning
          annotations:
            summary: "AnalysisRun {{ $labels.name }} failed"
            description: "AnalysisRun {{ $labels.name }} in namespace {{ $labels.namespace }} has failed, rollback may be triggered."
        # Rollout stalled alert
        - alert: RolloutStalled
          expr: |
            rollout_info{phase="Paused"} == 1
          for: 30m
          labels:
            severity: warning
          annotations:
            summary: "Rollout {{ $labels.name }} is stalled"
            description: "Rollout {{ $labels.name }} has been paused for more than 30 minutes."
6.7.3 Rollout Operation Command Summary
# List all Rollouts
kubectl argo rollouts list rollouts
# View Rollout detailed status
kubectl argo rollouts get rollout <name>
# Real-time monitoring of Rollout
kubectl argo rollouts get rollout <name> --watch
# Update image
kubectl argo rollouts set image <rollout-name> <container>=<image>:<tag>
# Pause Rollout
kubectl argo rollouts pause <rollout-name>
# Resume a paused Rollout or advance past the current pause step
# (the plugin resumes through promote)
kubectl argo rollouts promote <rollout-name>
# Full promotion (skip all remaining steps and analysis)
kubectl argo rollouts promote <rollout-name> --full
# Abort Rollout
kubectl argo rollouts abort <rollout-name>
# Rollback to previous version
kubectl argo rollouts undo <rollout-name>
# Rollback to specific revision
kubectl argo rollouts undo <rollout-name> --to-revision=2
# Retry failed Rollout
kubectl argo rollouts retry rollout <rollout-name>
# View revisions (listed in the tree printed by get)
kubectl argo rollouts get rollout <rollout-name>
# Launch Dashboard
kubectl argo rollouts dashboard
6.8 Chapter Summary
This chapter covered the progressive delivery capabilities of Argo Rollouts.
Key Points:
- Blue-green deployment suits scenarios that need fast switching and fast rollback
- Canary deployment suits gradual validation of a new version
- Automated analysis uses metrics to decide whether a release continues or rolls back
- Experiments run multiple versions side by side for comparison
- Traffic management integrates with ingress controllers and service meshes such as NGINX Ingress and Istio
In the next chapter, we will learn about Argo Events to understand how to implement event-driven workflows.