Chapter 6: Argo Rollouts Progressive Delivery

Deep dive into Argo Rollouts canary deployments, blue-green deployments, A/B testing and other progressive delivery strategies

Chapter 6: Implementing Safe Application Releases with Argo Rollouts

Argo Rollouts is a Kubernetes controller and set of CRDs that provides advanced deployment capabilities: blue-green and canary updates, metric-based analysis, experimentation, and other progressive delivery features.

6.1 Argo Rollouts Overview

6.1.1 Why Progressive Delivery is Needed

6.1.2 Argo Rollouts Core Concepts

6.1.3 Installing Argo Rollouts

# Install Argo Rollouts
kubectl create namespace argo-rollouts
kubectl apply -n argo-rollouts -f https://github.com/argoproj/argo-rollouts/releases/latest/download/install.yaml

# Install kubectl plugin (optional but recommended)
# macOS
brew install argoproj/tap/kubectl-argo-rollouts

# Linux
curl -LO https://github.com/argoproj/argo-rollouts/releases/latest/download/kubectl-argo-rollouts-linux-amd64
chmod +x kubectl-argo-rollouts-linux-amd64
sudo mv kubectl-argo-rollouts-linux-amd64 /usr/local/bin/kubectl-argo-rollouts

# Verify installation
kubectl argo rollouts version

6.2 Blue-Green Deployment

6.2.1 Blue-Green Deployment Principles

6.2.2 Blue-Green Deployment Configuration

# blue-green-rollout.yaml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: blue-green-demo
spec:
  replicas: 3
  revisionHistoryLimit: 2
  selector:
    matchLabels:
      app: blue-green-demo
  template:
    metadata:
      labels:
        app: blue-green-demo
    spec:
      containers:
      - name: app
        image: nginx:1.24
        ports:
        - containerPort: 80
        resources:
          requests:
            cpu: 100m
            memory: 128Mi
          limits:
            cpu: 200m
            memory: 256Mi
  strategy:
    blueGreen:
      # Active service (production traffic)
      activeService: blue-green-active
      # Preview service (for testing new version)
      previewService: blue-green-preview
      # Auto promotion (set to false for manual confirmation)
      autoPromotionEnabled: false
      # Seconds to keep the old ReplicaSet running after the switch
      scaleDownDelaySeconds: 30
      # Replicas the preview ReplicaSet runs before promotion
      previewReplicaCount: 1
---
# Active service
apiVersion: v1
kind: Service
metadata:
  name: blue-green-active
spec:
  selector:
    app: blue-green-demo
  ports:
  - port: 80
    targetPort: 80
---
# Preview service
apiVersion: v1
kind: Service
metadata:
  name: blue-green-preview
spec:
  selector:
    app: blue-green-demo
  ports:
  - port: 80
    targetPort: 80

6.2.3 Blue-Green Deployment Operations

# Deploy application
kubectl apply -f blue-green-rollout.yaml

# View Rollout status
kubectl argo rollouts get rollout blue-green-demo

# Update image to trigger deployment
kubectl argo rollouts set image blue-green-demo app=nginx:1.25

# Watch deployment progress (real-time monitoring)
kubectl argo rollouts get rollout blue-green-demo --watch

# Manual promotion
kubectl argo rollouts promote blue-green-demo

# Rollback to previous version
kubectl argo rollouts undo blue-green-demo

# Abort deployment
kubectl argo rollouts abort blue-green-demo

6.2.4 Advanced Blue-Green Configuration

apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: blue-green-advanced
spec:
  replicas: 5
  selector:
    matchLabels:
      app: blue-green-advanced
  template:
    metadata:
      labels:
        app: blue-green-advanced
    spec:
      containers:
      - name: app
        image: myapp:v1
        readinessProbe:
          httpGet:
            path: /health
            port: 8080
          initialDelaySeconds: 5
          periodSeconds: 5
  strategy:
    blueGreen:
      activeService: bg-active-svc
      previewService: bg-preview-svc
      # Auto promotion configuration
      autoPromotionEnabled: true
      autoPromotionSeconds: 60  # Auto promote after 60 seconds
      # Preview replica count
      previewReplicaCount: 2
      # Scale down delay
      scaleDownDelaySeconds: 60
      # Max number of old ReplicaSets kept running during the scale-down delay
      scaleDownDelayRevisionLimit: 2
      # Anti-affinity configuration
      antiAffinity:
        requiredDuringSchedulingIgnoredDuringExecution: {}
      # Pre-promotion analysis
      prePromotionAnalysis:
        templates:
        - templateName: success-rate-check
        args:
        - name: service-name
          value: bg-preview-svc
      # Post-promotion analysis
      postPromotionAnalysis:
        templates:
        - templateName: error-rate-check
        args:
        - name: service-name
          value: bg-active-svc

6.3 Canary Deployment

6.3.1 Canary Deployment Principles

6.3.2 Basic Canary Configuration

# canary-rollout.yaml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: canary-demo
spec:
  replicas: 10
  selector:
    matchLabels:
      app: canary-demo
  template:
    metadata:
      labels:
        app: canary-demo
    spec:
      containers:
      - name: app
        image: nginx:1.24
        ports:
        - containerPort: 80
  strategy:
    canary:
      # Canary service (optional, for direct access to canary version)
      canaryService: canary-demo-canary
      # Stable service
      stableService: canary-demo-stable
      # Deployment steps
      steps:
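      # Step 1: about 5% of traffic (without a traffic router, the weight is
      # approximated by the canary/stable replica ratio)
      - setWeight: 5
      # Timed pause; use `pause: {}` to wait for manual promotion instead
      - pause: {duration: 1m}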
      # Step 2: 20% traffic
      - setWeight: 20
      - pause: {duration: 2m}
      # Step 3: 50% traffic
      - setWeight: 50
      - pause: {duration: 5m}
      # Step 4: 80% traffic
      - setWeight: 80
      - pause: {duration: 5m}
      # Auto switch to 100% after completion
---
apiVersion: v1
kind: Service
metadata:
  name: canary-demo-stable
spec:
  selector:
    app: canary-demo
  ports:
  - port: 80
---
apiVersion: v1
kind: Service
metadata:
  name: canary-demo-canary
spec:
  selector:
    app: canary-demo
  ports:
  - port: 80
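
The steps above advance on timers only. Where a human should approve the first exposure, a pause without a duration can act as a manual gate. The fragment below is a sketch of that variant for the same canary-demo Rollout; the weights and the 5m duration are illustrative assumptions.

```yaml
  # Fragment of spec for the canary-demo Rollout (only the strategy is shown)
  strategy:
    canary:
      canaryService: canary-demo-canary
      stableService: canary-demo-stable
      steps:
      - setWeight: 5
      # No duration: the Rollout stays paused here until someone runs
      #   kubectl argo rollouts promote canary-demo
      - pause: {}
      - setWeight: 50
      # Timed pause: continues automatically after 5 minutes
      - pause: {duration: 5m}
```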

6.3.3 Canary Deployment with Traffic Management

# canary-with-traffic-management.yaml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: canary-nginx
spec:
  replicas: 5
  selector:
    matchLabels:
      app: canary-nginx
  template:
    metadata:
      labels:
        app: canary-nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.24
        ports:
        - containerPort: 80
  strategy:
    canary:
      canaryService: canary-nginx-canary
      stableService: canary-nginx-stable
      # Nginx Ingress traffic management
      trafficRouting:
        nginx:
          stableIngress: canary-nginx-ingress
          annotationPrefix: nginx.ingress.kubernetes.io
          additionalIngressAnnotations:
            canary-by-header: X-Canary
            canary-by-header-value: "true"
      steps:
      - setWeight: 5
      - pause: {duration: 30s}
      - setWeight: 20
      - pause: {duration: 1m}
      - setWeight: 50
      - pause: {duration: 2m}
      - setWeight: 80
      - pause: {duration: 2m}
---
# Stable Ingress
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: canary-nginx-ingress
  annotations:
    kubernetes.io/ingress.class: nginx
spec:
  rules:
  - host: canary-demo.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: canary-nginx-stable
            port:
              number: 80

6.3.4 Istio Traffic Management Integration

# canary-with-istio.yaml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: canary-istio-demo
spec:
  replicas: 5
  selector:
    matchLabels:
      app: canary-istio-demo
  template:
    metadata:
      labels:
        app: canary-istio-demo
    spec:
      containers:
      - name: app
        image: myapp:v1
        ports:
        - containerPort: 8080
  strategy:
    canary:
      canaryService: canary-istio-demo-canary
      stableService: canary-istio-demo-stable
      trafficRouting:
        istio:
          virtualService:
            name: canary-istio-demo-vsvc
            routes:
            - primary
          destinationRule:
            name: canary-istio-demo-destrule
            canarySubsetName: canary
            stableSubsetName: stable
      steps:
      - setWeight: 10
      - pause: {duration: 1m}
      - setWeight: 30
      - pause: {duration: 2m}
      - setWeight: 60
      - pause: {duration: 3m}
---
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: canary-istio-demo-vsvc
spec:
  hosts:
  - canary-istio-demo.example.com
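  http:
  - name: primary
    route:
    # Subset-level split: both destinations use the DestinationRule host
    # and are distinguished by the subsets named in the Rollout
    - destination:
        host: canary-istio-demo
        subset: stable
      weight: 100
    - destination:
        host: canary-istio-demo
        subset: canary
      weight: 0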
---
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: canary-istio-demo-destrule
spec:
  host: canary-istio-demo
  subsets:
  - name: stable
    labels:
      app: canary-istio-demo
  - name: canary
    labels:
      app: canary-istio-demo
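
The DestinationRule above names the host canary-istio-demo, but no Service with that name appears in the example. A minimal sketch of such a Service is shown below; the Service port (80) is an assumption, while the target port matches the containerPort in the Rollout.

```yaml
# Hypothetical Service backing the DestinationRule host
apiVersion: v1
kind: Service
metadata:
  name: canary-istio-demo
spec:
  selector:
    app: canary-istio-demo
  ports:
  - name: http
    port: 80          # assumed Service port
    targetPort: 8080  # containerPort from the Rollout template
```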

6.4 Analysis and Automated Rollback

6.4.1 AnalysisTemplate Concept

6.4.2 Prometheus Analysis Template

# analysis-template-prometheus.yaml
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: success-rate-prometheus
spec:
  args:
  - name: service-name
  - name: namespace
    value: default
  metrics:
  - name: success-rate
    # Check every 30 seconds
    interval: 30s
    # At least 5 measurements needed
    count: 5
    # Success threshold
    successCondition: result[0] >= 0.95
    # Failure threshold
    failureCondition: result[0] < 0.90
    failureLimit: 3
    provider:
      prometheus:
        address: http://prometheus.monitoring:9090
        query: |
          sum(rate(http_requests_total{
            service="{{args.service-name}}",
            namespace="{{args.namespace}}",
            status=~"2.."
          }[5m])) /
          sum(rate(http_requests_total{
            service="{{args.service-name}}",
            namespace="{{args.namespace}}"
          }[5m]))
---
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: latency-check
spec:
  args:
  - name: service-name
  metrics:
  - name: p99-latency
    interval: 1m
    count: 3
    successCondition: result[0] < 500
    failureLimit: 2
    provider:
      prometheus:
        address: http://prometheus.monitoring:9090
        query: |
          histogram_quantile(0.99,
            sum(rate(http_request_duration_seconds_bucket{
              service="{{args.service-name}}"
            }[5m])) by (le)
          ) * 1000
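
In the success-rate-prometheus template above, successCondition (>= 0.95) and failureCondition (< 0.90) leave a gap. A measurement between the two satisfies neither and is recorded as inconclusive, which pauses the rollout for a manual decision instead of rolling back. If that gap is not wanted, one option is to set only successCondition, so that every measurement that does not succeed counts as a failure. The fragment below sketches that variant with the same query; it is one possible choice, not the only valid configuration.

```yaml
  # Fragment of an AnalysisTemplate spec (args omitted)
  metrics:
  - name: success-rate
    interval: 30s
    count: 5
    # Only successCondition is set: anything below 0.95 is a failed measurement
    successCondition: result[0] >= 0.95
    failureLimit: 3
    provider:
      prometheus:
        address: http://prometheus.monitoring:9090
        query: |
          sum(rate(http_requests_total{service="{{args.service-name}}", status=~"2.."}[5m])) /
          sum(rate(http_requests_total{service="{{args.service-name}}"}[5m]))
```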

6.4.3 Job Analysis Template

# analysis-template-job.yaml
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: integration-test
spec:
  args:
  - name: test-endpoint
  metrics:
  - name: integration-tests
    provider:
      job:
        spec:
          backoffLimit: 0
          template:
            spec:
              restartPolicy: Never
              containers:
              - name: test
                image: curlimages/curl:latest
                command:
                - /bin/sh
                - -c
                - |
                  set -e
                  echo "Running integration tests..."

                  # Test health check endpoint
                  curl -f {{args.test-endpoint}}/health || exit 1

                  # Test API endpoint
                  curl -f {{args.test-endpoint}}/api/v1/status || exit 1

                  # Test response time
                  RESPONSE_TIME=$(curl -o /dev/null -s -w '%{time_total}' {{args.test-endpoint}}/api/v1/ping)
                  # awk does the float comparison (bc may be missing from minimal images)
                  if awk "BEGIN { exit !($RESPONSE_TIME > 0.5) }"; then
                    echo "Response time too slow: ${RESPONSE_TIME}s"
                    exit 1
                  fi

                  echo "All tests passed!"
---
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: smoke-test
spec:
  metrics:
  - name: smoke-test
    provider:
      job:
        spec:
          backoffLimit: 1
          template:
            spec:
              restartPolicy: Never
              containers:
              - name: smoke
                image: python:3.11-slim
                command:
                - python
                - -c
                - |
                  import urllib.request
                  import json
                  import sys

                  endpoints = [
                      'http://canary-service/health',
                      'http://canary-service/ready',
                  ]

                  for url in endpoints:
                      try:
                          response = urllib.request.urlopen(url, timeout=10)
                          if response.status != 200:
                              print(f"Failed: {url} returned {response.status}")
                              sys.exit(1)
                          print(f"OK: {url}")
                      except Exception as e:
                          print(f"Error: {url} - {e}")
                          sys.exit(1)

                  print("All smoke tests passed!")

6.4.4 Web Analysis Template

# analysis-template-web.yaml
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: web-check
spec:
  args:
  - name: url
  metrics:
  - name: web-health
    interval: 30s
    count: 5
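    # jsonPath below extracts the `status` field, so `result` is that value
    # (assumes the endpoint returns a numeric status field)
    successCondition: result == 200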
    failureLimit: 2
    provider:
      web:
        url: "{{args.url}}/health"
        headers:
        - key: Content-Type
          value: application/json
        jsonPath: "{$.status}"
---
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: api-response-check
spec:
  args:
  - name: api-url
  - name: expected-version
  metrics:
  - name: version-check
    provider:
      web:
        url: "{{args.api-url}}/version"
        jsonPath: "{$.version}"
    successCondition: result == "{{args.expected-version}}"

6.4.5 Canary Deployment with Analysis Integration

# canary-with-analysis.yaml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: canary-with-analysis
spec:
  replicas: 10
  selector:
    matchLabels:
      app: canary-analysis
  template:
    metadata:
      labels:
        app: canary-analysis
    spec:
      containers:
      - name: app
        image: myapp:v1
        ports:
        - containerPort: 8080
  strategy:
    canary:
      canaryService: canary-analysis-canary
      stableService: canary-analysis-stable
      # Analysis configuration
      analysis:
        templates:
        - templateName: success-rate-prometheus
        - templateName: latency-check
        args:
        - name: service-name
          value: canary-analysis-canary
        # Step index (zero-based) at which background analysis starts
        startingStep: 2  # Start analysis at the third step (setWeight: 20)
      steps:
      - setWeight: 5
      - pause: {duration: 30s}
      # Analysis starts here
      - setWeight: 20
      - pause: {duration: 2m}
      - setWeight: 50
      - pause: {duration: 5m}
      - setWeight: 80
      - pause: {duration: 5m}
      # Pod availability bounds while ReplicaSets scale during the update
      maxUnavailable: 1
      maxSurge: 1
---
# Using multiple analysis templates
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: canary-multi-analysis
spec:
  replicas: 5
  selector:
    matchLabels:
      app: canary-multi
  template:
    metadata:
      labels:
        app: canary-multi
    spec:
      containers:
      - name: app
        image: myapp:v1
  strategy:
    canary:
      steps:
      - setWeight: 10
      - pause: {duration: 1m}
      # Inline analysis step
      - analysis:
          templates:
          - templateName: success-rate-prometheus
          args:
          - name: service-name
            value: canary-multi-canary
      - setWeight: 50
      - pause: {duration: 2m}
      - analysis:
          templates:
          - templateName: latency-check
          - templateName: integration-test
          args:
          - name: service-name
            value: canary-multi-canary
          - name: test-endpoint
            value: http://canary-multi-canary:8080
      - setWeight: 100

6.5 Experimentation

6.5.1 Experiment Concept

6.5.2 Basic Experiment Configuration

# experiment.yaml
apiVersion: argoproj.io/v1alpha1
kind: Experiment
metadata:
  name: experiment-demo
spec:
  # Experiment duration
  duration: 10m
  # Progress deadline
  progressDeadlineSeconds: 300
  templates:
  # Baseline template
  - name: baseline
    replicas: 1
    selector:
      matchLabels:
        app: experiment-demo
        version: baseline
    template:
      metadata:
        labels:
          app: experiment-demo
          version: baseline
      spec:
        containers:
        - name: app
          image: myapp:v1
          ports:
          - containerPort: 8080
    service:
      name: experiment-baseline
  # Candidate template
  - name: candidate
    replicas: 1
    selector:
      matchLabels:
        app: experiment-demo
        version: candidate
    template:
      metadata:
        labels:
          app: experiment-demo
          version: candidate
      spec:
        containers:
        - name: app
          image: myapp:v2
          ports:
          - containerPort: 8080
    service:
      name: experiment-candidate
  # Analysis configuration
  analyses:
  - name: compare-metrics
    templateName: compare-analysis
    args:
    - name: baseline-service
      value: experiment-baseline
    - name: candidate-service
      value: experiment-candidate

6.5.3 Comparison Analysis Template

# compare-analysis-template.yaml
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: compare-analysis
spec:
  args:
  - name: baseline-service
  - name: candidate-service
  metrics:
  # Compare latency: ratio of candidate P95 to baseline P95.
  # A PromQL query returns a single result, so both sides are combined in one expression.
  - name: latency-comparison
    interval: 1m
    count: 5
    successCondition: result[0] <= 1.1  # Candidate can't be more than 10% slower than baseline
    provider:
      prometheus:
        address: http://prometheus:9090
        query: |
          histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket{service="{{args.candidate-service}}"}[5m])) by (le))
          /
          histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket{service="{{args.baseline-service}}"}[5m])) by (le))
  # Compare error rate: candidate error ratio minus baseline error ratio
  - name: error-rate-comparison
    interval: 1m
    count: 5
    successCondition: result[0] <= 0  # Candidate error rate can't be higher than baseline
    provider:
      prometheus:
        address: http://prometheus:9090
        query: |
          (
            sum(rate(http_requests_total{service="{{args.candidate-service}}", status=~"5.."}[5m])) / sum(rate(http_requests_total{service="{{args.candidate-service}}"}[5m]))
          ) - (
            sum(rate(http_requests_total{service="{{args.baseline-service}}", status=~"5.."}[5m])) / sum(rate(http_requests_total{service="{{args.baseline-service}}"}[5m]))
          )

6.5.4 Using Experiments in Rollout

# rollout-with-experiment.yaml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: rollout-experiment
spec:
  replicas: 4
  selector:
    matchLabels:
      app: rollout-experiment
  template:
    metadata:
      labels:
        app: rollout-experiment
    spec:
      containers:
      - name: app
        image: myapp:v1
  strategy:
    canary:
      steps:
      - setWeight: 20
      - pause: {duration: 1m}
      # Run experiment
      - experiment:
          duration: 5m
          templates:
          - name: baseline
            specRef: stable
            replicas: 2
          - name: canary
            specRef: canary
            replicas: 2
          analyses:
          - name: performance-comparison
            templateName: compare-analysis
            args:
            - name: baseline-service
              value: rollout-experiment-stable
            - name: candidate-service
              value: rollout-experiment-canary
      - setWeight: 50
      - pause: {duration: 2m}
      - setWeight: 100

6.6 A/B Testing

6.6.1 Header-Based A/B Testing

# ab-testing-header.yaml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: ab-header-test
spec:
  replicas: 4
  selector:
    matchLabels:
      app: ab-header-test
  template:
    metadata:
      labels:
        app: ab-header-test
    spec:
      containers:
      - name: app
        image: myapp:v1
  strategy:
    canary:
      stableService: ab-stable
      canaryService: ab-canary
      trafficRouting:
        nginx:
          stableIngress: ab-ingress
          annotationPrefix: nginx.ingress.kubernetes.io
          additionalIngressAnnotations:
            # Route to canary based on header
            canary-by-header: X-AB-Test
            canary-by-header-value: "variant-b"
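      steps:
      # The NGINX integration does not support setHeaderRoute; header-based
      # routing comes from the canary-by-header annotations above.
      # Run canary pods without shifting any weighted traffic to them.
      - setCanaryScale:
          replicas: 2
      - pause: {}  # Pause indefinitely until manually promoted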

6.6.2 Cookie-Based A/B Testing

# ab-testing-cookie.yaml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: ab-cookie-test
spec:
  replicas: 4
  selector:
    matchLabels:
      app: ab-cookie-test
  template:
    metadata:
      labels:
        app: ab-cookie-test
    spec:
      containers:
      - name: app
        image: myapp:v1
  strategy:
    canary:
      stableService: ab-cookie-stable
      canaryService: ab-cookie-canary
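      trafficRouting:
        # Routes created by setHeaderRoute must be declared as managed routes
        managedRoutes:
        - name: cookie-based-route
        istio:
          virtualService:
            name: ab-cookie-vsvc
            routes:
            - primary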
      steps:
      - setHeaderRoute:
          name: cookie-based-route
          match:
          - headerName: Cookie
            headerValue:
              regex: ".*ab_test=variant_b.*"
      - pause: {}
---
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: ab-cookie-vsvc
spec:
  hosts:
  - ab-test.example.com
  http:
  # Argo Rollouts adds the header-matching route to this VirtualService at
  # runtime; the primary route needs both stable and canary destinations
  - name: primary
    route:
    - destination:
        host: ab-cookie-stable
      weight: 100
    - destination:
        host: ab-cookie-canary
      weight: 0

6.6.3 Geographic Routing

# geo-routing.yaml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: geo-routing
spec:
  replicas: 4
  selector:
    matchLabels:
      app: geo-routing
  template:
    metadata:
      labels:
        app: geo-routing
    spec:
      containers:
      - name: app
        image: myapp:v1
  strategy:
    canary:
      stableService: geo-stable
      canaryService: geo-canary
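      trafficRouting:
        # Routes created by setHeaderRoute must be declared as managed routes
        managedRoutes:
        - name: geo-route
        istio:
          virtualService:
            name: geo-vsvc
            routes:
            - primary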
      steps:
      - setHeaderRoute:
          name: geo-route
          match:
          - headerName: X-Geo-Country
            headerValue:
              exact: US
      - pause: {duration: 24h}  # Test in US region for 24 hours
      - setWeight: 50
      - pause: {duration: 12h}
---
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: geo-vsvc
spec:
  hosts:
  - myapp.example.com
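  http:
  # Argo Rollouts adds the header-matching route to this VirtualService at
  # runtime; the primary route needs both stable and canary destinations
  - name: primary
    route:
    - destination:
        host: geo-stable
      weight: 100
    - destination:
        host: geo-canary
      weight: 0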

6.7 Practical Examples

6.7.1 Complete Production-Grade Canary Deployment

# production-canary.yaml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: production-app
  namespace: production
spec:
  replicas: 10
  revisionHistoryLimit: 5
  selector:
    matchLabels:
      app: production-app
  template:
    metadata:
      labels:
        app: production-app
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "8080"
        prometheus.io/path: "/metrics"
    spec:
      containers:
      - name: app
        image: myregistry/production-app:v1.0.0
        ports:
        - name: http
          containerPort: 8080
        - name: metrics
          containerPort: 9090
        env:
        - name: VERSION
          value: "v1.0.0"
        resources:
          requests:
            cpu: 200m
            memory: 256Mi
          limits:
            cpu: 500m
            memory: 512Mi
        readinessProbe:
          httpGet:
            path: /ready
            port: http
          initialDelaySeconds: 5
          periodSeconds: 5
        livenessProbe:
          httpGet:
            path: /health
            port: http
          initialDelaySeconds: 15
          periodSeconds: 10
  strategy:
    canary:
      stableService: production-app-stable
      canaryService: production-app-canary
      trafficRouting:
        nginx:
          stableIngress: production-app-ingress
      # Analysis configuration
      analysis:
        templates:
        - templateName: production-analysis
        startingStep: 2
        args:
        - name: service-name
          value: production-app-canary
        - name: prometheus-url
          value: http://prometheus.monitoring:9090
      # Deployment steps
      steps:
      # Canary smoke test
      - setWeight: 1
      - pause: {duration: 30s}
      # Start analysis, small traffic test
      - setWeight: 5
      - pause: {duration: 2m}
      # Increase traffic
      - setWeight: 10
      - pause: {duration: 5m}
      # Medium traffic
      - setWeight: 25
      - pause: {duration: 5m}
      # Larger traffic
      - setWeight: 50
      - pause: {duration: 10m}
      # Most traffic
      - setWeight: 75
      - pause: {duration: 10m}
      # Near full
      - setWeight: 90
      - pause: {duration: 5m}
      # Full traffic
      - setWeight: 100
      # Anti-affinity
      antiAffinity:
        preferredDuringSchedulingIgnoredDuringExecution:
          weight: 100
---
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: production-analysis
  namespace: production
spec:
  args:
  - name: service-name
  - name: prometheus-url
  metrics:
  # Success rate monitoring
  - name: success-rate
    interval: 1m
    successCondition: result[0] >= 0.99
    failureCondition: result[0] < 0.95
    failureLimit: 3
    provider:
      prometheus:
        address: "{{args.prometheus-url}}"
        query: |
          sum(rate(http_requests_total{
            service="{{args.service-name}}",
            status=~"2.."
          }[2m])) /
          sum(rate(http_requests_total{
            service="{{args.service-name}}"
          }[2m]))
  # P95 latency monitoring
  - name: p95-latency
    interval: 1m
    successCondition: result[0] < 200
    failureCondition: result[0] > 500
    failureLimit: 3
    provider:
      prometheus:
        address: "{{args.prometheus-url}}"
        query: |
          histogram_quantile(0.95,
            sum(rate(http_request_duration_milliseconds_bucket{
              service="{{args.service-name}}"
            }[2m])) by (le)
          )
  # P99 latency monitoring
  - name: p99-latency
    interval: 1m
    successCondition: result[0] < 500
    failureCondition: result[0] > 1000
    failureLimit: 3
    provider:
      prometheus:
        address: "{{args.prometheus-url}}"
        query: |
          histogram_quantile(0.99,
            sum(rate(http_request_duration_milliseconds_bucket{
              service="{{args.service-name}}"
            }[2m])) by (le)
          )
  # Error rate monitoring
  - name: error-rate
    interval: 1m
    successCondition: result[0] < 0.01
    failureCondition: result[0] > 0.05
    failureLimit: 2
    provider:
      prometheus:
        address: "{{args.prometheus-url}}"
        query: |
          sum(rate(http_requests_total{
            service="{{args.service-name}}",
            status=~"5.."
          }[2m])) /
          sum(rate(http_requests_total{
            service="{{args.service-name}}"
          }[2m]))
---
apiVersion: v1
kind: Service
metadata:
  name: production-app-stable
  namespace: production
spec:
  selector:
    app: production-app
  ports:
  - port: 80
    targetPort: http
---
apiVersion: v1
kind: Service
metadata:
  name: production-app-canary
  namespace: production
spec:
  selector:
    app: production-app
  ports:
  - port: 80
    targetPort: http
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: production-app-ingress
  namespace: production
  annotations:
    kubernetes.io/ingress.class: nginx
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
spec:
  tls:
  - hosts:
    - app.example.com
    secretName: app-tls
  rules:
  - host: app.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: production-app-stable
            port:
              number: 80

6.7.2 Monitoring and Alerting Configuration

# rollout-alerts.yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: rollout-alerts
  namespace: monitoring
spec:
  groups:
  - name: argo-rollouts
    rules:
    # Metric names follow the controller's rollout_info and analysis_run_info
    # metrics; verify them against your controller version.
    # Rollout failure alert (a failed or aborted update leaves the Rollout Degraded)
    - alert: RolloutFailed
      expr: |
        rollout_info{phase="Degraded"} == 1
      for: 1m
      labels:
        severity: critical
      annotations:
        summary: "Rollout {{ $labels.name }} failed"
        description: "Rollout {{ $labels.name }} in namespace {{ $labels.namespace }} is in the Degraded phase."

    # Analysis run failure alert
    - alert: AnalysisRunFailed
      expr: |
        analysis_run_info{phase="Failed"} == 1
      for: 1m
      labels:
        severity: warning
      annotations:
        summary: "AnalysisRun {{ $labels.name }} failed"
        description: "AnalysisRun {{ $labels.name }} in namespace {{ $labels.namespace }} has failed, rollback may be triggered."

    # Rollout stalled alert
    - alert: RolloutStalled
      expr: |
        rollout_info{phase="Paused"} == 1
      for: 30m
      labels:
        severity: warning
      annotations:
        summary: "Rollout {{ $labels.name }} is stalled"
        description: "Rollout {{ $labels.name }} has been paused for more than 30 minutes."
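
Alert rules on Rollout metrics only fire if Prometheus actually scrapes the Argo Rollouts controller. With the Prometheus Operator, that is typically done with a ServiceMonitor. The sketch below assumes the controller runs in the argo-rollouts namespace (as installed in section 6.1.3) and that its metrics Service carries the label and port name shown; both are assumptions to check against the installed manifests, for example with `kubectl -n argo-rollouts get svc --show-labels`.

```yaml
# Hypothetical ServiceMonitor for the controller's metrics endpoint
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: argo-rollouts-metrics
  namespace: monitoring
spec:
  namespaceSelector:
    matchNames:
    - argo-rollouts
  selector:
    matchLabels:
      app.kubernetes.io/name: argo-rollouts-metrics  # assumed Service label
  endpoints:
  - port: metrics   # assumed port name on the metrics Service
    interval: 30s
```

Depending on how the Prometheus resource is configured, the ServiceMonitor may also need labels that match its serviceMonitorSelector.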

6.7.3 Rollout Operation Command Summary

# List all Rollouts
kubectl argo rollouts list rollouts

# View Rollout detailed status
kubectl argo rollouts get rollout <name>

# Real-time monitoring of Rollout
kubectl argo rollouts get rollout <name> --watch

# Update image
kubectl argo rollouts set image <rollout-name> <container>=<image>:<tag>

# Pause Rollout
kubectl argo rollouts pause <rollout-name>

# Resume a paused Rollout: the plugin has no `resume` subcommand;
# `promote` unpauses the Rollout and advances past the current pause step
kubectl argo rollouts promote <rollout-name>

# Full promotion (skip all steps and analysis)
kubectl argo rollouts promote <rollout-name> --full

# Abort Rollout
kubectl argo rollouts abort <rollout-name>

# Rollback to previous version
kubectl argo rollouts undo <rollout-name>

# Rollback to specific revision
kubectl argo rollouts undo <rollout-name> --to-revision=2

# Retry failed Rollout
kubectl argo rollouts retry rollout <rollout-name>

# View revision history (the plugin has no `history` subcommand;
# revisions appear in the `get rollout` tree)
kubectl argo rollouts get rollout <rollout-name>

# Launch Dashboard
kubectl argo rollouts dashboard

6.8 Chapter Summary

This chapter introduced the progressive delivery capabilities of Argo Rollouts: blue-green and canary strategies, metric-driven analysis, experiments, and A/B-style routing.

Key Points:

  1. Blue-Green Deployment suits releases that need fast switchover and fast rollback
  2. Canary Deployment suits gradual validation of a new version under real traffic
  3. Automated Analysis decides from metrics whether a release continues or rolls back
  4. Experimentation runs multiple versions side by side for comparison
  5. Traffic Management integrates with several ingress controllers and service meshes

In the next chapter, we will learn about Argo Events to understand how to implement event-driven workflows.