Zero-downtime deploys checklist

One-page checklist for zero-downtime deploys.

  1. Confirm the service mesh or ingress supports weighted routing and is configured for both the blue and green environments.
  2. Validate the Helm chart: readiness and liveness probes are defined, config values are stored in Git, the image tag is immutable (a pinned version or digest, not a floating tag such as latest), and the chart version is bumped.
  3. Sync ArgoCD blue and green apps to the same baseline before introducing a new release.
  4. Execute automated smoke, contract, and migration verification suites against green; block promotion on failures.
  5. Annotate deploy in observability platform; pre-load dashboards for latency, error rate, and saturation.
  6. Shift traffic following agreed weights (e.g., 10% → 30% → 60% → 100%) with hold times defined for each stage.
  7. Monitor SLOs and business KPIs during each weight step; abort and roll back automatically if thresholds are exceeded.
  8. Once green is stable at 100%, decommission or repurpose blue resources per cost policy while retaining snapshots/logs.
  9. Capture deployment notes, metrics, and follow-up items in the release log; schedule improvement actions with owners.
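
Steps 6 and 7 can be sketched as a small control loop. This is a minimal illustration, not a reference implementation: Stage, Thresholds, and run_rollout are hypothetical names, and set_green_weight and read_metrics are placeholder callbacks standing in for whatever your mesh/ingress and observability platform actually expose. The thresholds cover only error rate and p99 latency; business KPIs would need their own checks.

```python
import time
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


@dataclass(frozen=True)
class Stage:
    weight: int          # percent of traffic routed to green
    hold_seconds: float  # observation time before the next stage


@dataclass(frozen=True)
class Thresholds:
    max_error_rate: float      # fraction of failed requests, e.g. 0.01 for 1%
    max_p99_latency_ms: float  # p99 latency ceiling in milliseconds


def run_rollout(
    stages: Sequence[Stage],
    set_green_weight: Callable[[int], None],          # hypothetical: applies a routing weight
    read_metrics: Callable[[], Tuple[float, float]],  # hypothetical: returns (error_rate, p99_ms)
    thresholds: Thresholds,
    poll_seconds: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Walk through the stages; return True if green reaches the last stage healthy.

    On any threshold breach, green's weight is set back to 0 and False is returned.
    """
    for stage in stages:
        set_green_weight(stage.weight)
        waited = 0.0
        while True:
            error_rate, p99_ms = read_metrics()
            breached = (
                error_rate > thresholds.max_error_rate
                or p99_ms > thresholds.max_p99_latency_ms
            )
            if breached:
                set_green_weight(0)  # automatic rollback: all traffic back to blue
                return False
            if waited >= stage.hold_seconds:
                break  # hold time elapsed with healthy metrics
            step = min(poll_seconds, stage.hold_seconds - waited)
            sleep(step)
            waited += step
    return True
```

With the example weights from step 6, the stages would be Stage(10, ...), Stage(30, ...), Stage(60, ...), Stage(100, ...), each with the hold time your team has agreed on. Metrics are checked at least once per stage, including once more after the hold time has elapsed, so a stage is never promoted on stale data.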

Pitfalls

  • Forgetting to mirror feature flags, secrets, or third-party callbacks in green.
  • Lacking automated rollback scripts, forcing manual kubectl commands under pressure during an incident.
  • Shipping backward-incompatible database migrations that break blue while it is still serving traffic.
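
The first pitfall can be caught with a simple parity check before any traffic shifts. The sketch below assumes you can export the names (never the values) of feature flags, secrets, and callback registrations for each environment; find_green_gaps and the category labels are hypothetical and not tied to any particular tool.

```python
from typing import Dict, Iterable, List


def find_green_gaps(
    blue: Dict[str, Iterable[str]],
    green: Dict[str, Iterable[str]],
) -> Dict[str, List[str]]:
    """Return, per category, the names present in blue but missing in green.

    Only names are compared; secret values are never read or printed.
    """
    gaps: Dict[str, List[str]] = {}
    for category, blue_names in blue.items():
        missing = sorted(set(blue_names) - set(green.get(category, ())))
        if missing:
            gaps[category] = missing
    return gaps


# Illustrative inventories; the names are made up for this example.
blue_inventory = {
    "feature_flags": ["new_checkout", "dark_mode"],
    "secrets": ["db-password"],
    "callbacks": ["payments-webhook"],
}
green_inventory = {
    "feature_flags": ["dark_mode"],
    "secrets": ["db-password"],
}

gaps = find_green_gaps(blue_inventory, green_inventory)
```

An empty result means green mirrors blue for the categories checked; a non-empty one is a reasonable condition to block promotion on.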

Need help hardening zero-downtime pipelines? Book a working session via /contact.