Kubernetes Pod Disruption Budgets on EKS: Zero-Downtime Upgrades
Quick summary: Cluster upgrades and Karpenter consolidation look healthy in the console while PDB-blocked evictions freeze your node drain for 45 minutes. This guide wires minAvailable, maxUnavailable, and EKS managed node group semantics.
Key Takeaways
- Cluster upgrades and Karpenter consolidation look healthy in the console while PDB-blocked evictions freeze your node drain for 45 minutes
- This guide wires minAvailable, maxUnavailable, and EKS managed node group semantics
- EKS (June 2026) control plane upgrades are managed, but worker disruption is yours
- PodDisruptionBudget is the API that tells kubectl drain and Karpenter how many pods may disappear during voluntary evictions
- Field note (FinTech API on EKS, 5 replicas, minAvailable: 5 copied from prod HA doc): node group upgrade stalled 52 min waiting for an impossible eviction
EKS (June 2026) control plane upgrades are managed, but worker disruption is yours. PodDisruptionBudget is the API that tells kubectl drain and Karpenter how many pods may disappear during voluntary evictions.
Symptom → mechanism → AWS control
| Production symptom | Mechanism | AWS control |
|---|---|---|
| 503s during node drain | Voluntary disruption evicts all pods | PodDisruptionBudget minAvailable or maxUnavailable |
| PDB blocks node upgrade indefinitely | Too-strict minAvailable=100% | minAvailable=80% with HPA headroom |
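| Single-replica Deployment gets no useful protection from a PDB | With one replica, a PDB either blocks the drain (minAvailable=1) or permits a full outage (maxUnavailable=1) | HPA minReplicas=2 for production tiers |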
Opinionated take: Every production Deployment needs a PDB and minReplicas≥2—EKS managed node upgrades will evict your pods whether you’re ready or not.
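To make the replica floor concrete, here is a minimal sketch of an HPA that never scales below two replicas. The Deployment name api, the replica ceiling, and the CPU target are assumptions for illustration, not values from a real cluster.

```yaml
# Hypothetical HPA for a Deployment named "api".
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api
  # Floor of 2 so a drain can evict one pod while another keeps serving.
  minReplicas: 2
  maxReplicas: 10          # assumed ceiling
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # assumed target
```

With two replicas and a PDB of maxUnavailable: 1, a node drain can remove one pod at a time without taking the service to zero.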
Benchmark pattern (hypothetical workload) — EKS node group upgrade without PDB: 18s API outage; with PDB minAvailable=80% on 10-replica Deployment: 0 failed requests during 15-min rolling node drain; cluster-autoscaler respects PDB evictions.
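The PDB in this pattern might look like the sketch below; the name api-pdb-percent and the app: api label are assumed. Kubernetes rounds percentage values up, so 80% of 10 replicas means 8 pods must stay available and up to 2 can be evicted at once.

```yaml
# Hypothetical PDB for a 10-replica Deployment labeled app: api.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-pdb-percent
spec:
  # 80% of 10 replicas = 8 pods must stay available,
  # which leaves room to evict up to 2 at a time.
  minAvailable: "80%"
  selector:
    matchLabels:
      app: api
```

The same rounding makes this PDB block every eviction when the workload runs fewer than 5 replicas (80% of 4 rounds up to 4), so the percentage form depends on the HPA keeping enough replicas. Use one PDB per set of pods rather than stacking this on top of a maxUnavailable budget for the same selector.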
Field note (FinTech API on EKS, 5 replicas, minAvailable: 5 copied from prod HA doc): node group upgrade stalled 52 min waiting for an impossible eviction. With all 5 replicas required to stay available, the PDB allowed zero evictions. Changing to maxUnavailable: 1 allowed a rolling drain; error rate stayed <0.01%. Pair with blue-green decision guide.
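A minimal reconstruction of the failure mode in this field note, with assumed names (api-pdb, app: api). The status section is normally populated by the disruption controller; the values shown are what the arithmetic implies for 5 healthy replicas, not output captured from the incident.

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-pdb
spec:
  # Anti-pattern: equals the replica count, so no pod may ever be evicted.
  minAvailable: 5
  selector:
    matchLabels:
      app: api
# Illustrative status with 5 healthy pods (set by the controller, not by you):
status:
  expectedPods: 5
  currentHealthy: 5
  desiredHealthy: 5
  disruptionsAllowed: 0   # evictions are refused while this is 0
```

disruptionsAllowed is currentHealthy minus desiredHealthy. With maxUnavailable: 1 instead, desiredHealthy drops to 4 and one eviction at a time goes through.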
PDB mechanics
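- Voluntary disruptions: node drain and Karpenter consolidation go through the Eviction API, which the PDB gates. A direct kubectl delete pod is also voluntary but bypasses the PDB.
- Involuntary: hardware failure, spot interruption. The PDB does not block these.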
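```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-pdb
spec:
  # Allow at most one matching pod to be evicted at a time.
  maxUnavailable: 1
  selector:
    matchLabels:
      # Must match the pod template labels of the target workload.
      app: api
```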
EKS-specific coupling
| Event | PDB interaction |
|---|---|
| Managed node group rolling update | Sequential node replacement; PDB gates pod eviction |
| Karpenter drift / consolidation | Evictions must satisfy PDB |
| Fargate | No DaemonSets; PDB still applies to Fargate pods |
AWS services map
| Need | Service | Skip when |
|---|---|---|
| Managed K8s upgrades | EKS managed node groups | Self-managed ASG with manual drain |
| Disruption control | PDB + EKS Pod Identity | Single-replica dev namespaces |
| Surge during deploy | Deployment maxSurge=25% | StatefulSet with strict ordering |
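For the surge row, a minimal Deployment sketch; the image, port, probe path, and maxUnavailable: 0 are hypothetical choices for illustration. The readiness probe matters here because a PDB counts only Ready pods as healthy.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
spec:
  replicas: 10             # omit if an HPA manages the replica count
  selector:
    matchLabels:
      app: api
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: "25%"      # start extra pods before removing old ones
      maxUnavailable: 0    # assumed: never dip below the desired count
  template:
    metadata:
      labels:
        app: api           # the PDB selector must match this label
    spec:
      containers:
        - name: api
          image: example.com/api:1.0.0   # hypothetical image
          ports:
            - containerPort: 8080
          readinessProbe:
            httpGet:
              path: /healthz             # hypothetical endpoint
              port: 8080
```

maxSurge shapes Deployment rollouts, while the PDB shapes evictions during drains. They are separate controls, so a service that must survive both needs both.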
When this advice breaks
- Single-replica Deployments — PDB cannot invent HA; fix replica count first.
- Jobs/CronJobs — PDB usually irrelevant.
What to do this week
- Audit workloads with kubectl get pdb -A and replica counts.
- Replace minAvailable: 100% with maxUnavailable: 1 for rolling services.
- Run a controlled drain on one node during low traffic; watch kube_pod_status_ready.
- Align cluster upgrade window with Karpenter how-to.
More in This Track
Part of the Engineering Guides library (June 2026).
What this guide doesn’t cover
Service mesh traffic shifting—part 3 of this track.
AWS Cloud Architect & AI Expert
AWS-certified cloud architect and AI expert with deep expertise in cloud migrations, cost optimization, and generative AI on AWS.