Container Runtime Security: seccomp, AppArmor, and EKS Pod Security
Default Docker seccomp is not enough for regulated workloads. EKS Pod Security Standards, seccomp profiles, and Fargate platform version constraints.
Default Docker seccomp is not enough for regulated workloads. EKS Pod Security Standards, seccomp profiles, and Fargate platform version constraints.
Cluster upgrades and Karpenter consolidation look healthy in the console while PDB-blocked evictions freeze your node drain for 45 minutes. This guide wires minAvailable, maxUnavailable, and EKS managed node group semantics.
Ingesting every debug log to CloudWatch is how observability becomes a FinOps incident. Tail sampling with ADOT, Logs Insights, and Firehose to S3 for the long tail.
That `user_id` label on every HTTP metric turns Amazon Managed Prometheus into a five-figure line item. This guide explains cardinality mechanics, EMF vs remote write, and Application Signals defaults worth disabling.
App Mesh is legacy path—new meshes should start with VPC Lattice for AWS-native east-west or Istio on EKS when you need full L7 policy. Traffic shifting without duplicating load balancers per service.
Generic DevOps maturity models score you on culture slides — this one maps L1–L4 to AWS gates you can verify: IaC in Git, GitOps or gated CD, ADOT on EKS, FIS with stop conditions, and cost-aware CI. A composite 85-engineer SaaS moved from L2 to L3 in one quarter by fixing the CI/GitOps boundary alone, cutting deploy-related incidents from ~6/month to 2.
Running one AWS FIS experiment in a demo account is not chaos engineering — it is a screenshot. A program ties experiments to SLOs, scopes blast radius with tags, halts on CloudWatch alarm stop conditions, schedules via EventBridge, and closes the loop by re-testing the fix. FIS now ships AZ Power Interruption and cross-Region connectivity scenarios in its Scenario Library. Here is the L0→L3 maturity matrix, a GameDay runbook, and a stop-condition-wired experiment skeleton.
AWS Prescriptive Guidance says Argo CD and Flux both handle most GitOps scenarios capably — so picking one is a fit decision, not a winner. The decisions that actually cause incidents are the ones underneath: plaintext secrets in the GitOps repo, CI running kubectl apply and reintroducing drift, no App-of-Apps so onboarding is click-ops, and repo topology you can't change later. Here is the Argo CD vs Flux matrix, an App-of-Apps example, and the five traps independent of tool.
The reflex to bolt Amazon Managed Prometheus + Grafana onto every workload is how observability bills quietly double. CloudWatch Application Signals now gives you an auto-discovered service map, SLOs, and traces with near-zero setup; AMP only earns its keep when you are PromQL-native or drowning in high-cardinality metrics — where ingestion (not retention) is the cost driver. Here is the decision matrix, an ADOT dual-export config, and the three levers that actually cut the AMP bill.
ECS CodeDeploy and Lambda aliases support both instant cutover and gradual shifts—but picking wrong costs you double Fargate spend or 21-day MTTR on muted alarms. This decision guide scores blue/green, canary, and rolling with a matrix and names App Mesh (EOL Sept 30, 2026) replacements.
Every Terraform command you actually need on AWS — modernized for Terraform 1.10+, with deprecated commands flagged and AWS-specific gotchas for state, workspaces, providers, and the new import/removed/ephemeral primitives.
Anton Babenko's Terraform Claude Skill is the biggest jump in AI-assisted IaC since Copilot. We tested it on a real AWS stack — VPC, EKS, S3 + KMS, IAM — and documented exactly what it fixes, what it misses, and what AWS teams should layer on top.