High-Concurrency Server I/O: epoll, Syscalls, and Zero-Copy on AWS EC2
Quick summary: C10k is solved until syscall overhead and context switches eat your Graviton cores. This guide covers how epoll, sendfile, and SO_REUSEPORT behave on EC2, and why Lambda caps concurrency differently.
Key Takeaways
- Linux on EC2 (June 2026) serves most self-hosted APIs
- Benchmark pattern (hypothetical workload): c6gn.2xlarge (ENA enhanced) running an epoll edge-triggered server, 48K concurrent connections, 0.8 ms p99 accept-to-response
As of June 2026, Linux on EC2 serves most self-hosted APIs. Event loops there use epoll, typically edge-triggered, to watch thousands of sockets per thread. Developer laptops running macOS or BSD use kqueue instead, so run production-parity tests on Linux before trusting local numbers.
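Edge-triggered mode has one rule that is easy to miss: epoll reports a socket once when it becomes readable, so the handler has to read until the kernel says there is nothing left. The sketch below illustrates that with Python's select.epoll (Linux only). The helper names drain and demo are hypothetical, and a socketpair stands in for a real client connection.

```python
import select
import socket


def drain(sock, chunk_size=4096):
    """Read from a non-blocking socket until the kernel has no more data."""
    chunks = []
    while True:
        try:
            data = sock.recv(chunk_size)
        except BlockingIOError:
            break  # EAGAIN: buffer is empty, safe to wait on epoll again
        if not data:
            break  # peer closed the connection
        chunks.append(data)
    return b"".join(chunks)


def demo():
    # A connected socket pair stands in for one client connection.
    client, server = socket.socketpair()
    server.setblocking(False)

    ep = select.epoll()
    ep.register(server.fileno(), select.EPOLLIN | select.EPOLLET)

    client.sendall(b"x" * 10000)

    first = ep.poll(1.0)                  # one edge: the socket became readable
    payload = drain(server, chunk_size=4096)
    second = ep.poll(0.1)                 # no new data, so no new edge

    ep.unregister(server.fileno())
    ep.close()
    client.close()
    server.close()
    return len(first), len(payload), len(second)


if __name__ == "__main__":
    print(demo())
```

If drain stopped after a single recv, the remaining bytes would sit in the socket buffer with no further notification until the peer sent more data, which is the usual cause of stalled connections in edge-triggered servers.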
Symptom → mechanism → AWS control
| Production symptom | Mechanism | AWS control |
|---|---|---|
| CPU pegged at moderate QPS | Excessive syscall churn (read/write per chunk) | sendfile/splice zero-copy, larger socket buffers |
| Connection accept backlog | Single-threaded accept bottleneck | SO_REUSEPORT multi-listener, ALB pre-connection scaling |
| epoll thundering herd | All workers wake on a single event | EPOLLET edge-triggered + per-core accept threads |
Opinionated take: Profile syscalls before rewriting in Rust. On AWS, an ALB in front of ENA-enabled instances running edge-triggered epoll handles most concurrency needs without kernel bypass.
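The SO_REUSEPORT multi-listener row in the table above can be sketched as follows. This is a minimal illustration, assuming Linux and a loopback address. The function names reuseport_listener and demo are hypothetical, and in a real deployment each worker process would create its own listener rather than one process creating both.

```python
import socket


def reuseport_listener(host, port, backlog=128):
    """Create a listening TCP socket that shares its port with other listeners."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Must be set before bind() on every socket that shares the port.
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    s.bind((host, port))
    s.listen(backlog)
    return s


def demo():
    # Port 0 asks the kernel for a free port; the second listener joins it.
    first = reuseport_listener("127.0.0.1", 0)
    port = first.getsockname()[1]
    second = reuseport_listener("127.0.0.1", port)
    same_port = second.getsockname()[1] == port
    first.close()
    second.close()
    return same_port


if __name__ == "__main__":
    print(demo())
```

With both listeners bound, the kernel distributes incoming connections across their separate accept queues, so no single accept loop becomes the bottleneck.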
Benchmark pattern (hypothetical workload): a c6gn.2xlarge (ENA enhanced) runs an edge-triggered epoll server with 48K concurrent connections and 0.8 ms p99 accept-to-response latency. At 12K req/sec, zero-copy sendfile reduces CPU use by 22% compared with buffered I/O.
Syscall and context switch tax
Each read or write that crosses the user/kernel boundary costs CPU. Zero-copy sendfile() moves data from file to socket without passing through user-space buffers, which is ideal for static assets served by NGINX on EC2 (a role often superseded by CloudFront).
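To make the sendfile path concrete, here is a small sketch using Python's os.sendfile over a loopback TCP connection. It assumes Linux and blocking sockets. The names send_file_zero_copy and demo, and the temporary file contents, are hypothetical stand-ins for a static asset.

```python
import os
import socket
import tempfile


def send_file_zero_copy(conn, path):
    """Send a whole file over a connected socket with os.sendfile."""
    sent = 0
    with open(path, "rb") as f:
        size = os.fstat(f.fileno()).st_size
        while sent < size:
            # The kernel moves file data to the socket; nothing is read
            # into a Python buffer. sendfile may send less than requested.
            n = os.sendfile(conn.fileno(), f.fileno(), sent, size - sent)
            if n == 0:
                break
            sent += n
    return sent


def demo(payload=b"static asset bytes\n" * 400):
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(payload)
        path = tmp.name

    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    client = socket.create_connection(listener.getsockname())
    conn, _ = listener.accept()
    try:
        sent = send_file_zero_copy(conn, path)
        conn.close()  # signals end of stream to the client
        received = bytearray()
        while True:
            chunk = client.recv(65536)
            if not chunk:
                break
            received += chunk
    finally:
        client.close()
        listener.close()
        os.unlink(path)
    return sent, bytes(received)


if __name__ == "__main__":
    sent, data = demo()
    print(sent, len(data))
```

A buffered version would call read and send once per chunk, which is two syscalls and two copies per chunk. Here the loop usually completes in a single call for small files.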
| Pattern | Benefit on AWS |
|---|---|
| epoll + non-blocking I/O | Few threads, many connections on c7g Graviton |
| SO_REUSEPORT | Accept queue spread across workers |
| sendfile | Static file serving from EBS |
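Lambda sidesteps epoll tuning entirely. Its concurrency is counted in execution environments, each typically processing one invocation at a time and reused across invocations, and it is capped by platform limits. CPU-bound work still belongs on EC2/ECS.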
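A rough way to see how Lambda's model differs: the number of execution environments needed is approximately the request rate multiplied by the average duration. The sketch below applies that estimate. The function name, the 50 ms duration, and the limit value passed in are hypothetical, and the 12K req/sec figure is borrowed from the hypothetical benchmark above.

```python
import math


def required_concurrency(requests_per_second, avg_duration_ms):
    """Estimate concurrent execution environments: rate times duration."""
    return math.ceil(requests_per_second * avg_duration_ms / 1000)


def fits_limit(requests_per_second, avg_duration_ms, concurrency_limit):
    """True if the estimate stays within a given concurrency limit."""
    return required_concurrency(requests_per_second, avg_duration_ms) <= concurrency_limit


if __name__ == "__main__":
    # 12,000 req/sec at an assumed 50 ms per invocation.
    print(required_concurrency(12_000, 50))
    # Compare against an assumed limit; check your account's actual quota.
    print(fits_limit(12_000, 50, concurrency_limit=500))
```

An epoll server absorbs the same load with a handful of threads, whereas here every in-flight request occupies its own environment, which is why the platform limit, not socket tuning, becomes the constraint.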
AWS services map
| Need | Service | Skip when |
|---|---|---|
| High connection fan-in | ALB (L7) or NLB (L4) | <1K connections on single instance |
| Enhanced networking | ENA-enabled instance types (c6gn, m6in) | Bursty Lambda with no persistent connections |
| Kernel bypass alternatives | Consider Nitro + DPDK only at 10Gbps+ | Standard web API under 1Gbps |
When this advice breaks
- TLS termination on instance — crypto dominates; offload to ALB/CloudFront first.
- Tiny payloads: syscall cost is noise next to business logic, so zero-copy buys little.
What to do this week
- Compare nginx/envoy worker count to CPU cores on the origin ASG.
- Profile with perf top during a load test and look for syscall hotspots.
- Move static assets to S3 + CloudFront; keep dynamic traffic on the epoll stack.
More in This Track
Part of the Engineering Guides library (June 2026).
What this guide doesn’t cover
CPU cache false sharing is covered in Part 4 of this track.