False Sharing and CPU Cache on AWS

CPU Cache Coherence and False Sharing for Cloud Backend Engineers

Quick summary: Two goroutines updating adjacent counters can saturate the memory bus on a c7g.8xlarge. This guide covers memory barriers, cache lines, and false sharing, and explains why placement groups do not fix application-level contention.

Key Takeaways

Two goroutines updating adjacent counters can saturate the memory bus on a c7g.8xlarge.
Graviton3 (June 2026) offers strong price/performance for Java and Go services, but false sharing on hot counters still collapses scalability long before network limits.
Benchmark pattern (hypothetical workload): in a Java 21 virtual-thread counter array with false sharing on adjacent AtomicLongs, throughput drops 8x (1.2M→150K ops/sec); padding to 64-byte cache lines restores 1.1M ops/sec on a c7g.4xlarge Graviton3.

Symptom → mechanism → AWS control

Production symptom	Mechanism	AWS control
Scaling cores doesn’t scale throughput	False sharing invalidates cache lines	@Contended (Java), pad hot counters to 64 bytes
Noisy neighbor CPU spikes	Cache coherence traffic on shared memory	Pin workloads to dedicated instances, Graviton for price/perf
Latency jitter on lock-free code	MESI protocol coherence misses	Per-thread local accumulators, merge on flush

Opinionated take: When adding cores stops helping, check for false sharing before buying bigger instances. It is a silent killer on Graviton and x86 alike.

Mechanism

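CPUs cache data in 64-byte lines. When two threads on different cores mutate different variables that sit in the same line, every write makes the coherence protocol invalidate that line in the other core's cache, so the line bounces between cores even though the threads never touch the same variable. Memory barriers are a separate concern: they order memory operations, they do not flush caches between cores.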

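Distributed systems add their own coherence layer over the network (for example, DynamoDB conditional writes). Do not confuse it with the CPU's MESI cache-coherence protocol, which runs in hardware between cores.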

AWS services map

Need	Service	Skip when
CPU profiling	CloudWatch Agent + perf or JFR on ECS/EKS	Fully managed Lambda with no profiling access
Graviton price-performance	c7g/m7g instances	x86-only dependencies without ARM builds
Dedicated tenancy	EC2 dedicated hosts	Shared tenancy with low CPU sensitivity

Scenario	Mitigation
Per-request metrics arrays	Pad structs to cache line; use per-core aggregators
Lock-free queues on EC2	Align atomic slots; benchmark on same instance class as prod
NUMA on large instances	Pin threads; use `c7g` size matched to actual parallelism

Placement groups reduce network latency—they do not fix false sharing in code.

When this advice breaks

I/O-bound Lambda: CPU cache behavior is irrelevant; optimize cold start and downstream calls.
Managed services: You do not tune RDS CPU cache; tune queries.

What to do this week

Run perf c2c (or VTune on x86 instances) on the hottest lock-free path under load.
Separate frequently updated atomics by at least 64 bytes in hot structs.
Load test on the production instance family, not on a laptop.

What this guide doesn’t cover

JVM GC and object layout—see concurrency runtime track.
