Bloom Filters and HyperLogLog in Production on ElastiCache Redis
Quick summary: Bloom filters shave 90% of negative lookups; HyperLogLog estimates cardinality without storing every user ID. This guide covers using both on ElastiCache Redis for abuse detection and feed deduplication.
Key Takeaways
- Bloom filters shave 90% of negative lookups; HyperLogLog estimates cardinality without storing every user ID
- ElastiCache Redis (June 2026) supports Bloom filters (RedisBloom module where enabled) and HyperLogLog (PFADD/PFCOUNT) natively: cheap cardinality for “how many unique IPs today?”
- Benchmark pattern (hypothetical workload): Bloom filter in ElastiCache Redis (1% FPR, 10M keys) uses 12MB vs 800MB for full set; HyperLogLog cardinality of daily unique visitors: 12KB memory, ±0.8% error vs exact SET at 240MB
- HyperLogLog: ~12 KB per key for ~0.81% error
ElastiCache Redis (June 2026) supports Bloom filters (RedisBloom module where enabled) and HyperLogLog (PFADD/PFCOUNT) natively—cheap cardinality for “how many unique IPs today?” without Prometheus label explosion.
Symptom → mechanism → AWS control
| Production symptom | Mechanism | AWS control |
|---|---|---|
| Origin hit on known-absent keys | No negative caching | Redis Bloom filter (RedisBloom module) on ElastiCache |
| Memory explosion on unique counts | Exact SET of user IDs | HyperLogLog PFADD/PFCOUNT on ElastiCache |
| False positive DB queries | Bloom filter FPR not tuned | Size filter for 1% FPR, accept rare origin hits |
Opinionated take: use Bloom filters as ‘definitely not in the DB’ guards (a negative answer is exact, a positive one is only probable) and HyperLogLog for analytics cardinality; never use either when exact counts are a compliance requirement.
Benchmark pattern (hypothetical workload) — Bloom filter in ElastiCache Redis (1% FPR, 10M keys) uses 12MB vs 800MB for full set; HyperLogLog cardinality of daily unique visitors: 12KB memory, ±0.8% error vs exact SET at 240MB.
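The 12MB Bloom figure can be sanity-checked with the textbook sizing formula for a classic Bloom filter, m = -n ln(p) / (ln 2)^2 bits and k = (m/n) ln 2 hash functions. The sketch below is only that arithmetic; the function name `bloom_size` is illustrative, and the memory a real RedisBloom filter reports may differ somewhat from the theoretical minimum.

```python
import math


def bloom_size(n_items: int, fpr: float) -> tuple[int, int]:
    """Return (bits, hash_count) for a classic Bloom filter.

    Uses the standard optimal-sizing formulas; this is theory,
    not a measurement of any particular Redis implementation.
    """
    bits = math.ceil(-n_items * math.log(fpr) / (math.log(2) ** 2))
    hashes = round(bits / n_items * math.log(2))
    return bits, hashes


# 10M keys at a 1% false positive rate, as in the benchmark pattern.
bits, hashes = bloom_size(10_000_000, 0.01)
bloom_mb = bits / 8 / 1_000_000  # decimal megabytes
print(f"{bloom_mb:.1f} MB, {hashes} hash functions")  # 12.0 MB, 7 hash functions
```

That works out to roughly 9.6 bits per key regardless of key length, which is where the saving over an exact SET comes from.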
Bloom filter
- Use: “Have we seen this dedup key?” with false positives OK
- Avoid: Need exact membership proof for billing
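Configure the expected item count and target false positive rate when the filter is created (BF.RESERVE in RedisBloom). The initial size is fixed at creation; by default RedisBloom adds sub-filters once capacity is exceeded, which costs extra memory and lookup time, so size for the expected volume up front.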
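A minimal sketch of how this might look from application code, assuming a redis-py style client that exposes `execute_command` and a server with the RedisBloom commands (BF.RESERVE, BF.ADD, BF.EXISTS) available. The function names and the `load` callback are hypothetical placeholders for your own data access layer.

```python
from typing import Callable, Optional


def reserve_filter(client, key: str, error_rate: float, capacity: int) -> None:
    """Create the filter once. BF.RESERVE errors if the key already exists,
    so check first (not atomic; acceptable for a one-off setup step)."""
    if not client.execute_command("EXISTS", key):
        client.execute_command("BF.RESERVE", key, error_rate, capacity)


def record_item(client, key: str, item: str) -> None:
    """Add an item after it has been written to the source of truth."""
    client.execute_command("BF.ADD", key, item)


def guarded_lookup(
    client,
    key: str,
    item: str,
    load: Callable[[str], Optional[dict]],
) -> Optional[dict]:
    """Skip the expensive lookup when the filter says 'definitely absent'.

    A positive answer may be a false positive, so `load` can still
    return None; that is the rare origin hit the filter tolerates.
    """
    if not client.execute_command("BF.EXISTS", key, item):
        return None
    return load(item)
```

With redis-py this would be called as `guarded_lookup(redis.Redis(...), "dedup:orders", order_id, fetch_from_db)`, where the key name and `fetch_from_db` are again placeholders.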
HyperLogLog
~12 KB per key for ~0.81% error on billions of uniques—perfect for dashboard UV metrics, not invoicing.
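Both numbers follow from the structure itself: the dense Redis encoding uses 16,384 six-bit registers, and the standard HyperLogLog error is about 1.04 / sqrt(registers). The sketch below shows that arithmetic plus a per-day counter built on PFADD/PFCOUNT. It assumes a redis-py style client with `execute_command`; the `uv:` key prefix and helper names are illustrative.

```python
import math
from datetime import date

REGISTERS = 16_384       # 2**14 registers in the dense encoding
BITS_PER_REGISTER = 6

dense_bytes = REGISTERS * BITS_PER_REGISTER // 8  # 12288 bytes, i.e. 12 KiB
std_error = 1.04 / math.sqrt(REGISTERS)           # 0.008125, i.e. ~0.81%


def uv_key(day: date) -> str:
    """One HLL key per calendar day (hypothetical naming scheme)."""
    return f"uv:{day.isoformat()}"


def record_visit(client, day: date, visitor_id: str) -> None:
    """PFADD is idempotent per element, so repeat visits cost nothing extra."""
    client.execute_command("PFADD", uv_key(day), visitor_id)


def unique_visitors(client, day: date) -> int:
    """Approximate distinct visitor count for the day."""
    return int(client.execute_command("PFCOUNT", uv_key(day)))
```

The memory stays at roughly 12 KB per key however many visitors are added, which is why one key per day (or per hour) is affordable.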
AWS patterns
| Use case | Structure |
|---|---|
| API abuse | Bloom per IP window in Redis |
| Unique visitors | HLL per day, export to CloudWatch custom metric |
| Feed dedup | Bloom + DynamoDB for positives |
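As one way to wire up the unique-visitors row, the sketch below reads the day's HLL with PFCOUNT and publishes a single number through the CloudWatch `put_metric_data` API. It assumes a redis-py style client and a boto3 CloudWatch client are passed in; the namespace `Example/Traffic`, the metric name, and the function name are placeholders.

```python
def publish_unique_visitors(
    redis_client,
    cloudwatch,
    hll_key: str,
    namespace: str = "Example/Traffic",  # placeholder namespace
) -> int:
    """Read the approximate count from Redis and push it as one datapoint.

    Only the aggregate leaves Redis, so no per-visitor dimension or
    label is ever created on the metrics side.
    """
    count = int(redis_client.execute_command("PFCOUNT", hll_key))
    cloudwatch.put_metric_data(
        Namespace=namespace,
        MetricData=[
            {
                "MetricName": "UniqueVisitors",  # placeholder metric name
                "Value": float(count),
                "Unit": "Count",
            }
        ],
    )
    return count
```

In practice this would run on a schedule, for example `publish_unique_visitors(redis.Redis(...), boto3.client("cloudwatch"), "uv:2026-06-01")`, with the key matching whatever naming scheme the writers use.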
AWS services map
| Need | Service | Skip when |
|---|---|---|
| Probabilistic membership | ElastiCache Redis with RedisBloom | Exact membership required (billing) |
| Cardinality estimation | ElastiCache HyperLogLog | Need exact counts for invoicing |
| CDN negative cache | CloudFront custom error caching | Dynamic per-user responses |
What to do this week
- Replace the SET of all session IDs with HLL for the daily active count dashboard.
- Add a Bloom pre-check before expensive Aurora queries.
- Monitor Redis memory—Bloom is cheaper than full SET until false positive rate hurts UX.
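To get a feel for when the false positive rate starts to hurt, the textbook approximation p ≈ (1 - e^(-kn/m))^k estimates the rate for a fixed-size filter with m bits and k hash functions after n inserts. The sketch below assumes a classic non-scaling filter sized for 10M items at 1% (about 95.85M bits, 7 hashes); a scaling RedisBloom filter behaves differently because it adds sub-filters instead of overfilling.

```python
import math


def expected_fpr(bits: int, hashes: int, inserted: int) -> float:
    """Approximate false positive rate of a classic Bloom filter."""
    return (1.0 - math.exp(-hashes * inserted / bits)) ** hashes


# Assumed sizing: 10M items at 1% FPR from the standard formula.
BITS, HASHES = 95_850_584, 7

for n in (5_000_000, 10_000_000, 20_000_000):
    print(f"{n:>11,} items -> {expected_fpr(BITS, HASHES, n):.2%}")
```

Under these assumptions the rate is about 1% at the design capacity and climbs well above 10% at double that load, so alerting on inserted-item count relative to capacity is a reasonable companion to a memory alarm.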
More in This Track
Part of the Engineering Guides library (June 2026).
What this guide doesn’t cover
General cache invalidation, which is covered in part 2 of the caching track.
AWS Cloud Architect & AI Expert
AWS-certified cloud architect and AI expert with deep expertise in cloud migrations, cost optimization, and generative AI on AWS.