Bloom Filters and HyperLogLog in Production on ElastiCache Redis

Quick summary: Bloom filters can shave about 90% of negative lookups off the origin; HyperLogLog estimates cardinality without storing every user ID. This guide applies both on ElastiCache Redis to abuse detection and feed deduplication.

Key Takeaways

  • Bloom filters shave 90% of negative lookups; HyperLogLog estimates cardinality without storing every user ID
  • ElastiCache Redis (June 2026) supports HyperLogLog natively (PFADD/PFCOUNT) and Bloom filters where the RedisBloom module is enabled: cheap cardinality for “how many unique IPs today?”
  • Benchmark pattern (hypothetical workload): Bloom filter in ElastiCache Redis (1% FPR, 10M keys) uses 12MB vs 800MB for full set; HyperLogLog cardinality of daily unique visitors: 12KB memory, ±0.8% error vs exact SET at 240MB
  • HyperLogLog: ~12 KB per key for ~0.81% error

ElastiCache Redis (June 2026) supports HyperLogLog natively (PFADD/PFCOUNT) and Bloom filters where the RedisBloom module is enabled, giving cheap cardinality for “how many unique IPs today?” without Prometheus label explosion.

Symptom → mechanism → AWS control

| Production symptom | Mechanism | AWS control |
| --- | --- | --- |
| Origin hit on known-absent keys | No negative caching | Redis Bloom filter (RedisBloom module) on ElastiCache |
| Memory explosion on unique counts | Exact SET of user IDs | HyperLogLog PFADD/PFCOUNT on ElastiCache |
| False positive DB queries | Bloom filter FPR not tuned | Size filter for 1% FPR, accept rare origin hits |

Opinionated take: use Bloom filters for “definitely not in DB” guards (a negative answer is certain, a positive one is only probable) and HyperLogLog for analytics cardinality; never use either when exact counts are a compliance requirement.

Benchmark pattern (hypothetical workload) — Bloom filter in ElastiCache Redis (1% FPR, 10M keys) uses 12MB vs 800MB for full set; HyperLogLog cardinality of daily unique visitors: 12KB memory, ±0.8% error vs exact SET at 240MB.
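The 12MB figure can be sanity-checked with the textbook sizing formulas for a classic Bloom filter, m = -n·ln(p) / (ln 2)² bits and k = (m/n)·ln 2 hash functions. The sketch below is only that arithmetic; the function names are illustrative, and the memory a real RedisBloom filter reports may differ somewhat from the theoretical minimum.

```python
import math


def bloom_size_bits(expected_items: int, false_positive_rate: float) -> int:
    """Theoretical bit count m = -n * ln(p) / (ln 2)^2 for a classic Bloom filter."""
    if expected_items <= 0 or not 0.0 < false_positive_rate < 1.0:
        raise ValueError("need expected_items > 0 and 0 < false_positive_rate < 1")
    return math.ceil(
        -expected_items * math.log(false_positive_rate) / (math.log(2) ** 2)
    )


def bloom_hash_count(expected_items: int, size_bits: int) -> int:
    """Optimal number of hash functions k = (m / n) * ln 2, rounded to an integer."""
    return max(1, round(size_bits / expected_items * math.log(2)))


# The benchmark's parameters: 10M keys at a 1% false positive rate.
bits = bloom_size_bits(10_000_000, 0.01)
megabytes = bits / 8 / 1_000_000
hashes = bloom_hash_count(10_000_000, bits)
print(f"{megabytes:.1f} MB, {hashes} hash functions")  # 12.0 MB, 7 hash functions
```

Roughly 9.6 bits per key is what a 1% false positive rate costs, regardless of how long the keys themselves are.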

Bloom filter

  • Use: “Have we seen this dedup key?” with false positives OK
  • Avoid: Need exact membership proof for billing

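Configure the expected item count and the false positive rate when you create the filter (BF.RESERVE key error_rate capacity). The initial size is fixed at creation; by default RedisBloom adds sub-filters once the capacity is exceeded, which costs extra memory and slower checks, so size for the volume you expect.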
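A minimal sketch of creating and using such a filter, assuming the BF.* commands are available on your cluster. The helper names and the key name in the usage note are hypothetical, and `client` is assumed to be any object exposing `exists` and `execute_command` the way a redis-py client does.

```python
def ensure_filter(client, key, error_rate=0.01, capacity=10_000_000):
    """Create the filter if the key is missing.

    The EXISTS check is not atomic with BF.RESERVE: two writers racing here
    could both try to create the filter, and the second would get an error.
    """
    if not client.exists(key):
        client.execute_command("BF.RESERVE", key, error_rate, capacity)


def remember(client, key, item) -> bool:
    """Add an item. BF.ADD returns 1 when the item was newly added."""
    return bool(client.execute_command("BF.ADD", key, item))


def maybe_seen(client, key, item) -> bool:
    """False means the item was never added; True means it probably was."""
    return bool(client.execute_command("BF.EXISTS", key, item))
```

With a redis-py client this would be called as `ensure_filter(client, "dedup:feed")`, then `remember(...)` on write and `maybe_seen(...)` before the expensive lookup.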

HyperLogLog

~12 KB per key for a ~0.81% standard error, even on billions of uniques: well suited to dashboard UV metrics, not invoicing.
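Both numbers follow from the structure's layout: Redis uses 16,384 registers of 6 bits each, and the standard error of HyperLogLog is about 1.04/√m for m registers. The sketch below checks that arithmetic and shows one way to keep a counter per day. The `uv:<day>` key scheme and function names are assumptions for illustration, and `client` is assumed to expose `pfadd` and `pfcount` as redis-py does.

```python
import math

HLL_REGISTERS = 16_384  # 2^14 registers in the Redis dense encoding


def hll_standard_error(registers: int = HLL_REGISTERS) -> float:
    """Standard error of a HyperLogLog estimate, about 1.04 / sqrt(m)."""
    return 1.04 / math.sqrt(registers)


def hll_dense_bytes(registers: int = HLL_REGISTERS, bits_per_register: int = 6) -> int:
    """Register storage in bytes, excluding the small per-key header."""
    return registers * bits_per_register // 8


def record_visit(client, day: str, visitor_id: str) -> None:
    """Fold one visitor into that day's counter (hypothetical key scheme)."""
    client.pfadd(f"uv:{day}", visitor_id)


def unique_visitors(client, day: str) -> int:
    """Approximate number of distinct visitors recorded for the day."""
    return client.pfcount(f"uv:{day}")


print(f"{hll_standard_error():.4%} error, {hll_dense_bytes()} bytes")
```

Because the size does not grow with the number of visitors, one key per day (or per hour) stays cheap even at high traffic.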

AWS patterns

| Use case | Structure |
| --- | --- |
| API abuse | Bloom per IP window in Redis |
| Unique visitors | HLL per day, export to CloudWatch custom metric |
| Feed dedup | Bloom + DynamoDB for positives |
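As an illustration of the unique-visitors row, the sketch below reads a daily HyperLogLog count and publishes it as a CloudWatch custom metric. The namespace, metric name, and `uv:<day>` key scheme are hypothetical; `redis_client` is assumed to expose `pfcount` (as redis-py does) and `cloudwatch_client` to expose `put_metric_data` (as a boto3 CloudWatch client does).

```python
from datetime import datetime, timezone


def publish_daily_uniques(redis_client, cloudwatch_client, day: str,
                          namespace: str = "App/Traffic") -> int:
    """Read the day's HyperLogLog estimate and push it to CloudWatch."""
    count = redis_client.pfcount(f"uv:{day}")  # hypothetical key scheme
    cloudwatch_client.put_metric_data(
        Namespace=namespace,  # hypothetical namespace
        MetricData=[
            {
                "MetricName": "UniqueVisitors",  # hypothetical metric name
                "Timestamp": datetime.now(timezone.utc),
                "Value": float(count),
                "Unit": "Count",
            }
        ],
    )
    return count
```

Publishing a single number per day keeps the metric free of per-user or per-IP dimensions, which is the point of estimating cardinality in Redis rather than in the metrics system.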

AWS services map

| Need | Service | Skip when |
| --- | --- | --- |
| Probabilistic membership | ElastiCache Redis with RedisBloom | Exact membership required (billing) |
| Cardinality estimation | ElastiCache HyperLogLog | Need exact counts for invoicing |
| CDN negative cache | CloudFront custom error caching | Dynamic per-user responses |

What to do this week

  1. Replace SET of all session IDs with HLL for daily active count dashboard.
  2. Add Bloom pre-check before expensive Aurora query.
  3. Monitor Redis memory—Bloom is cheaper than full SET until false positive rate hurts UX.
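Steps 2 and 3 can be combined in one guard that skips the database when the filter says the key is absent and counts how often a "maybe" turns out to be a miss. This is a sketch under assumptions: `query_db` stands in for your Aurora query and returns None when no row exists, `client` is assumed to expose `execute_command` as redis-py does, and all names are illustrative.

```python
from dataclasses import dataclass


@dataclass
class GuardStats:
    skipped: int = 0          # filter said "definitely absent", database not queried
    hits: int = 0             # filter said "maybe", database returned a row
    false_positives: int = 0  # filter said "maybe", database returned nothing

    @property
    def false_positive_rate(self) -> float:
        """Share of absent keys that still reached the database."""
        absent = self.skipped + self.false_positives
        return self.false_positives / absent if absent else 0.0


def guarded_lookup(client, filter_key, item, query_db, stats: GuardStats):
    """Query the database only when the Bloom filter cannot rule the key out."""
    if not client.execute_command("BF.EXISTS", filter_key, item):
        stats.skipped += 1
        return None
    row = query_db(item)
    if row is None:
        stats.false_positives += 1
    else:
        stats.hits += 1
    return row
```

If the observed rate drifts well above the configured 1%, the filter is probably over capacity or holds keys that have since been deleted from the database, since a plain Bloom filter cannot remove items.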

More in This Track

Part of the Engineering Guides library (June 2026).

What this guide doesn’t cover

General cache invalidation—part 2 of caching track.

Palaniappan P

AWS Cloud Architect & AI Expert

AWS-certified cloud architect and AI expert with deep expertise in cloud migrations, cost optimization, and generative AI on AWS.

AWS Architecture · Cloud Migration · GenAI on AWS · Cost Optimization · DevOps
