AWS ElastiCache Redis: Caching Strategies for Production
Quick summary: Redis is fast, until your application retries on every cache miss and the Redis bill starts looking like the database bill. This guide covers ElastiCache patterns, data structures, cluster modes, eviction policies, and the production patterns that actually reduce database load.
Key Takeaways
- In the hypothetical benchmark, a cluster costing $890/month avoided a $2.1K Aurora scale-up
- AWS ElastiCache for Redis provides managed Redis clusters that handle replication, failover, patching, and backup: the operational tasks that make self-managed Redis painful at scale
- 2025 Update: AWS now offers Amazon ElastiCache for Valkey alongside ElastiCache for Redis
- New workloads should consider Valkey: it's the open-source successor to Redis 7.2, maintained by the Linux Foundation, and is now the default engine for new ElastiCache clusters
Caching is the most cost-effective way to improve application performance. A single Redis cache node can serve hundreds of thousands of reads per second with sub-millisecond latency — orders of magnitude faster than any database query. For applications bottlenecked by database read latency or struggling under read-heavy traffic patterns, Redis caching transforms performance without re-architecting the application.
Symptom → mechanism → AWS control
| Production symptom | Mechanism | AWS control |
|---|---|---|
| Cache stampede on expiry | Thundering herd to origin | ElastiCache TTL jitter, read-through with mutex |
| Hot key throttling | Single shard saturation | ElastiCache Serverless auto-scaling, local in-process L1 |
| Stale reads after write | Cache-aside without invalidation | Write-through or pub/sub invalidation on ElastiCache |
Opinionated take: ElastiCache Serverless for variable workloads, provisioned cluster mode when you can forecast RPS—always add TTL jitter on cache-aside.
Benchmark pattern (hypothetical workload) — ElastiCache Redis 7 cluster mode, cache-aside for product catalog, 94% hit rate, origin Aurora queries drop from 8K to 480/sec, p99 API latency 45ms→8ms, cluster cost $890/month vs $2.1K Aurora scale-up avoided.
June 2026 refresh: ElastiCache Serverless and managed Valkey offerings change provisioning math—confirm engine choice against HA/replica requirements rather than assuming shard-count defaults from older Redis OSS guides.
AWS ElastiCache for Redis provides managed Redis clusters that handle replication, failover, patching, and backup — the operational tasks that make self-managed Redis painful at scale. This guide covers the caching strategies and ElastiCache configurations that work in production.
2025 Update: AWS now offers Amazon ElastiCache for Valkey alongside ElastiCache for Redis. New workloads should consider Valkey — it’s the open-source successor to Redis 7.2, maintained by the Linux Foundation, and is now the default engine for new ElastiCache clusters. See our Valkey migration guide for details on migration paths and compatibility.
When to Use Caching
Caching Makes Sense When
- Read-heavy workloads — Your application reads far more than it writes (10:1 or higher read-to-write ratio)
- Expensive queries — Database queries involve joins, aggregations, or full-text search that take 50ms+
- Repeated access patterns — The same data is requested by multiple users (product pages, configuration, leaderboards)
- Latency requirements — Your API must respond in under 50ms, and database queries take longer
- Database bottleneck — Your RDS or DynamoDB read capacity is saturated and scaling the database is expensive
Caching Does Not Help When
- Write-heavy workloads — If every request writes unique data, caching adds complexity without benefit
- Unique queries — If every query is different (ad-hoc analytics, search with unique parameters), cache hit rates will be low
- Strong consistency requirements — If stale data is never acceptable, caching introduces consistency complexity
Caching Patterns
Pattern 1: Cache-Aside (Lazy Loading)
The most common pattern — the application checks the cache first and falls back to the database on cache miss:
Read request
→ Check Redis cache
→ Cache hit → Return cached data (sub-millisecond)
→ Cache miss → Query database → Store result in Redis → Return data
Advantages:
- Only caches data that is actually requested (no wasted memory)
- Cache failures do not break the application (falls back to database)
- Simple to implement
Disadvantages:
- First request for each item hits the database (cold cache)
- Stale data possible if database is updated without invalidating cache
- Cache stampede risk when many concurrent requests miss the cache simultaneously
Implementation considerations:
- Set a TTL (time-to-live) on every cached item to limit staleness
- Implement cache invalidation on write operations
- Use a mutex/lock for expensive queries to prevent cache stampede
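The read path and the three considerations above (TTL, jitter, and a lock around the expensive query) can be sketched as follows. This is a minimal illustration, not a definitive implementation: `FakeRedis` is an in-memory stand-in that mimics only the `get`, `set(ex=, nx=)`, and `delete` calls of a redis-py client so the example runs without a server, and `get_with_cache_aside`, `load_from_db`, the `lock:` key prefix, and all default values are assumptions made for the example.

```python
import json
import random
import time


class FakeRedis:
    """In-memory stand-in for the redis-py calls used below."""

    def __init__(self):
        self._data = {}  # key -> (value, absolute expiry time or None)

    def get(self, key):
        value, expires_at = self._data.get(key, (None, None))
        if expires_at is not None and expires_at <= time.monotonic():
            self._data.pop(key, None)
            return None
        return value

    def set(self, key, value, ex=None, nx=False):
        if nx and self.get(key) is not None:
            return None  # like redis-py: None when NX blocks the write
        expires_at = time.monotonic() + ex if ex is not None else None
        self._data[key] = (value, expires_at)
        return True

    def delete(self, key):
        return 1 if self._data.pop(key, None) is not None else 0


def get_with_cache_aside(cache, key, load_from_db, ttl=300, jitter=0.1, lock_ttl=10):
    """Return the value for key, loading from the database on a cache miss."""
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)

    # Only the caller that wins the lock queries the database.
    lock_key = f"lock:{key}"
    if cache.set(lock_key, "1", ex=lock_ttl, nx=True):
        try:
            value = load_from_db(key)
            # Spread expiry times by +/- jitter so keys do not expire together.
            jittered = int(ttl * (1 + random.uniform(-jitter, jitter)))
            cache.set(key, json.dumps(value), ex=max(1, jittered))
            return value
        finally:
            cache.delete(lock_key)

    # Another caller holds the lock: poll the cache briefly, then fall back.
    for _ in range(20):
        time.sleep(0.05)
        cached = cache.get(key)
        if cached is not None:
            return json.loads(cached)
    return load_from_db(key)
```

One simplification to be aware of: the lock is deleted unconditionally, so if the database query outlives `lock_ttl`, this sketch could release a lock that another caller has since acquired. A production version would typically store a unique token in the lock and check it before deleting.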
Pattern 2: Write-Through
Write to both the cache and database simultaneously:
Write request
→ Write to Redis cache
→ Write to database
→ Return success
Advantages:
- Cache is always up to date with the database
- No stale data
- Read requests always hit the cache (after initial population)
Disadvantages:
- Every write has the overhead of two operations (cache + database)
- Data that is written but never read still consumes cache memory
Best for: Data that is frequently read after being written (user profiles, session data, configuration).
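A minimal write-through sketch for the user-profile case, under stated assumptions: `FakeRedis` is an in-memory stand-in for the `set` and `get` calls of a redis-py client, a plain dict stands in for the database, and `save_profile` and `load_profile` are hypothetical names. The sketch writes the database first so that a failed database write never leaves the cache ahead of the source of truth; that ordering is a design choice for the example, not something the pattern mandates.

```python
import json


class FakeRedis:
    """In-memory stand-in for the redis-py calls used below (TTL is ignored)."""

    def __init__(self):
        self._data = {}

    def set(self, key, value, ex=None):
        self._data[key] = value
        return True

    def get(self, key):
        return self._data.get(key)


def save_profile(cache, db, user_id, profile, ttl=3600):
    """Write-through: update the database, then the cache, in the same request."""
    db[user_id] = profile  # stand-in for the real database write
    cache.set(f"user:{user_id}", json.dumps(profile), ex=ttl)


def load_profile(cache, db, user_id):
    """Reads hit the cache once a profile has been written through."""
    cached = cache.get(f"user:{user_id}")
    if cached is not None:
        return json.loads(cached)
    return db.get(user_id)
```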
Pattern 3: Write-Behind (Write-Back)
Write to the cache immediately and asynchronously write to the database:
Write request
→ Write to Redis cache → Return success immediately
→ Background process → Write to database (async)
Advantages:
- Lowest write latency (only cache write is synchronous)
- Batches database writes for efficiency
- Absorbs write spikes without database overload
Disadvantages:
- Data loss risk if Redis fails before database write completes
- Complex consistency management
- Requires reliable background processing
Best for: High-throughput write workloads where slight data loss is acceptable (analytics counters, activity feeds, non-critical metrics).
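The write-behind flow can be sketched with an in-process queue and a worker thread. This is an illustration only: `start_write_behind`, `db_write`, the batch size, and the flush interval are all assumptions, and a plain dict stands in for the Redis cache in the usage below. Because the queue lives in process memory, anything still queued is lost if the process dies, which is exactly the data-loss risk listed above.

```python
import queue
import threading


def start_write_behind(db_write, batch_size=10, flush_interval=0.05):
    """Start a background writer. Returns (enqueue, stop).

    db_write is called with a list of queued items, at most batch_size long.
    """
    pending = queue.Queue()
    stop_event = threading.Event()

    def worker():
        # Keep running until stop was requested and the queue is drained.
        while not (stop_event.is_set() and pending.empty()):
            batch = []
            try:
                batch.append(pending.get(timeout=flush_interval))
                while len(batch) < batch_size:
                    batch.append(pending.get_nowait())
            except queue.Empty:
                pass
            if batch:
                db_write(batch)

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()

    def stop():
        stop_event.set()
        thread.join()

    return pending.put, stop


# Usage: write to the cache synchronously, queue the database write.
written_to_db = []
cache = {}  # stand-in for Redis
enqueue, stop = start_write_behind(written_to_db.extend)
for i in range(25):
    cache[f"counter:{i}"] = i
    enqueue((f"counter:{i}", i))
stop()  # drains the queue before returning
```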
Pattern 4: Read-Through with TTL Refresh
Automatically refresh cached data before its TTL expires (this approach is often called refresh-ahead):
Background process
→ Scan for items approaching TTL expiry
→ Re-query database for fresh data
→ Update cache with fresh data
→ Users always see cached data (never hit database)
Best for: High-traffic items (homepage content, product catalogs) where cache misses cause noticeable latency and database load.
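One pass of the background process above might look like the following sketch. `FakeRedis` is an in-memory stand-in for the `set`, `ttl`, and `scan_iter` calls of a redis-py client; `refresh_expiring`, the `product:*` pattern, and the 60-second threshold are assumptions for the example. With a real client, keys come back as bytes unless it was created with `decode_responses=True`.

```python
import fnmatch
import time


class FakeRedis:
    """In-memory stand-in for the redis-py calls used below."""

    def __init__(self):
        self._data = {}  # key -> (value, absolute expiry time)

    def set(self, key, value, ex):
        self._data[key] = (value, time.monotonic() + ex)

    def ttl(self, key):
        if key not in self._data:
            return -2  # Redis returns -2 for a missing key
        return int(self._data[key][1] - time.monotonic())

    def scan_iter(self, match="*"):
        return [k for k in list(self._data) if fnmatch.fnmatchcase(k, match)]


def refresh_expiring(cache, load_from_db, pattern="product:*", ttl=900, threshold=60):
    """Re-load every matching key whose remaining TTL is at or below threshold."""
    refreshed = []
    for key in cache.scan_iter(match=pattern):
        remaining = cache.ttl(key)
        if 0 <= remaining <= threshold:
            cache.set(key, load_from_db(key), ex=ttl)
            refreshed.append(key)
    return refreshed
```

Scanning the whole keyspace gets expensive as the cache grows, so in practice this kind of job is usually limited to a known list of hot keys rather than a broad pattern.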
Redis Data Structures for Caching
Redis provides data structures beyond simple key-value storage. Choosing the right structure improves efficiency:
| Data Structure | Use Case | Example |
|---|---|---|
| String | Simple key-value cache | User profile, API response, session data |
| Hash | Object with multiple fields | User: {name, email, role, lastLogin} |
| List | Ordered collection, recent items | Activity feed, recent orders |
| Set | Unique collection, membership | Online users, unique visitors |
| Sorted Set | Ranked collection | Leaderboard, trending products |
| Stream | Event log, message queue | Activity stream, change notifications |
Practical Examples
Session storage (Hash):
HSET session:abc-123 userId "user-001" role "admin" tenant "acme" expiresAt "1720000000"
EXPIRE session:abc-123 3600
Leaderboard (Sorted Set):
ZADD leaderboard 1500 "player-001"
ZADD leaderboard 2300 "player-002"
ZREVRANGE leaderboard 0 9 WITHSCORES # Top 10 players
Rate limiting (String with INCR):
INCR rate:user-001:2026-08-10T14:30
EXPIRE rate:user-001:2026-08-10T14:30 60 # 1-minute window
# Check: if count > 100, reject request
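The rate-limiting commands above translate to application code roughly as follows. This is a sketch of a fixed one-minute window: `FakeRedis` is an in-memory stand-in for the `incr` and `expire` calls of a redis-py client, and `allow_request` and its parameters are hypothetical. As in the commands above, `INCR` and `EXPIRE` are two separate calls, so they are not atomic.

```python
from datetime import datetime, timezone


class FakeRedis:
    """In-memory stand-in for the redis-py calls used below."""

    def __init__(self):
        self.counts = {}
        self.ttls = {}

    def incr(self, key):
        self.counts[key] = self.counts.get(key, 0) + 1
        return self.counts[key]

    def expire(self, key, seconds):
        self.ttls[key] = seconds
        return True


def allow_request(cache, user_id, limit=100, window_seconds=60, now=None):
    """Return True while the caller is within the limit for the current minute."""
    now = now or datetime.now(timezone.utc)
    # One counter per user per minute, e.g. rate:user-001:2026-08-10T14:30
    key = f"rate:{user_id}:{now.strftime('%Y-%m-%dT%H:%M')}"
    count = cache.incr(key)
    cache.expire(key, window_seconds)  # the counter cleans itself up
    return count <= limit
```

A fixed window lets a client send up to twice the limit across a minute boundary (the end of one window plus the start of the next). If that matters, a sliding window is the usual next step.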
ElastiCache Configuration
Cluster Modes
Cluster Mode Disabled (single shard):
- One primary node + up to 5 read replicas
- All data on a single shard (limited by single node memory)
- Simpler to manage
- Max memory: 635.61 GiB on the largest node type (cache.r5.24xlarge); the largest r7g node, cache.r7g.16xlarge, provides 419.09 GiB
Cluster Mode Enabled (multiple shards):
- Data partitioned across up to 500 shards
- Each shard has a primary + up to 5 replicas
- Total memory = shards × node memory (bounded only by the shard limit)
- Supports online resharding (add/remove shards without downtime)
When to use Cluster Mode Enabled:
- Dataset exceeds single node memory
- Write throughput exceeds single primary capacity
- You need online scaling (adding shards without downtime)
When Cluster Mode Disabled is sufficient:
- Dataset fits in a single node
- Read scaling via replicas is sufficient
- Simpler operations preferred
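The "does the dataset fit in a single node" decision is simple arithmetic, sketched below. The function names and the 80% memory ceiling are assumptions for the example (the ceiling mirrors the memory-usage target used for monitoring later in this guide), and the node memory figure in the test values is a placeholder rather than a quoted specification.

```python
import math


def shards_needed(dataset_gib, node_memory_gib, max_usage=0.8):
    """Estimate the shard count, keeping each node below max_usage of its memory."""
    usable_per_node = node_memory_gib * max_usage
    return max(1, math.ceil(dataset_gib / usable_per_node))


def suggested_mode(dataset_gib, node_memory_gib, max_usage=0.8):
    """Cluster mode disabled is enough when a single shard holds the dataset."""
    if shards_needed(dataset_gib, node_memory_gib, max_usage) == 1:
        return "cluster mode disabled"
    return "cluster mode enabled"
```

This only looks at memory. Write throughput that exceeds a single primary is a separate reason to shard, even when the dataset fits on one node.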
Node Types
| Category | Example | Use Case |
|---|---|---|
| General Purpose (m7g) | cache.m7g.large | Balanced workloads, most production use cases |
| Memory Optimized (r7g) | cache.r7g.xlarge | Large datasets, high memory-to-CPU ratio |
| Small/Dev (t4g) | cache.t4g.micro | Development, testing, low-traffic production |
Graviton (g suffix) instances provide 20-30% better price-performance than equivalent Intel instances. Always use Graviton for new deployments.
High Availability
- Multi-AZ with automatic failover — Always enable for production. If the primary node fails, ElastiCache automatically promotes a replica to primary (failover time: typically 10-30 seconds).
- Read replicas — Scale read capacity horizontally. Your application reads from replicas and writes to the primary.
- Global Datastore — Cross-Region replication for disaster recovery and low-latency global reads.
Cache Invalidation
Cache invalidation is the hardest problem in caching. Stale data causes bugs; aggressive invalidation reduces cache hit rates.
TTL-Based Expiry
Set a TTL on every cached item:
| Data Type | Recommended TTL | Rationale |
|---|---|---|
| Configuration | 5-15 minutes | Changes infrequently, slight staleness acceptable |
| User profile | 1-5 minutes | Changes occasionally, brief staleness tolerable |
| Product catalog | 15-60 minutes | Changes via admin updates, not user-facing mutations |
| API response | 30-300 seconds | Depends on data freshness requirements |
| Session data | 30-60 minutes | Match session timeout policy |
Event-Based Invalidation
Invalidate cache entries when the underlying data changes:
Database write (DynamoDB Stream / RDS event)
→ Lambda function
→ Delete or update Redis cache entry
For DynamoDB, use DynamoDB Streams to trigger Lambda functions that invalidate corresponding cache entries. For RDS, use event notifications or application-level invalidation.
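For the DynamoDB path, the Lambda function can be sketched as below. The record layout (`Records`, `eventName`, `dynamodb.Keys`) follows the DynamoDB Streams event format; everything else is an assumption for the example: the `productId` key attribute, the `product:` cache key prefix, the function names, and `FakeRedis`, which stands in for the `delete` call of a redis-py client.

```python
class FakeRedis:
    """In-memory stand-in for the redis-py calls used below."""

    def __init__(self, data=None):
        self.data = dict(data or {})

    def delete(self, key):
        return 1 if self.data.pop(key, None) is not None else 0


def cache_key_for(record):
    """Map a stream record to its cache key (assumes a string key 'productId')."""
    product_id = record["dynamodb"]["Keys"]["productId"]["S"]
    return f"product:{product_id}"


def handle_stream_event(event, cache):
    """Delete the cache entry for every modified or removed item."""
    deleted = []
    for record in event.get("Records", []):
        # INSERT events have nothing cached yet, so they are skipped.
        if record.get("eventName") in ("MODIFY", "REMOVE"):
            key = cache_key_for(record)
            cache.delete(key)
            deleted.append(key)
    return deleted


# In a Lambda function, the entry point would wrap this, for example:
# def handler(event, context):
#     return handle_stream_event(event, redis_client)
```

Deleting the entry, rather than updating it, keeps the function simple: the next read repopulates the cache through the normal cache-aside path.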
Tag-Based Invalidation
Group related cache entries with tags for bulk invalidation:
Cache entry: product:123 → tags: ["catalog", "category:electronics"]
Cache entry: product:456 → tags: ["catalog", "category:electronics"]
Invalidate: all entries tagged "category:electronics"
→ Deletes product:123 and product:456 simultaneously
Implement with Redis Sets: maintain a set per tag containing all keys associated with that tag.
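A minimal sketch of the set-per-tag approach, under stated assumptions: `FakeRedis` is an in-memory stand-in for the `set`, `get`, `sadd`, `smembers`, and `delete` calls of a redis-py client, and `cache_with_tags`, `invalidate_tag`, and the `tag:` key prefix are hypothetical names.

```python
class FakeRedis:
    """In-memory stand-in for the redis-py calls used below (TTL is ignored)."""

    def __init__(self):
        self._data = {}

    def set(self, key, value, ex=None):
        self._data[key] = value
        return True

    def get(self, key):
        return self._data.get(key)

    def sadd(self, key, *members):
        self._data.setdefault(key, set()).update(members)

    def smembers(self, key):
        return set(self._data.get(key, set()))

    def delete(self, *keys):
        return sum(1 for k in keys if self._data.pop(k, None) is not None)


def cache_with_tags(cache, key, value, tags, ttl=900):
    """Store the value and record its key in one set per tag."""
    cache.set(key, value, ex=ttl)
    for tag in tags:
        cache.sadd(f"tag:{tag}", key)


def invalidate_tag(cache, tag):
    """Delete every key recorded under the tag, then the tag set itself."""
    tag_key = f"tag:{tag}"
    keys = cache.smembers(tag_key)
    if keys:
        cache.delete(*keys)
    cache.delete(tag_key)
    return len(keys)
```

Two caveats with this sketch. Tag sets can reference keys that have already expired, which is harmless here because deleting a missing key is a no-op. And with cluster mode enabled, the tagged keys can live on different shards, so a single multi-key delete may need to be issued per key instead.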
ElastiCache Serverless
ElastiCache Serverless removes capacity planning entirely:
- Automatically scales memory and compute based on usage
- No node selection, no cluster management
- Pay for data stored (per GB-hour) and compute (per ECPU)
- Minimum charge applies ($0.125/hour ≈ $90/month)
When to use Serverless:
- Unpredictable or spiky traffic patterns
- New applications where cache sizing is unknown
- Teams that want to avoid capacity planning
When to use provisioned nodes:
- Predictable workloads where node sizing is known
- Cost optimization with Reserved Nodes (up to 55% savings)
- Requirements for specific node types or cluster configurations
Monitoring
Key CloudWatch Metrics
| Metric | Target | Action If Outside Target |
|---|---|---|
| CacheHitRate | > 80% | Low hit rate = wrong caching strategy or TTL |
| EngineCPUUtilization | < 70% | Scale up or add shards |
| DatabaseMemoryUsagePercentage | < 80% | Scale up or review eviction policy |
| CurrConnections | Below max | Connection pooling issue if near limit |
| ReplicationLag | < 1 second | Network or replica capacity issue |
| Evictions | Near zero | Memory pressure if evictions increase |
Set CloudWatch alarms for:
- EngineCPUUtilization > 70%: Scale before performance degrades
- DatabaseMemoryUsagePercentage > 80%: Scale before evictions begin
- CacheHitRate < 50%: Investigate caching strategy
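These three alarms can be expressed as parameter sets for the CloudWatch `PutMetricAlarm` API. The sketch below only builds the parameters so it runs without AWS credentials; `build_alarms`, the alarm naming scheme, the 5-minute period, and the three evaluation periods are assumptions for the example, and the cluster ID is a placeholder.

```python
def build_alarms(cache_cluster_id):
    """Return one PutMetricAlarm parameter dict per threshold."""
    thresholds = [
        ("EngineCPUUtilization", "GreaterThanThreshold", 70.0),
        ("DatabaseMemoryUsagePercentage", "GreaterThanThreshold", 80.0),
        ("CacheHitRate", "LessThanThreshold", 50.0),
    ]
    return [
        {
            "AlarmName": f"{cache_cluster_id}-{metric}",
            "Namespace": "AWS/ElastiCache",
            "MetricName": metric,
            "Dimensions": [{"Name": "CacheClusterId", "Value": cache_cluster_id}],
            "Statistic": "Average",
            "Period": 300,            # seconds per datapoint (assumed)
            "EvaluationPeriods": 3,   # consecutive breaches before alarming (assumed)
            "Threshold": threshold,
            "ComparisonOperator": comparison,
        }
        for metric, comparison, threshold in thresholds
    ]


# With boto3 installed and credentials configured, the alarms could be created with:
# import boto3
# cloudwatch = boto3.client("cloudwatch")
# for alarm in build_alarms("my-cache-001"):
#     cloudwatch.put_metric_alarm(**alarm)
```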
Cost Optimization
Right-Sizing
Monitor DatabaseMemoryUsagePercentage over 2 weeks. If consistently below 50%, you are paying for unused memory. Downsize to a smaller node type.
Reserved Nodes
For steady-state production caches, Reserved Nodes provide significant savings:
| Payment Option | 1-Year Savings | 3-Year Savings |
|---|---|---|
| No upfront | ~28% | ~41% |
| Partial upfront | ~35% | ~50% |
| All upfront | ~38% | ~55% |
Data Tiering
ElastiCache data tiering automatically moves less-frequently accessed data to SSD storage, reducing memory costs for large datasets:
- Hot data stays in memory (sub-millisecond latency)
- Warm data moves to SSD (single-digit millisecond latency)
- Available on r6gd, r7gd, and r8gd node types
Common Mistakes
Mistake 1: Caching Without TTL
Cached data without a TTL lives forever — becoming stale as the source database changes. Always set a TTL. If you are unsure, start with 5 minutes and adjust based on your data’s change frequency and tolerance for staleness.
Mistake 2: No Connection Pooling
Creating a new Redis connection for every request is expensive. Use connection pooling in your application. For Lambda, initialize the Redis connection outside the handler function to reuse connections across invocations.
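The Lambda advice can be sketched as follows. To keep the example runnable without a Redis server, `connect` is a hypothetical factory that returns a plain dict and counts how often it is called; in a real function it would return a pooled client such as `redis.Redis(...)` pointed at your endpoint.

```python
CONNECTIONS_OPENED = 0


def connect():
    """Hypothetical factory; a real one would return a Redis client."""
    global CONNECTIONS_OPENED
    CONNECTIONS_OPENED += 1
    return {}  # plain dict as a stand-in for the client


# Module scope runs once per execution environment (on cold start),
# so every warm invocation reuses the same client.
client = connect()


def handler(event, context):
    """Count how often a key was seen, using the shared client."""
    key = event["key"]
    client[key] = client.get(key, 0) + 1
    return client[key]
```

Calling `connect()` inside `handler` instead would open a new connection on every invocation, which is the mistake described above.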
Mistake 3: Using Redis as Primary Storage
Redis is a cache, not a database. If your application cannot function when Redis is empty (cold start, failover, eviction), you have a cache dependency, not a caching strategy. Every cached item must be retrievable from the primary data store.
Mistake 4: Caching Too Much
Not all data benefits from caching. Data accessed once (unique search results, one-time API calls) wastes cache memory. Focus caching on frequently accessed, expensive-to-compute, or slowly changing data.
Getting Started
ElastiCache Redis fills the performance gap between your application and your database. For read-heavy serverless applications, high-traffic APIs, and latency-sensitive workloads, a well-implemented caching layer provides the single largest performance improvement available.
For caching architecture design, ElastiCache configuration, and performance optimization as part of our architecture review or managed services, talk to our team.
More in This Track
Part of the Engineering Guides library (June 2026).
AWS Cloud Architect & AI Expert
AWS-certified cloud architect and AI expert with deep expertise in cloud migrations, cost optimization, and generative AI on AWS.