Should I buy AWS Savings Plans or Reserved Instances first?

Neither — first optimize architecture to reduce volume consumed, then right-size to eliminate over-provisioning, then measure your stable baseline compute over 30+ days, then purchase commitments to cover 70–80% of that baseline. Purchasing commitments before architectural optimization locks you into spending on capacity that may no longer match your architecture after cost reduction work. Compute Savings Plans are the right starting point for most organizations because they apply across instance families, regions, and sizes, providing flexibility if your infrastructure evolves during the commitment period.

What is rightsizing in AWS and how do I do it correctly?

Rightsizing is selecting instances appropriately sized for your actual workload — not just using smaller instances. The process: collect CloudWatch CPU, memory (requires CloudWatch Agent), network, and disk IO metrics over 14+ days covering peak and off-peak. Identify utilization at the 95th percentile (not average, not maximum). Target 60–70% utilization at 95th percentile to provide spike headroom without wasting capacity at average load. Use AWS Compute Optimizer for automated recommendations based on your actual metric data. Match the instance family to the actual bottleneck — a memory-constrained workload on a compute-optimized instance is simultaneously over-provisioned on CPU and under-provisioned on memory.

When is batch processing cheaper than real-time processing on AWS?

Batch processing is cheaper when: maximum acceptable latency is minutes or hours (not seconds); objects can be compacted before processing; downstream writes can be batched (bulk DynamoDB writes cost less than single-item writes); and Lambda initialization overhead is a significant fraction of total execution time. A real-time Lambda triggered on every S3 upload pays per-invocation overhead — start, end, initialization, CloudWatch log lines — for each individual object. A batch Lambda processing 1,000 objects together pays that overhead once. For data analytics, ETL, and non-urgent notifications, batch processing typically reduces costs by 60–90% versus equivalent real-time architectures.

How do VPC Endpoints reduce AWS costs?

VPC Endpoints route AWS service API calls (S3, DynamoDB, Secrets Manager, ECR, CloudWatch Logs, SQS, etc.) directly through the AWS private network, bypassing NAT Gateway. Gateway Endpoints for S3 and DynamoDB are completely free. Interface Endpoints for other services charge an hourly fee and a per-GB data processing fee lower than NAT Gateway rates. Any private-subnet resource calling AWS services through NAT Gateway without VPC Endpoints is paying NAT data processing charges on every API call. For clusters with high API call volumes to DynamoDB, S3, or ECR, the savings from Gateway Endpoints alone frequently justify the one-time configuration effort.

AWS Cost Control: Architecture Beats Savings Plans

Part 8 of 8: The AWS Cost Trap — Why Your Bill Keeps Surprising You

Series tooling (June 2026): Cost Explorer + Analyze with Amazon Q for human drill-down; Cost Optimization Hub for prioritized backlog; FinOps Agent (preview) for Slack/Jira delivery.

When AWS bills exceed expectations, the most common first response is to look for discounts: purchase Reserved Instances, sign up for Savings Plans, negotiate an Enterprise Discount Program. These are valid tools. A one-year Compute Savings Plan can reduce EC2 costs by 30–40% compared to on-demand pricing. Reserved Instances for predictable workloads reduce costs further.

But discounts reduce the rate you pay. They do not reduce the volume of resources you consume. A system that generates $100,000 per month in on-demand compute costs will cost $60,000–$70,000 per month with a Savings Plan: still expensive, and still growing linearly with scale. The architectural patterns that generated $100,000 in compute costs are unchanged. As you grow, you will return to $100,000 and beyond, but now with a commitment you must pay for the full term whether or not you use it.
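To make the rate-versus-volume point concrete, here is a small arithmetic sketch. It uses the $100,000 figure and the 30–40% discount range from above; the function names are illustrative, not part of any AWS tool.

```python
def monthly_cost(on_demand_cost: float, discount: float) -> float:
    """Bill after a rate discount (0.30 means 30% off the on-demand rate)."""
    return on_demand_cost * (1.0 - discount)


def growth_to_erase_discount(discount: float) -> float:
    """Fractional usage growth at which the discounted bill
    climbs back to the original on-demand bill."""
    return 1.0 / (1.0 - discount) - 1.0


if __name__ == "__main__":
    for discount in (0.30, 0.40):
        cost = monthly_cost(100_000, discount)
        growth = growth_to_erase_discount(discount)
        print(f"{discount:.0%} discount: ${cost:,.0f}/month, "
              f"erased by {growth:.0%} usage growth")
```

A 30% discount is cancelled out by roughly 43% growth in consumption, which is why the discount alone does not change the trajectory of the bill.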

Durable cost reduction requires changing the architecture: reducing the volume of resources consumed, not just the price per unit. The playbook for this is a set of structural patterns — each one targeting a specific cost driver identified in the previous posts in this series.

Reduce Cross-AZ Chatter

Inter-Availability Zone data transfer is charged per GB in both directions. In microservices architectures, east-west traffic between services in different AZs generates continuous transfer charges. The fix is architectural: locality-aware routing.

AZ affinity in ECS. When ECS tasks call other ECS services, configure the AWS Cloud Map service discovery or the Application Load Balancer to prefer targets in the same AZ as the calling task. ECS Service Connect supports traffic routing with AZ awareness. The goal is that a task in us-east-1a calls other tasks in us-east-1a rather than tasks in us-east-1b or us-east-1c.

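Topology-aware routing in Kubernetes. EKS supports topology-aware routing via the service.kubernetes.io/topology-mode: Auto annotation on Services (Kubernetes 1.27+; versions 1.21 through 1.26 used the earlier service.kubernetes.io/topology-aware-hints: auto annotation). When enabled, the control plane adds zone hints to EndpointSlices and kube-proxy prefers endpoints in the same zone as the calling pod. The hints are applied only when endpoints are spread reasonably evenly across zones; otherwise traffic falls back to cluster-wide routing. This reduces cross-AZ service traffic without changing application code.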
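As a rough illustration, the snippet below builds a Service manifest carrying that annotation and prints it as JSON, which kubectl accepts alongside YAML. The Service name, label, and port are hypothetical placeholders; only the annotation key and value come from the text above.

```python
import json

# Annotation key as named in the text above. Everything else in the
# manifest (name, selector, port) is a hypothetical placeholder.
TOPOLOGY_ANNOTATION = "service.kubernetes.io/topology-mode"


def topology_aware_service(name: str, app_label: str, port: int) -> dict:
    """Build a Service manifest that opts in to topology-aware routing."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {
            "name": name,
            "annotations": {TOPOLOGY_ANNOTATION: "Auto"},
        },
        "spec": {
            "selector": {"app": app_label},
            "ports": [{"port": port, "targetPort": port}],
        },
    }


if __name__ == "__main__":
    # Save the output to a file and apply it with `kubectl apply -f <file>`.
    print(json.dumps(topology_aware_service("orders", "orders", 8080), indent=2))
```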

Placement groups for tightly coupled EC2. A cluster placement group ensures that EC2 instances are placed on hardware within the same availability zone and as close together as possible. For workloads with high-bandwidth, low-latency requirements between a fixed set of instances — HPC, distributed databases, large in-memory caches — placement groups both reduce cross-AZ transfer charges and improve performance.

The tradeoff: AZ affinity reduces resilience. A system that strictly routes traffic within one AZ loses capacity if that AZ has an outage. The correct design is a soft preference (prefer same-AZ, but fall back to cross-AZ) rather than strict locality that creates a single-AZ dependency. The routing mechanisms above support soft preferences; use them that way. A cluster placement group is the exception: it is single-AZ by design, so reserve it for workloads that already accept that dependency.
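The soft-preference idea can be sketched in a few lines. This is not how ECS, ALB, or kube-proxy implement it internally; it is a minimal client-side model, and the Target type and pick_target function are hypothetical names.

```python
import random
from typing import NamedTuple, Sequence


class Target(NamedTuple):
    address: str
    zone: str
    healthy: bool


def pick_target(targets: Sequence[Target], caller_zone: str, rng=random) -> Target:
    """Prefer a healthy target in the caller's zone; fall back to any
    healthy target in another zone rather than failing."""
    healthy = [t for t in targets if t.healthy]
    if not healthy:
        raise RuntimeError("no healthy targets in any zone")
    local = [t for t in healthy if t.zone == caller_zone]
    # Soft preference: use same-zone targets when available, otherwise
    # accept the cross-AZ transfer charge to stay available.
    return rng.choice(local or healthy)
```

A strict-locality variant would raise when the local list is empty, which is exactly the single-AZ dependency the paragraph above warns against.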

Eliminate Unnecessary NAT Gateway Traffic

As discussed in Part 2, NAT Gateway charges per GB processed. Any private subnet resource calling an AWS service through NAT Gateway is generating avoidable costs. VPC Endpoints route that traffic directly through the AWS network without NAT processing charges.

Gateway Endpoints (free):

Amazon S3
Amazon DynamoDB

Interface Endpoints (hourly + per-GB charge, but lower than NAT):

AWS Secrets Manager
AWS Systems Manager (SSM)
Amazon ECR (container image pulls)
Amazon CloudWatch Logs
AWS STS
Amazon SQS
Amazon SNS

The decision rule: for any AWS service your private-subnet resources call more than a few times per day, evaluate whether a VPC Endpoint reduces costs compared to NAT Gateway processing charges. For S3 and DynamoDB specifically, Gateway Endpoints are always the right choice — they are free, they improve performance, and they eliminate NAT Gateway processing on what are typically the highest-volume inter-service calls.
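For Interface Endpoints, the decision rule comes down to a break-even calculation. The sketch below assumes example rates (NAT data processing at $0.045/GB, an Interface Endpoint at $0.01 per hour per AZ plus $0.01/GB); these are assumptions for illustration and vary by region, so substitute current pricing before relying on the result. It also assumes the NAT Gateway stays in place for internet-bound traffic, so its hourly charge is not counted as a saving.

```python
HOURS_PER_MONTH = 730  # average hours in a month

# Assumed example rates in USD; replace with current pricing for your region.
NAT_PER_GB = 0.045
ENDPOINT_PER_HOUR_PER_AZ = 0.01
ENDPOINT_PER_GB = 0.01


def endpoint_fixed_cost(az_count: int) -> float:
    """Monthly hourly charge for one Interface Endpoint across az_count AZs."""
    return az_count * ENDPOINT_PER_HOUR_PER_AZ * HOURS_PER_MONTH


def monthly_saving(gb_per_month: float, az_count: int) -> float:
    """Positive when the endpoint is cheaper than NAT for this traffic."""
    nat = gb_per_month * NAT_PER_GB
    endpoint = endpoint_fixed_cost(az_count) + gb_per_month * ENDPOINT_PER_GB
    return nat - endpoint


def break_even_gb(az_count: int) -> float:
    """Monthly traffic to one service at which the endpoint pays for itself."""
    return endpoint_fixed_cost(az_count) / (NAT_PER_GB - ENDPOINT_PER_GB)


if __name__ == "__main__":
    print(f"Break-even with 2 AZs: {break_even_gb(2):.0f} GB/month")
```

Under these assumed rates, a service that moves only a few GB per month through NAT does not justify an Interface Endpoint on cost alone, while a high-volume service such as ECR image pulls usually does.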

Audit your NAT Gateway traffic in two steps. The NAT Gateway CloudWatch metrics (for example, BytesOutToDestination) show total volume per gateway, but they are not broken down by destination. To see where the traffic goes, sample VPC Flow Logs for a 24-hour period and aggregate bytes by destination address. The top destination addresses from private subnet sources will tell you which services are generating the most NAT traffic. Create VPC Endpoints for the AWS services in that top list.
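A flow-log sample can be summarized with a short script. The sketch below assumes the records use the default (version 2) flow log field order and that you pass in the CIDR of your private subnets; the function name and sample addresses are hypothetical.

```python
import ipaddress
from collections import Counter
from typing import Iterable, List, Tuple

# Field order of the default (version 2) VPC Flow Logs format.
FIELDS = (
    "version account_id interface_id srcaddr dstaddr srcport dstport "
    "protocol packets bytes start end action log_status"
).split()


def top_destinations(
    lines: Iterable[str], source_cidr: str, n: int = 10
) -> List[Tuple[str, int]]:
    """Sum bytes per destination address for traffic that originates
    inside source_cidr, and return the n largest destinations."""
    network = ipaddress.ip_network(source_cidr)
    totals: Counter = Counter()
    for line in lines:
        parts = line.split()
        if len(parts) != len(FIELDS):
            continue  # header row or custom-format record
        record = dict(zip(FIELDS, parts))
        if record["log_status"] != "OK":
            continue  # NODATA / SKIPDATA records carry no addresses
        if ipaddress.ip_address(record["srcaddr"]) not in network:
            continue
        totals[record["dstaddr"]] += int(record["bytes"])
    return totals.most_common(n)
```

To map the resulting addresses to services, one option is to compare them against the IP ranges AWS publishes for its services; that lookup is left out here.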

VPC Lattice: Simplifying Service-to-Service Networking and Reducing Transit Costs

For microservices architectures with complex service-to-service communication, VPC Lattice (generally available since 2023) is worth evaluating as a replacement for multi-VPC Transit Gateway setups. Traditional approaches to cross-VPC service networking — Transit Gateway, VPC peering, or PrivateLink per service — each have fixed or per-GB costs that accumulate with service mesh complexity.

VPC Lattice provides a managed application networking layer for service-to-service communication across VPCs and accounts. It handles routing, authentication, and observability at the application layer without requiring Transit Gateway attachments. For architectures where services in separate VPCs or accounts communicate frequently, VPC Lattice’s pricing (an hourly charge per service, plus per-GB data processing and per-request charges) can be lower than Transit Gateway attachment costs plus data processing charges for the same traffic volume.

The key comparison: Transit Gateway charges per attachment per hour plus per GB of data processed. If you have 20 services across 4 VPCs, you have 4 Transit Gateway VPC attachments ($0.05/hour each, or about $144/month in attachment fees alone over 720 hours) plus data processing. VPC Lattice bills hourly per service rather than per VPC, plus per GB and per request, so its fixed cost scales with service count. For service meshes with fewer than ~8 VPCs, the cost is often equivalent or lower, with reduced operational complexity, but run the numbers with your own service count and traffic volume.
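The comparison is easy to script once you have current rates. In the sketch below, only the $0.05/hour attachment fee comes from the text above. Every other rate is a required argument because pricing varies by region and changes over time. The hourly_units parameter stands for whatever unit Lattice bills hourly; confirm that unit on the current pricing page.

```python
def tgw_monthly_cost(
    attachments: int,
    gb: float,
    per_gb: float,
    hourly_per_attachment: float = 0.05,  # rate quoted in the text above
    hours: int = 720,
) -> float:
    """Transit Gateway: hourly fee per attachment plus per-GB processing."""
    return attachments * hourly_per_attachment * hours + gb * per_gb


def lattice_monthly_cost(
    hourly_units: int,
    gb: float,
    hourly_rate: float,
    per_gb: float,
    request_charges: float = 0.0,
    hours: int = 720,
) -> float:
    """VPC Lattice: hourly fee per billed unit, per-GB processing,
    and any per-request charges (passed in as a monthly total)."""
    return hourly_units * hourly_rate * hours + gb * per_gb + request_charges


if __name__ == "__main__":
    # Attachment fees alone for 4 VPC attachments, ignoring data processing.
    print(f"${tgw_monthly_cost(4, gb=0, per_gb=0.0):.2f}")
```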

Caching: The Structural Cost Reducer

Every cache hit is a request that did not reach the origin. Every request that does not reach the origin does not generate:

Origin compute cost (Lambda invocation, EC2 CPU)
Database read cost (DynamoDB RCU, RDS query)
S3 GET request cost
Network transfer from origin to cache to caller

Caching reduces costs at every layer simultaneously, not just at the cached layer. This multiplicative cost reduction is why caching is the highest-return optimization in most architectures — not because each individual hit is valuable, but because a 90% cache hit rate eliminates 90% of origin resource consumption at all levels of the stack.

ElastiCache for database offload. The most common and highest-ROI caching pattern: place ElastiCache (Redis or Memcached) in front of your primary database. Cache the results of expensive queries, the records of frequently accessed entities, and the session data for authenticated users. A database query that runs 10,000 times per hour and takes 10 ms each time can be reduced to 1,000 database queries and 9,000 ElastiCache cache hits — a 90% reduction in database load at a fraction of the database cost.
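The pattern described here is usually called cache-aside. The sketch below models it with an in-process dictionary standing in for ElastiCache, so it runs without any AWS dependency; TTLCache and the loader callback are hypothetical names, and a real deployment would use a Redis or Memcached client instead.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class TTLCache:
    """Cache-aside with a time-to-live: check the cache first and call
    the loader (the database query) only on a miss or after expiry."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._store: Dict[Hashable, Tuple[float, Any]] = {}
        self.hits = 0
        self.misses = 0

    def get_or_load(self, key: Hashable, loader: Callable[[Hashable], Any]) -> Any:
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            self.hits += 1
            return entry[1]
        # Miss or expired entry: query the origin and repopulate the cache.
        self.misses += 1
        value = loader(key)
        self._store[key] = (now + self._ttl, value)
        return value

    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

Tracking hits and misses matters as much as the cache itself: the hit rate is the number that tells you how much origin load, and therefore origin cost, the cache is actually removing.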

ElastiCache itself has a cost: instance hours plus data transfer. For workloads where the database is a cost driver, ElastiCache almost always reduces total cost because ElastiCache instance costs are substantially lower than the equivalent RDS capacity required to handle peak load without caching.

Lambda response caching with API Gateway. API Gateway REST APIs support response caching at the stage level, with TTLs that can be overridden per method (HTTP APIs do not offer a built-in cache). For API endpoints that return data that changes infrequently (catalog data, configuration, reference data), gateway-level caching eliminates Lambda invocations for cached responses. The API Gateway cache has a per-hour cost based on cache size, but at moderate-to-high request rates, the avoided Lambda invocations can outweigh the cache cost quickly.

CloudFront caching for everything edge-deliverable. CloudFront should cache not just static assets but any API response that can be shared across users with the same request parameters. Product listings, category pages, search results for common queries, and pricing data are candidates for edge caching. Each cache hit from CloudFront edge does not reach your origin, does not invoke Lambda, does not consume DynamoDB reads, and does not traverse NAT Gateway.

The cache-control headers on your origin responses determine whether CloudFront caches a response. An origin that returns Cache-Control: no-cache on all responses provides no benefit from being behind CloudFront (beyond DDoS protection and geographic distribution). Auditing your cache-control headers and maximizing cacheable TTLs for non-personalized content is one of the highest-leverage configuration changes for cost reduction.
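A header audit can start with a small classifier like the one below. It is a deliberately simplified reading of Cache-Control for a shared cache: it treats no-store, no-cache, and private as uncacheable and prefers s-maxage over max-age. It does not model CloudFront's minimum, default, or maximum TTL settings, which also affect the outcome, and the function name is hypothetical.

```python
from typing import Optional


def shared_cache_ttl(cache_control: Optional[str]) -> int:
    """Seconds a shared cache may reuse the response according to the
    origin's Cache-Control header; 0 when the origin gives no usable TTL."""
    if not cache_control:
        return 0
    directives = {}
    for part in cache_control.split(","):
        name, _, value = part.strip().partition("=")
        directives[name.lower()] = value.strip().strip('"')
    # Any of these directives prevents reuse without contacting the origin.
    if directives.keys() & {"no-store", "no-cache", "private"}:
        return 0
    # s-maxage applies to shared caches and overrides max-age.
    for name in ("s-maxage", "max-age"):
        value = directives.get(name, "")
        if value.isdigit():
            return int(value)
    return 0
```

Running this over a sample of origin responses, grouped by path, gives a quick list of endpoints that return non-personalized content with a TTL of zero.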

Batch vs. Real-Time: The Architecture Decision That Drives Costs

Many workloads process data in real-time that does not actually require real-time processing. The default architecture for data processing has shifted to streaming (Kinesis, Kafka, Lambda) because it is feasible — but feasibility does not imply cost efficiency.

A real-time Lambda function triggered on every S3 upload, running for 5 seconds per invocation, has a different cost profile from a batch Lambda function that processes 1,000 uploads together every 5 minutes. Both architectures process the same data. The real-time architecture generates 1,000 invocations per processing window; the batch architecture generates one. At the same compute cost per second, the batch architecture is not necessarily cheaper on raw compute (it processes the same total data volume), but it:

Reduces per-invocation overhead (start, end, initialization) by 1,000×
Reduces CloudWatch log lines by 1,000×
Reduces SQS or EventBridge event costs if those services trigger the Lambda
Enables larger-batch processing optimizations (file compaction, bulk database writes)
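The 1,000× figure in the list above is simply the ratio of invocation counts. The sketch below shows that arithmetic along with a generic chunking helper of the kind a batch handler might use to group object keys; both function names are illustrative.

```python
import math
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def invocation_count(object_count: int, batch_size: int) -> int:
    """Invocations needed to process object_count items in batches."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return math.ceil(object_count / batch_size)


def chunked(items: Sequence[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive batches of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


if __name__ == "__main__":
    realtime = invocation_count(1_000, batch_size=1)
    batched = invocation_count(1_000, batch_size=1_000)
    print(f"real-time: {realtime} invocations, batch: {batched}")
```

Every fixed per-invocation cost, including initialization and start/end log lines, scales with the first number rather than with the data volume.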

The batch vs. real-time decision framework:

What is the maximum acceptable latency between data arrival and processing completion?
If the answer is “seconds,” real-time processing is justified.
If the answer is “minutes” or “hours,” batch processing is almost always cheaper and often simpler.

For data analytics pipelines, ETL workloads, report generation, and notification systems with non-urgent delivery requirements, batch processing reduces cost substantially without degrading user experience.

Rightsizing: What It Actually Means

Rightsizing is not “use smaller instances.” It is “use instances that are appropriately sized for your actual workload characteristics.”

An oversized instance wastes money. An undersized instance creates performance problems that engineers respond to by scaling out (more instances) rather than scaling up (correctly sized instances), which often costs more than a single correctly sized instance would.

The rightsizing process:

Collect CloudWatch utilization metrics for CPU, memory (requires CloudWatch Agent for EC2), network, and disk IO over a minimum 14-day period covering peak and off-peak patterns.
Identify the utilization at the 95th percentile — not the average, not the maximum. The 95th percentile captures your sustained peak without being distorted by one-time spikes.
Target 60–70% utilization at the 95th percentile. This gives headroom for unexpected spikes without wasting capacity at average load.
Select the instance type that achieves 60–70% utilization at 95th percentile load.
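The steps above can be sketched as a short calculation. This version assumes you have already exported utilization samples (as percentages) from CloudWatch, uses the nearest-rank method for the percentile, and takes 65% as the midpoint of the 60–70% target; the function names are hypothetical.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], pct: int) -> float:
    """Nearest-rank percentile of the samples (pct between 1 and 100)."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def capacity_multiplier(p95_utilization: float, target: float = 65.0) -> float:
    """Factor by which to scale current capacity so that the p95 load
    lands at the target utilization. Below 1.0 means downsize."""
    if target <= 0:
        raise ValueError("target must be positive")
    return p95_utilization / target


if __name__ == "__main__":
    cpu_samples = [12, 15, 18, 22, 25, 31, 28, 19, 14, 33]  # made-up data
    p95 = percentile(cpu_samples, 95)
    print(f"p95 = {p95}%, capacity multiplier = {capacity_multiplier(p95):.2f}")
```

Run the same calculation separately for CPU, memory, network, and disk IO. The dimension with the highest multiplier is the one that constrains the instance choice.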

AWS Compute Optimizer performs this analysis automatically and provides recommendations with projected cost impact. It uses actual CloudWatch metric data from your running instances, not generic benchmarks. The recommendations are not always correct — they cannot account for application-specific behavior — but they are a useful starting point that surfaces clear over-provisioning.

Memory-optimized vs. compute-optimized vs. general-purpose: The instance family matters as much as the size. A workload that is memory-constrained running on a compute-optimized instance is over-provisioned on CPU and under-provisioned on memory simultaneously. Matching the instance family to the workload bottleneck (memory, CPU, network, storage) is the first step in rightsizing, not the final one.

Savings Plans and Reserved Instances: When to Use Them

Savings Plans and Reserved Instances should be purchased after architectural optimization, not before. Purchasing a commitment for a system that will be significantly changed by cost optimization work locks you into spending on capacity that no longer matches your architecture.

The sequence:

1. Architect to reduce volume (cross-AZ reduction, caching, batch/real-time trade-offs)
2. Right-size to reduce over-provisioning
3. Measure stable baseline compute after steps 1 and 2
4. Purchase Savings Plans or Reserved Instances to cover that stable baseline at a discount

Compute Savings Plans are the most flexible commitment: they apply to any EC2 instance family, size, region, and OS, as well as Lambda and Fargate. They are the right starting point for most organizations because they provide flexibility if instance types change.

EC2 Instance Savings Plans commit to a specific instance family in a specific region and provide a deeper discount than Compute Savings Plans. Use these when your instance type and region are stable and unlikely to change within the commitment period.

The coverage target: aim for Savings Plans or Reserved Instances to cover 70–80% of your stable baseline compute. Leave 20–30% on on-demand pricing to absorb spikes and workload changes without forfeiting committed spend. An account with 100% committed spend has no flexibility for growth or architectural changes.
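One way to turn the coverage target into a number is sketched below. The post does not define how to measure the stable baseline, so this example assumes it is the 10th percentile of hourly compute spend over the measurement window; that choice, the 75% default, and the function names are assumptions. The result is in the same units as the input. Savings Plans commitments are entered as discounted dollars per hour, so convert on-demand figures before purchasing.

```python
import math
from typing import Sequence


def stable_baseline(hourly_spend: Sequence[float], pct: int = 10) -> float:
    """Assumed definition: the spend level exceeded in roughly
    (100 - pct)% of hours, using a nearest-rank percentile."""
    if not hourly_spend:
        raise ValueError("no spend samples")
    ordered = sorted(hourly_spend)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def target_commitment(hourly_spend: Sequence[float], coverage: float = 0.75) -> float:
    """Commit to a fraction (70-80% in the text above) of the baseline."""
    if not 0 < coverage <= 1:
        raise ValueError("coverage must be in (0, 1]")
    return stable_baseline(hourly_spend) * coverage


if __name__ == "__main__":
    spend = [10.0] * 90 + [20.0] * 10  # made-up hourly spend in USD
    print(f"baseline ${stable_baseline(spend):.2f}/h, "
          f"commit ${target_commitment(spend):.2f}/h")
```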

AWS Trusted Advisor as a Starting Point

AWS Trusted Advisor (Business and Enterprise Support tiers) provides automated checks across cost, performance, security, and fault tolerance. The cost checks that provide the most value:

Idle EC2 instances: instances with less than 10% average daily CPU and minimal network activity over 14 days
Underutilized EBS volumes: volumes with less than 1 IOPS average over 7 days
Idle RDS DB instances: RDS instances with no connections over the past 7 days
Savings Plans and Reserved Instance coverage: what fraction of your usage is covered by commitments

Trusted Advisor is not a comprehensive FinOps solution — it surfaces obvious inefficiencies, not architectural patterns. But it provides a regular automated scan that catches the clearest zombie resources and over-provisioning without requiring manual audit work.

For accounts without Business Support+ (the post-2025-12-02 replacement for Business Support), AWS Cost Explorer Rightsizing Recommendations (available to all accounts) and AWS Compute Optimizer (free) provide similar functionality for EC2 and ECS without the Trusted Advisor subscription requirement.

Rightsizing and recommendation routing (June 2026)

Three AWS surfaces overlap — use each for what it does best:

Surface	Scope	When to use
Cost Explorer rightsizing	EC2 instances	Free first pass on underutilized compute
Compute Optimizer	EC2, ASG, RDS, EBS, Lambda, ECS	Deeper metrics; 32-day lookback (June 2026, free) for EBS/ECS with month-end or weekly spikes
Cost Optimization Hub	All of the above + Trusted Advisor idle checks	Single prioritized backlog across the org

Delivery to engineering: FinOps Agent (preview) rolls COH recommendations into Jira tickets and auto-investigates anomaly events. Analyze with Amazon Q explains whatever Cost Explorer view a human has open during weekly reviews. Neither replaces the architectural patterns in this playbook — they operationalize the backlog those patterns create.

The Principle: Design for Cost From Day One

The themes across all eight posts in this series converge on a single principle: cost is an emergent property of your architecture, not a billing artifact you optimize after the fact.

Every architectural decision has a cost dimension:

Synchronous vs. asynchronous communication → latency vs. cost trade-off
Microservices vs. modular monolith → operational flexibility vs. inter-service data transfer cost
Multi-AZ distribution → resilience vs. cross-AZ transfer cost
Real-time vs. batch processing → latency vs. invocation overhead
High-cardinality metrics vs. structured logs → observability granularity vs. CloudWatch cost

None of these trade-offs has a universally correct answer. The right answer depends on your workload, your scale, your latency requirements, and your cost targets. What matters is that the trade-off is explicit — made with awareness of the cost dimension — rather than implicit, where the cost dimension is discovered only when the bill arrives.

The organizations that manage AWS costs effectively are not the ones with the best Savings Plan coverage. They are the ones where cost awareness is embedded in the engineering culture: in architecture reviews, in PR checklists, in sprint retrospectives, and in the operational dashboards that engineers look at every day.

Cost control is not a FinOps function. It is an engineering function, informed by FinOps data. The distinction matters because the people who can change costs are engineers. Finance can report on costs. Engineers can design them down.

Related reading: 5 AWS Cost Optimization Strategies Most Teams Overlook is a quick tactical companion to this post, covering right-sizing, lifecycle policies, and anomaly detection in a faster format. AWS ElastiCache: Redis Caching Strategies for Production covers ElastiCache architecture and cache invalidation strategy in operational depth. For Savings Plans and Reserved Instance monitoring workflow, see AWS Cost Explorer and Budgets: A Cloud Cost Management Guide.

The AWS Cost Trap — Full Series

Part 1: Billing Complexity as a System Problem · Part 2: Data Transfer Costs · Part 3: Autoscaling + AI Workloads · Part 4: Observability & Logging Costs · Part 5: S3 Storage Cost Traps · Part 6: The FinOps Gap · Part 7: Real Failure Patterns · Part 8: Optimization Playbook

This concludes The AWS Cost Trap series. We covered billing complexity as a system property (Part 1), data transfer patterns that break budgets (Part 2), autoscaling feedback loops (Part 3), observability cost anti-patterns (Part 4), S3 usage traps (Part 5), the FinOps organizational gap (Part 6), real failure patterns (Part 7), and the architectural playbook for durable cost reduction (Part 8).

If you are working through these patterns in your own AWS environment and want a structured review, contact the FactualMinds team. As an AWS Select Tier Consulting Partner specializing in cloud cost optimization and architecture, we run cost audits that identify the specific patterns from this series in your account — with prioritized recommendations ranked by cost impact.

Recommended Reading

How Startups Accidentally Burn $100k/month

Autoscaling Broke Your Budget (AI Made It Worse)

Logging Yourself Into Bankruptcy

Data Transfer: The Line Item That Breaks Startups
