AWS Lambda Cost Optimization: Pay-Per-Request vs Provisioned
Quick summary: A practical guide to Lambda pricing models, memory tuning, Graviton savings, and when Provisioned Concurrency pays for itself versus standard on-demand invocations.
Key Takeaways
- Duration cost is memory × time: memory tuning is the single biggest cost lever, and more memory is sometimes cheaper because CPU scales with it
- Graviton (arm64) is 20% cheaper per GB-second than x86 and is usually a one-line configuration change
- Since August 2025, the INIT phase is billed, making cold start optimization (SnapStart, lazy loading, smaller packages) financially material
- Provisioned Concurrency pays for itself on steady, latency-sensitive traffic, roughly 5+ invocations per minute per provisioned unit

Lambda’s pay-per-request pricing is one of its biggest selling points — but “pay only for what you use” does not automatically mean “pay the least possible.” Without optimization, Lambda costs can grow faster than expected, especially as workloads scale.
This guide covers the practical cost optimization strategies we implement for clients running serverless workloads on AWS.
Understanding Lambda Pricing
Lambda charges for two things:
- Requests — $0.20 per million invocations
- Duration — $0.0000166667 per GB-second (charged per millisecond)
Duration cost depends on two factors you control: memory allocation (which also determines CPU) and execution time.
Example: A function with 512 MB memory running for 200ms:
- Duration cost: 0.5 GB × 0.2 seconds × $0.0000166667 = $0.00000167
- Request cost: $0.0000002
- Total per invocation: ~$0.0000019
- At 10 million invocations/month: ~$19
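The per-invocation math above can be sketched as a small helper. The rates are the x86 on-demand prices quoted in this section; verify them against current AWS pricing before relying on the output:

```python
# Sketch of Lambda on-demand pricing math. Rates are the published
# x86_64 prices quoted in this article; check current AWS pricing.
PRICE_PER_GB_SECOND = 0.0000166667    # duration charge, x86_64
PRICE_PER_REQUEST = 0.20 / 1_000_000  # $0.20 per million invocations

def invocation_cost(memory_mb: float, duration_ms: float) -> float:
    """Cost of a single invocation in USD."""
    gb_seconds = (memory_mb / 1024) * (duration_ms / 1000)
    return gb_seconds * PRICE_PER_GB_SECOND + PRICE_PER_REQUEST

def monthly_cost(memory_mb: float, duration_ms: float,
                 invocations: int) -> float:
    """Monthly cost in USD, ignoring the free tier."""
    return invocation_cost(memory_mb, duration_ms) * invocations

# 512 MB for 200 ms at 10 million invocations/month: roughly $19
print(round(monthly_cost(512, 200, 10_000_000), 2))
```

Note the helper ignores the free tier, so it overstates cost for low-traffic functions.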
The free tier provides 1 million requests and 400,000 GB-seconds per month — enough for many development and low-traffic production workloads.
Important: INIT Phase Billing Change (August 2025)
Effective August 1, 2025, AWS changed Lambda pricing to bill for the INIT phase (cold start initialization time). Previously, the INIT phase was free. Now, initialization time counts toward your billed duration at the same rate as execution time.
Impact: Functions with long initialization times (loading large dependencies, establishing database connections, loading ML models) now have higher effective costs per cold start. This change makes cold start optimization more financially important than before.
- Mitigations: Lambda SnapStart, connection pooling via RDS Proxy, lazy loading, and reducing package size
- Who is most affected: Java and .NET functions with large framework startup times, functions loading ML models, functions initializing large SDKs
For new functions, target INIT times under 200ms. For existing functions with INIT times over 1 second, evaluate Lambda SnapStart or architecture redesign.
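Lazy loading, one of the mitigations listed above, can be sketched as follows. `_load_model` is a hypothetical stand-in for whatever expensive initialization (an ML model, a large SDK client) your function performs:

```python
# Lazy loading sketch: defer expensive initialization out of the INIT
# phase so it is only paid for on invocations that actually need it,
# then cached for the lifetime of the execution environment.
_model = None

def _load_model():
    # Hypothetical placeholder for an expensive load
    # (e.g. deserializing a model file from disk or S3).
    return {"name": "example-model"}

def get_model():
    """Initialize on first use, then reuse across warm invocations."""
    global _model
    if _model is None:
        _model = _load_model()
    return _model

def handler(event, context):
    if event.get("needs_model"):
        model = get_model()           # loaded only when required
        return {"model": model["name"]}
    return {"model": None}            # fast path pays no load cost

print(handler({"needs_model": True}, None))
```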
Memory Tuning: The Biggest Lever
Lambda CPU scales linearly with memory. At 1,769 MB, you get one full vCPU. At 3,538 MB, you get two. This creates a counterintuitive optimization opportunity: more memory can be cheaper.
How It Works
A CPU-bound function at 128 MB might take 3,000ms to execute. At 512 MB (4x memory, 4x CPU), the same function might complete in 800ms. At 1,024 MB, it might take 400ms.
| Memory | Duration | GB-seconds | Cost per invocation |
|---|---|---|---|
| 128 MB | 3,000ms | 0.375 | $0.00000625 |
| 256 MB | 1,500ms | 0.375 | $0.00000625 |
| 512 MB | 800ms | 0.400 | $0.00000667 |
| 1,024 MB | 400ms | 0.400 | $0.00000667 |
| 1,769 MB | 250ms | 0.442 | $0.00000737 |
In this example, 128 MB and 256 MB cost the same despite the memory difference — because the function completes proportionally faster with more CPU. The cost-optimal point depends on whether your function is CPU-bound, I/O-bound, or memory-bound.
AWS Lambda Power Tuning
Use the open-source AWS Lambda Power Tuning tool to find the optimal memory setting automatically. It runs your function at multiple memory configurations and reports:
- Execution time at each memory level
- Cost per invocation at each memory level
- The cost-optimal and speed-optimal configurations
We run Power Tuning on every Lambda function in production. It typically reveals 20-40% cost savings on functions that were left at default memory settings.
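As a sketch, driving the Power Tuning state machine from Python might look like this. The input field names (`lambdaARN`, `powerValues`, `num`, `payload`, `strategy`) follow the tool's documented execution input, but verify them against the version you deploy; the ARNs are illustrative:

```python
import json

def power_tuning_input(function_arn: str, power_values: list,
                       runs_per_level: int = 10) -> dict:
    """Build the execution input for the AWS Lambda Power Tuning
    state machine. Strategy can be 'cost', 'speed', or 'balanced'
    per the tool's README."""
    return {
        "lambdaARN": function_arn,
        "powerValues": power_values,
        "num": runs_per_level,
        "payload": {},
        "strategy": "cost",
    }

def start_tuning(state_machine_arn: str, function_arn: str):
    # Lazy import so the pure helper above is usable without AWS access.
    import boto3
    sfn = boto3.client("stepfunctions")
    return sfn.start_execution(
        stateMachineArn=state_machine_arn,
        input=json.dumps(power_tuning_input(
            function_arn, [128, 256, 512, 1024, 1769])),
    )
```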
Graviton (ARM) — 20% Cheaper
Lambda on ARM-based Graviton2 processors is 20% cheaper per GB-second than x86, with equivalent or better performance for most workloads.
| Architecture | Price per GB-second |
|---|---|
| x86_64 | $0.0000166667 |
| arm64 (Graviton2) | $0.0000133334 |
Switching to ARM is usually a one-line change in your function configuration. Most Node.js, Python, and Go functions work without modification. Java and .NET functions may need testing for native dependency compatibility.
Our recommendation: Default to arm64 for all new functions. Migrate existing functions to arm64 unless they have specific x86 dependencies.
Lambda SnapStart: Eliminating Cold Start Cost
Lambda SnapStart pre-initializes your function and takes a snapshot of the initialized execution environment. On invocation, Lambda restores from the snapshot rather than running the INIT phase — eliminating cold start overhead entirely.
SnapStart was originally launched for Java 11. As of 2025, it has been expanded to:
- Python 3.12+
- .NET 8+
- Java 11, 17, 21
Cost impact with the August 2025 INIT billing change: If your function has a 2-second INIT phase and you have 100,000 cold starts per month with 512 MB memory, the INIT billing adds:
100,000 × 0.5 GB × 2 seconds × $0.0000166667 = $1.67/month
At higher scales (millions of invocations with cold starts), SnapStart pays for itself immediately. There is no additional charge for SnapStart beyond the snapshot storage ($0.0095/GB/month for the snapshot, typically a few cents).
SnapStart activation: A one-line change in your function configuration. The snapshot is taken at publish time, not at runtime.
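As a sketch, that one-line change via boto3 might look like this; the `SnapStart`/`ApplyOn` parameter and the publish step follow the Lambda API, and the function name is illustrative:

```python
def snapstart_config(function_name: str) -> dict:
    """Keyword arguments for UpdateFunctionConfiguration that enable
    SnapStart on published versions."""
    return {
        "FunctionName": function_name,
        "SnapStart": {"ApplyOn": "PublishedVersions"},
    }

def enable_snapstart(function_name: str):
    # Lazy import so the config helper is testable without AWS access.
    import boto3
    client = boto3.client("lambda")
    client.update_function_configuration(**snapstart_config(function_name))
    # SnapStart applies to versions published after this change, so
    # publish a new version to take the snapshot.
    return client.publish_version(FunctionName=function_name)
```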
Lambda Durable Functions
Announced at re:Invent 2025, Lambda Durable Functions brings stateful orchestration natively to Lambda without requiring Step Functions or external state stores. Durable Functions is designed for long-running workflows (minutes to days) with built-in state persistence.
🔄 Update (2026): Lambda Durable Functions is now Generally Available in US East (Ohio) with support for Python 3.13/3.14 and Node.js 22/24. Java support is in preview. The service enables workflows with up to 1-year execution duration — ideal for long-running AI agent pipelines, human-in-the-loop approvals, and multi-day batch processing. Expansion to additional regions is ongoing; check the Lambda Durable Functions documentation for current regional availability.
Pricing: $8 per million orchestration operations (activity tasks, timer waits, external events)
Use cases:
- Multi-step approval workflows
- Long-running background jobs with checkpointing
- Fan-out/fan-in patterns with state aggregation
- Workflows that wait for human input or external events
For workflows that do not require the full expressiveness of Step Functions, Lambda Durable Functions offers simpler development and lower per-operation cost compared to Step Functions Standard workflows ($0.025 per 1,000 state transitions).
Pay-Per-Request vs. Provisioned Concurrency
This is the decision that trips up most teams: when does Provisioned Concurrency — which eliminates cold starts but adds always-on cost — actually save money?
On-Demand (Pay-Per-Request)
- Pay per invocation and per millisecond of execution
- Cold starts on first invocation and after idle periods
- Scales automatically from zero to thousands of concurrent executions
- Best for: variable traffic, background processing, non-latency-sensitive workloads
Provisioned Concurrency
- Pre-warms a specified number of execution environments
- Eliminates cold starts for those environments
- Charges for provisioned capacity by the hour ($0.0000041667 per GB-second), plus duration charges for actual execution and standard request charges
- Best for: latency-sensitive APIs, predictable traffic patterns, compliance with response time SLAs
Break-Even Analysis
Provisioned Concurrency makes financial sense when:
- You need consistently low latency — Sub-100ms p99 response times that cold starts would violate
- You have predictable, steady traffic — The provisioned environments are utilized consistently
- Cold start cost exceeds provisioning cost — If cold starts cause retries, timeouts, or user drop-off, the indirect cost justifies provisioning
Example calculation: 10 Provisioned Concurrency units at 512 MB, running 24/7:
- Hourly cost: 10 × 0.5 GB × 3,600 seconds × $0.0000041667 = $0.075/hour
- Monthly cost: $0.075 × 720 hours = $54/month
If those 10 units handle 5 million invocations per month (average 7 per second), the provisioning cost is $0.0000108 per invocation — less than the on-demand duration cost for most functions.
The rule of thumb: If a Provisioned Concurrency unit would handle at least 5 invocations per minute on average, provisioning is usually cheaper than the equivalent on-demand invocations plus the cold start overhead.
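The $54/month example can be reproduced with a short calculation. Note this sketch covers only the provisioning charge quoted above, not request or execution-duration charges:

```python
# Sketch of the Provisioned Concurrency provisioning charge.
# Rate is the x86 price quoted in this article; verify against
# current AWS pricing.
PC_PRICE_PER_GB_SECOND = 0.0000041667
HOURS_PER_MONTH = 720

def provisioned_monthly_cost(units: int, memory_mb: float) -> float:
    """Monthly provisioning charge in USD for always-on capacity."""
    gb = memory_mb / 1024
    return units * gb * HOURS_PER_MONTH * 3600 * PC_PRICE_PER_GB_SECOND

def provisioned_cost_per_invocation(units: int, memory_mb: float,
                                    monthly_invocations: int) -> float:
    """Provisioning charge amortized across the month's traffic."""
    return provisioned_monthly_cost(units, memory_mb) / monthly_invocations

# 10 units at 512 MB running 24/7, as in the example above
print(round(provisioned_monthly_cost(10, 512), 2))
```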
Lambda Response Streaming: A New Cost Dimension
Lambda response streaming (GA since 2023, increasingly adopted in 2025–2026) changes the billing model for functions that return large payloads. Streaming functions are billed on:
- Execution duration — same as standard invocations
- Data streamed — $0.06 per GB of data returned to the caller
For functions returning small responses (API JSON under a few KB), streaming adds negligible cost. For functions returning large files, reports, or AI-generated content, the streaming charge can exceed the duration charge.
When streaming saves money: The real cost advantage of streaming is user-perceived performance — callers receive the first bytes immediately without waiting for full generation. This reduces client timeouts and improves UX for AI inference and large document generation use cases.
When streaming increases cost: If your function streams large binary payloads (images, PDFs, video) that could be served directly from S3 with a presigned URL, the streaming charge applies unnecessarily. Prefer: generate the asset → upload to S3 → return a presigned URL to the caller.
Lambda@Edge vs CloudFront Functions: Choose the Cheaper Option
Both services run code at CloudFront edge locations, but their pricing and capability differ significantly:
| | Lambda@Edge | CloudFront Functions |
|---|---|---|
| Price (requests) | $0.60 per million | $0.10 per million |
| Price (duration) | $0.00005001/GB-sec | Included in request price |
| Max execution time | 5–30 seconds | 1ms (sub-millisecond) |
| Runtime | Node.js, Python | JavaScript (ES5) |
| Network access | Yes | No |
| Use case | Complex logic, external calls | Header rewrites, URL redirects, simple A/B |
Cost difference: CloudFront Functions are 6× cheaper per million executions for workloads that fit within the 1ms execution limit. If you are using Lambda@Edge purely for header manipulation, URL rewriting, or basic redirect logic — switch to CloudFront Functions for an immediate 83% reduction in edge compute costs.
Rule of thumb: Use CloudFront Functions for stateless, sub-millisecond logic with no external calls. Use Lambda@Edge only when you need network access, longer execution time, or Node.js/Python-specific libraries.
Architecture-Level Cost Optimization
Use Direct Service Integrations
API Gateway can integrate directly with DynamoDB, SQS, Step Functions, and other services without a Lambda function in between. This eliminates Lambda invocation costs for simple operations.
Before (Lambda proxy):
API Gateway → Lambda (parse request, call DynamoDB, format response) → DynamoDB
After (direct integration):
API Gateway → DynamoDB (VTL mapping template)
Savings: 100% of Lambda cost for that route.
Batch Processing with SQS
When processing messages from SQS, Lambda can receive up to 10 messages per invocation (or up to 10,000 with batching windows). Processing 10 messages in one invocation costs the same as processing 1.
Before: 1 million messages = 1 million invocations
After (batch size 10): 1 million messages = 100,000 invocations
Savings: 90% reduction in invocation costs plus proportional duration savings from amortized initialization.
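A minimal batch handler sketch: it assumes `ReportBatchItemFailures` is enabled on the event source mapping so that only failed messages are redelivered, and `process_message` is a hypothetical placeholder for your business logic:

```python
import json

def handler(event, context):
    """Process an SQS batch and report per-message failures so a
    single bad message does not force the whole batch to be retried."""
    failures = []
    for record in event.get("Records", []):
        try:
            process_message(json.loads(record["body"]))
        except Exception:
            # Only this message is redelivered; the rest are deleted.
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}

def process_message(message: dict):
    # Hypothetical placeholder business logic; replace with your own.
    if message.get("fail"):
        raise ValueError("simulated processing error")
```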
Avoid Synchronous Chains
Synchronous function-to-function calls (Lambda invoking Lambda) double your costs and create cascading cold start risks. Use asynchronous patterns instead:
Avoid: API Gateway → Lambda A → Lambda B → Lambda C (serial, synchronous)
Prefer: API Gateway → Lambda A → SQS/EventBridge → Lambda B (async, decoupled)
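A sketch of the decoupled handoff, assuming an SQS queue sits between Lambda A and Lambda B; the message fields and queue URL are illustrative:

```python
import json

def build_message(task_type: str, payload: dict) -> str:
    """Serialize the work item Lambda A hands off instead of invoking
    Lambda B synchronously. Field names are illustrative."""
    return json.dumps({"task": task_type, "payload": payload})

def enqueue(queue_url: str, task_type: str, payload: dict):
    # Lazy import so build_message stays testable without AWS access.
    import boto3
    sqs = boto3.client("sqs")
    # Lambda B consumes from this queue via an event source mapping,
    # so a slow or failing B never blocks (or double-bills) Lambda A.
    return sqs.send_message(QueueUrl=queue_url,
                            MessageBody=build_message(task_type, payload))
```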
Right-Size Connection Handling
Lambda functions that connect to RDS databases create connection overhead on every cold start. Use RDS Proxy to pool connections, reducing both database load and Lambda execution time.
Without RDS Proxy: 200ms per invocation for connection establishment
With RDS Proxy: 5ms per invocation for connection from pool
At scale, this connection overhead difference reduces both latency and cost significantly.
Monitoring Lambda Costs
CloudWatch Metrics to Track
- Invocations — Total function calls per period
- Duration — Average, p50, p95, p99 execution times
- ConcurrentExecutions — Peak concurrent executions (indicates scaling behavior)
- Throttles — Invocations rejected due to concurrency limits
- Errors — Failed invocations (retried invocations increase cost)
Cost Explorer Tags
Tag Lambda functions with:
- Project — which product or feature the function supports
- Environment — production, staging, development
- Team — which team owns the function
This enables per-project and per-team cost attribution in Cost Explorer.
Cost Anomaly Detection
Enable AWS Cost Anomaly Detection for Lambda to get alerts when spending deviates from historical patterns — catching runaway functions, infinite loops, or unexpected traffic spikes before they generate large bills.
Common Lambda Cost Mistakes
Mistake 1: Default Memory Settings
Lambda defaults to 128 MB, which is almost never optimal. Functions at 128 MB have minimal CPU and execute slowly, often costing more than the same function at 256 MB or 512 MB.
Mistake 2: Over-Provisioned Concurrency
Provisioning 100 concurrent environments “just in case” when your peak traffic only uses 20 wastes 80% of your provisioning spend. Use Application Auto Scaling to adjust Provisioned Concurrency based on actual demand.
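As a sketch, wiring this up with Application Auto Scaling via boto3 might look like the following. The resource identifiers and metric name follow AWS's documented values for Lambda Provisioned Concurrency; the alias, limits, and target utilization are illustrative:

```python
def scaling_target(function_name: str, alias: str,
                   min_units: int, max_units: int) -> dict:
    """Scalable-target settings for Lambda Provisioned Concurrency."""
    return {
        "ServiceNamespace": "lambda",
        "ResourceId": f"function:{function_name}:{alias}",
        "ScalableDimension": "lambda:function:ProvisionedConcurrency",
        "MinCapacity": min_units,
        "MaxCapacity": max_units,
    }

def register(function_name: str, alias: str, min_units: int = 2,
             max_units: int = 20, target_utilization: float = 0.7):
    # Lazy import so the config builder is testable without AWS access.
    import boto3
    aas = boto3.client("application-autoscaling")
    target = scaling_target(function_name, alias, min_units, max_units)
    aas.register_scalable_target(**target)
    aas.put_scaling_policy(
        PolicyName=f"{function_name}-pc-target-tracking",
        ServiceNamespace=target["ServiceNamespace"],
        ResourceId=target["ResourceId"],
        ScalableDimension=target["ScalableDimension"],
        PolicyType="TargetTrackingScaling",
        TargetTrackingScalingPolicyConfiguration={
            # Keep provisioned environments ~70% utilized on average.
            "TargetValue": target_utilization,
            "PredefinedMetricSpecification": {
                "PredefinedMetricType":
                    "LambdaProvisionedConcurrencyUtilization",
            },
        },
    )
```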
Mistake 3: Logging Everything
console.log in every function with detailed request/response payloads generates massive CloudWatch Logs volumes. Verbose logging can cost more than the Lambda invocations themselves. Log strategically — errors always, debug only when needed.
2025 update — Tiered CloudWatch log pricing for Lambda (May 2025): AWS introduced tiered pricing for CloudWatch Logs ingestion from Lambda. The first 10 GB/month per function is charged at the standard $0.50/GB, with volume discounts beyond that threshold. More importantly, Lambda now supports S3 and Kinesis Firehose as direct log destinations — allowing you to route logs to S3 at $0.023/GB stored (vs. $0.50/GB ingested into CloudWatch), a 95% cost reduction for high-volume logging. Route structured logs directly to S3 via Firehose for analytics workloads, and send only error/warning logs to CloudWatch for real-time alerting.
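One way to log strategically is to drive verbosity from an environment variable, so debug logging can be switched on per function (via the Lambda environment configuration) without a redeploy. The `order_id` field in this sketch is illustrative:

```python
import logging
import os

# Default to WARNING so routine invocations generate almost no
# billable CloudWatch Logs ingestion; set LOG_LEVEL=DEBUG on a
# single function when you need full payloads.
logger = logging.getLogger()
logger.setLevel(os.environ.get("LOG_LEVEL", "WARNING"))

def handler(event, context):
    # Full payloads only appear when LOG_LEVEL=DEBUG is set.
    logger.debug("full event payload: %s", event)
    if "order_id" not in event:
        # Problems are always worth logging, regardless of level.
        logger.warning("event missing order_id, rejecting")
        return {"ok": False}
    return {"ok": True, "order_id": event["order_id"]}
```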
Mistake 4: Not Using the Free Tier
The Lambda free tier (1M requests + 400,000 GB-seconds/month) applies every month, forever. For low-traffic functions, this means Lambda is genuinely free. Ensure your cost analysis accounts for the free tier.
Getting Started
Lambda cost optimization is not a one-time exercise. Workloads change, traffic patterns evolve, and AWS introduces new features and pricing options. We help organizations implement ongoing cost governance for serverless workloads as part of our broader AWS cost optimization services.
For end-to-end serverless architecture design and implementation, see our AWS Serverless Architecture Services.
AWS Cloud Architect & AI Expert
AWS-certified cloud architect and AI expert with deep expertise in cloud migrations, cost optimization, and generative AI on AWS.

