AWS Lambda Cost Optimization: Pay-Per-Request vs Provisioned

Quick summary: A practical guide to Lambda pricing models, memory tuning, Graviton savings, and when Provisioned Concurrency pays for itself versus standard on-demand invocations.


Lambda’s pay-per-request pricing is one of its biggest selling points — but “pay only for what you use” does not automatically mean “pay the least possible.” Without optimization, Lambda costs can grow faster than expected, especially as workloads scale.

This guide covers the practical cost optimization strategies we implement for clients running serverless workloads on AWS.

Understanding Lambda Pricing

Lambda charges for two things:

  1. Requests — $0.20 per million invocations
  2. Duration — $0.0000166667 per GB-second (charged per millisecond)

Duration cost depends on two factors you control: memory allocation (which also determines CPU) and execution time.

Example: A function with 512 MB memory running for 200ms:

  • Duration cost: 0.5 GB × 0.2 seconds × $0.0000166667 = $0.00000167
  • Request cost: $0.0000002
  • Total per invocation: ~$0.0000019
  • At 10 million invocations/month: ~$19

The free tier provides 1 million requests and 400,000 GB-seconds per month — enough for many development and low-traffic production workloads.
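This arithmetic is easy to script. Below is a minimal estimator built from the rates above (x86 on-demand pricing); the function and its free-tier handling are illustrative, not an official calculator:

```python
# Monthly Lambda on-demand cost from the rates quoted above (x86).
REQUEST_PRICE = 0.20 / 1_000_000   # $ per request
GB_SECOND_PRICE = 0.0000166667     # $ per GB-second
FREE_REQUESTS = 1_000_000          # monthly free tier
FREE_GB_SECONDS = 400_000

def monthly_cost(invocations, memory_mb, duration_ms, free_tier=False):
    gb_seconds = invocations * (memory_mb / 1024) * (duration_ms / 1000)
    requests = invocations
    if free_tier:
        requests = max(0, requests - FREE_REQUESTS)
        gb_seconds = max(0.0, gb_seconds - FREE_GB_SECONDS)
    return requests * REQUEST_PRICE + gb_seconds * GB_SECOND_PRICE

# The worked example: 512 MB, 200 ms, 10M invocations/month
print(round(monthly_cost(10_000_000, 512, 200), 2))  # 18.67, i.e. ~$19
```

With `free_tier=True`, a 1M-request, 128 MB, 100 ms workload comes out to $0 — matching the point that low-traffic functions are effectively free.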

Important: INIT Phase Billing Change (August 2025)

Effective August 1, 2025, AWS changed Lambda pricing to bill for the INIT phase (cold start initialization time). Previously, the INIT phase was free. Now, initialization time counts toward your billed duration at the same rate as execution time.

Impact: Functions with long initialization times (loading large dependencies, establishing database connections, loading ML models) now have higher effective costs per cold start. This change makes cold start optimization more financially important than before.

  • Mitigations: Lambda SnapStart, connection pooling via RDS Proxy, lazy loading, and reducing package size
  • Who is most affected: Java and .NET functions with large framework startup times, functions loading ML models, functions initializing large SDKs

For new functions, target INIT times under 200ms. For existing functions with INIT times over 1 second, evaluate Lambda SnapStart or architecture redesign.

Memory Tuning: The Biggest Lever

Lambda CPU scales linearly with memory. At 1,769 MB, you get one full vCPU. At 3,538 MB, you get two. This creates a counterintuitive optimization opportunity: more memory can be cheaper.

How It Works

A CPU-bound function at 128 MB might take 3,000ms to execute. At 512 MB (4x memory, 4x CPU), the same function might complete in 800ms. At 1,024 MB, it might take 400ms.

Memory | Duration | GB-seconds | Cost per invocation
128 MB | 3,000ms | 0.375 | $0.00000625
256 MB | 1,500ms | 0.375 | $0.00000625
512 MB | 800ms | 0.400 | $0.00000667
1,024 MB | 400ms | 0.400 | $0.00000667
1,769 MB | 250ms | 0.442 | $0.00000737

In this example, 128 MB and 256 MB cost the same despite the memory difference — because the function completes proportionally faster with more CPU. The cost-optimal point depends on whether your function is CPU-bound, I/O-bound, or memory-bound.
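The table's duration-cost math can be reproduced directly; sweeping your own measured durations through it shows where the cost-optimal point sits (a sketch using the rates from this article):

```python
GB_SECOND_PRICE = 0.0000166667  # x86 on-demand, $ per GB-second

def duration_cost(memory_mb, duration_ms):
    """Duration-only cost per invocation (the table excludes the request fee)."""
    return (memory_mb / 1024) * (duration_ms / 1000) * GB_SECOND_PRICE

# Measured durations for the same CPU-bound function, per the table above
measured = {128: 3000, 256: 1500, 512: 800, 1024: 400}
for mem, ms in measured.items():
    print(f"{mem:>5} MB  {ms:>5} ms  ${duration_cost(mem, ms):.8f}")
```

Swap in your own `measured` values from CloudWatch or Power Tuning runs to find where doubling memory stops paying for itself.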

AWS Lambda Power Tuning

Use the open-source AWS Lambda Power Tuning tool to find the optimal memory setting automatically. It runs your function at multiple memory configurations and reports:

  • Execution time at each memory level
  • Cost per invocation at each memory level
  • The cost-optimal and speed-optimal configurations

We run Power Tuning on every Lambda function in production. It typically reveals 20-40% cost savings on functions that were left at default memory settings.

Graviton (ARM) — 20% Cheaper

Lambda on ARM-based Graviton2 processors is 20% cheaper per GB-second than x86, with equivalent or better performance for most workloads.

Architecture | Price per GB-second
x86_64 | $0.0000166667
arm64 (Graviton2) | $0.0000133334

Switching to ARM is usually a one-line change in your function configuration. Most Node.js, Python, and Go functions work without modification. Java and .NET functions may need testing for native dependency compatibility.

Our recommendation: Default to arm64 for all new functions. Migrate existing functions to arm64 unless they have specific x86 dependencies.
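Because the saving is a flat 20% on the duration rate, it can be estimated before redeploying anything. A sketch, assuming execution time is unchanged on arm64 (verify this per workload):

```python
X86_RATE = 0.0000166667  # $ per GB-second
ARM_RATE = 0.0000133334  # $ per GB-second (Graviton2, 20% cheaper)

def graviton_monthly_savings(monthly_gb_seconds):
    """Duration-cost saving from switching the function to arm64.
    Assumes identical execution time on both architectures."""
    return monthly_gb_seconds * (X86_RATE - ARM_RATE)

# A fleet consuming 1M GB-seconds/month
print(round(graviton_monthly_savings(1_000_000), 2))  # 3.33
```

Pull `monthly_gb_seconds` from Cost Explorer or CloudWatch to size the saving for your account before migrating.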

Lambda SnapStart: Eliminating Cold Start Cost

Lambda SnapStart pre-initializes your function and takes a snapshot of the initialized execution environment. On invocation, Lambda restores from the snapshot rather than running the INIT phase — eliminating cold start overhead entirely.

SnapStart was originally launched for Java 11. As of 2025, it has been expanded to:

  • Python 3.12+
  • .NET 8+
  • Java 11, 17, 21

Cost impact with the August 2025 INIT billing change: If your function has a 2-second INIT phase and you have 100,000 cold starts per month with 512 MB memory, the INIT billing adds:

100,000 × 0.5 GB × 2 seconds × $0.0000166667 = $1.67/month
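That line item generalizes to a simple formula (a sketch based on the article's rates):

```python
GB_SECOND_PRICE = 0.0000166667  # INIT is billed at the same rate as execution

def init_phase_cost(cold_starts, memory_mb, init_seconds):
    """Monthly cost added by INIT-phase billing (effective August 2025)."""
    return cold_starts * (memory_mb / 1024) * init_seconds * GB_SECOND_PRICE

# The example above: 100k cold starts/month, 512 MB, 2 s INIT
print(round(init_phase_cost(100_000, 512, 2), 2))  # 1.67
```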

At higher scales (millions of invocations with cold starts), SnapStart pays for itself quickly. SnapStart for Java carries no additional charge; for Python and .NET, AWS bills for snapshot cache storage and per restore, so weigh those charges against the INIT time saved.

SnapStart activation: A one-line change in your function configuration. The snapshot is taken at publish time, not at runtime.

Lambda Durable Functions

Announced at re:Invent 2025, Lambda Durable Functions brings stateful orchestration natively to Lambda without requiring Step Functions or external state stores. Durable Functions is designed for long-running workflows (minutes to days) with built-in state persistence.

🔄 Update (2026): Lambda Durable Functions is now Generally Available in US East (Ohio) with support for Python 3.13/3.14 and Node.js 22/24. Java support is in preview. The service enables workflows with up to 1-year execution duration — ideal for long-running AI agent pipelines, human-in-the-loop approvals, and multi-day batch processing. Expansion to additional regions is ongoing; check the Lambda Durable Functions documentation for current regional availability.

Pricing: $8 per million orchestration operations (activity tasks, timer waits, external events)

Use cases:

  • Multi-step approval workflows
  • Long-running background jobs with checkpointing
  • Fan-out/fan-in patterns with state aggregation
  • Workflows that wait for human input or external events

For workflows that do not require the full expressiveness of Step Functions, Lambda Durable Functions offers simpler development and lower per-operation cost than Step Functions Standard workflows ($0.025 per 1,000 state transitions, i.e., $25 per million).
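A rough comparison for 1 million orchestration steps, assuming the $8-per-million Durable Functions rate quoted above and Step Functions Standard's $0.025 per 1,000 state transitions, and treating one transition as roughly one operation:

```python
DURABLE_OP_PRICE = 8 / 1_000_000      # $ per orchestration operation (quoted above)
SFN_TRANSITION_PRICE = 0.025 / 1_000  # $ per Standard state transition

def orchestration_cost(steps, service):
    """Rough equivalence: one state transition per orchestration operation."""
    rate = DURABLE_OP_PRICE if service == "durable" else SFN_TRANSITION_PRICE
    return steps * rate

print(round(orchestration_cost(1_000_000, "durable"), 2))  # 8.0
print(round(orchestration_cost(1_000_000, "sfn"), 2))      # 25.0
```

The equivalence of one transition per operation is an assumption; real Step Functions workflows often use several transitions per logical step, which widens the gap further.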

Pay-Per-Request vs. Provisioned Concurrency

This is the decision that trips up most teams: when does Provisioned Concurrency — which eliminates cold starts but adds always-on cost — actually save money?

On-Demand (Pay-Per-Request)

  • Pay per invocation and per millisecond of execution
  • Cold starts on first invocation and after idle periods
  • Scales automatically from zero to thousands of concurrent executions
  • Best for: variable traffic, background processing, non-latency-sensitive workloads

Provisioned Concurrency

  • Pre-warms a specified number of execution environments
  • Eliminates cold starts for those environments
  • Charges per provisioned environment per hour ($0.0000041667 per GB-second, plus request charges)
  • Best for: latency-sensitive APIs, predictable traffic patterns, compliance with response time SLAs

Break-Even Analysis

Provisioned Concurrency makes financial sense when:

  1. You need consistently low latency — Sub-100ms p99 response times that cold starts would violate
  2. You have predictable, steady traffic — The provisioned environments are utilized consistently
  3. Cold start cost exceeds provisioning cost — If cold starts cause retries, timeouts, or user drop-off, the indirect cost justifies provisioning

Example calculation: 10 Provisioned Concurrency units at 512 MB, running 24/7:

  • Hourly cost: 10 × 0.5 GB × 3,600 seconds × $0.0000041667 = $0.075/hour
  • Monthly cost: $0.075 × 720 hours = $54/month

If those 10 units handle 5 million invocations per month (an average of roughly 2 per second), the provisioning cost is $0.0000108 per invocation — less than the on-demand duration cost for most functions.

The rule of thumb: If a Provisioned Concurrency unit would handle at least 5 invocations per minute on average, provisioning is usually cheaper than the equivalent on-demand invocations plus the cold start overhead.
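The worked example above can be checked numerically (a sketch using the provisioned rate quoted earlier; the per-request charge is identical under both models and is omitted):

```python
PC_RATE = 0.0000041667  # $ per provisioned GB-second
HOURS_PER_MONTH = 720

def provisioned_monthly_cost(units, memory_mb):
    """Always-on cost of keeping the provisioned environments warm."""
    return units * (memory_mb / 1024) * 3600 * HOURS_PER_MONTH * PC_RATE

cost = provisioned_monthly_cost(10, 512)  # the example above
print(round(cost, 2))                     # 54.0
print(round(cost / 5_000_000, 7))         # per-invocation cost at 5M/month
```

Divide by your actual monthly invocation count to see whether the per-invocation provisioning cost beats your on-demand duration cost.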

Lambda Response Streaming: A New Cost Dimension

Lambda response streaming (GA since 2023, increasingly adopted in 2025–2026) changes the billing model for functions that return large payloads. Streaming functions are billed on:

  1. Execution duration — same as standard invocations
  2. Data streamed — $0.06 per GB of data returned to the caller

For functions returning small responses (API JSON under a few KB), streaming adds negligible cost. For functions returning large files, reports, or AI-generated content, the streaming charge can exceed the duration charge.

When streaming saves money: The real cost advantage of streaming is user-perceived performance — callers receive the first bytes immediately without waiting for full generation. This reduces client timeouts and improves UX for AI inference and large document generation use cases.

When streaming increases cost: If your function streams large binary payloads (images, PDFs, video) that could be served directly from S3 with a presigned URL, the streaming charge applies unnecessarily. Prefer: generate the asset → upload to S3 → return a presigned URL to the caller.

Lambda@Edge vs CloudFront Functions: Choose the Cheaper Option

Both services run code at CloudFront edge locations, but their pricing and capability differ significantly:

Dimension | Lambda@Edge | CloudFront Functions
Price (requests) | $0.60 per million | $0.10 per million
Price (duration) | $0.00005001/GB-sec | Included in request price
Max execution time | 5–30 seconds | 1ms (sub-millisecond)
Runtime | Node.js, Python | JavaScript (ES5)
Network access | Yes | No
Use case | Complex logic, external calls | Header rewrites, URL redirects, simple A/B

Cost difference: CloudFront Functions are 6× cheaper per million executions for workloads that fit within the 1ms execution limit. If you are using Lambda@Edge purely for header manipulation, URL rewriting, or basic redirect logic — switch to CloudFront Functions for an immediate 83% reduction in edge compute costs.

Rule of thumb: Use CloudFront Functions for stateless, sub-millisecond logic with no external calls. Use Lambda@Edge only when you need network access, longer execution time, or Node.js/Python-specific libraries.
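The comparison from the table, per 10 million executions. The 128 MB and 5 ms figures for the Lambda@Edge function are assumed, hypothetical values for a simple header-rewrite workload:

```python
EDGE_REQUEST_PRICE = 0.60 / 1_000_000  # Lambda@Edge, $ per request
CFF_REQUEST_PRICE = 0.10 / 1_000_000   # CloudFront Functions, $ per request
EDGE_GB_SECOND = 0.00005001            # Lambda@Edge duration rate

def edge_cost(requests, memory_mb=128, duration_ms=5):
    duration = requests * (memory_mb / 1024) * (duration_ms / 1000) * EDGE_GB_SECOND
    return requests * EDGE_REQUEST_PRICE + duration

def cff_cost(requests):
    return requests * CFF_REQUEST_PRICE  # duration included in request price

print(round(edge_cost(10_000_000), 2))  # 6.31
print(round(cff_cost(10_000_000), 2))   # 1.0
```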

Architecture-Level Cost Optimization

Use Direct Service Integrations

API Gateway can integrate directly with DynamoDB, SQS, Step Functions, and other services without a Lambda function in between. This eliminates Lambda invocation costs for simple operations.

Before (Lambda proxy):

API Gateway → Lambda (parse request, call DynamoDB, format response) → DynamoDB

After (direct integration):

API Gateway → DynamoDB (VTL mapping template)

Savings: 100% of Lambda cost for that route.

Batch Processing with SQS

When processing messages from SQS, Lambda can receive up to 10 messages per invocation (or up to 10,000 with batching windows). Processing 10 messages in one invocation costs the same as processing 1.

Before: 1 million messages = 1 million invocations
After (batch size 10): 1 million messages = 100,000 invocations

Savings: 90% reduction in invocation costs plus proportional duration savings from amortized initialization.
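The invocation arithmetic, as a sketch:

```python
import math

REQUEST_PRICE = 0.20 / 1_000_000  # $ per invocation

def drain_cost(messages, batch_size):
    """Request-side cost of draining a queue at a given batch size.
    Amortized-INIT duration savings are not modeled here."""
    invocations = math.ceil(messages / batch_size)
    return invocations, invocations * REQUEST_PRICE

inv_single, _ = drain_cost(1_000_000, 1)
inv_batched, _ = drain_cost(1_000_000, 10)
print(inv_single, inv_batched)  # 1000000 100000
```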

Avoid Synchronous Chains

Synchronous function-to-function calls (Lambda invoking Lambda) double your costs and create cascading cold start risks. Use asynchronous patterns instead:

Avoid: API Gateway → Lambda A → Lambda B → Lambda C (serial, synchronous)
Prefer: API Gateway → Lambda A → SQS/EventBridge → Lambda B (async, decoupled)

Right-Size Connection Handling

Lambda functions that connect to RDS databases create connection overhead on every cold start. Use RDS Proxy to pool connections, reducing both database load and Lambda execution time.

Without RDS Proxy: 200ms per invocation for connection establishment
With RDS Proxy: 5ms per invocation for connection from pool

At scale, this connection overhead difference reduces both latency and cost significantly.
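Using the article's 200 ms vs 5 ms figures, the monthly saving is easy to put a number on. The 512 MB memory size and 10M-invocation volume below are assumed for illustration:

```python
GB_SECOND_PRICE = 0.0000166667  # x86 on-demand, $ per GB-second

def connection_overhead_cost(invocations, memory_mb, overhead_ms):
    """Duration cost attributable to connection setup alone."""
    return invocations * (memory_mb / 1024) * (overhead_ms / 1000) * GB_SECOND_PRICE

without_proxy = connection_overhead_cost(10_000_000, 512, 200)
with_proxy = connection_overhead_cost(10_000_000, 512, 5)
print(round(without_proxy - with_proxy, 2))  # 16.25
```

Note this counts only the Lambda duration charge; the latency improvement and reduced database connection churn are additional benefits.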

Monitoring Lambda Costs

CloudWatch Metrics to Track

  • Invocations — Total function calls per period
  • Duration — Average, p50, p95, p99 execution times
  • ConcurrentExecutions — Peak concurrent executions (indicates scaling behavior)
  • Throttles — Invocations rejected due to concurrency limits
  • Errors — Failed invocations (retried invocations increase cost)

Cost Explorer Tags

Tag Lambda functions with:

  • Project — Which product or feature the function supports
  • Environment — Production, staging, development
  • Team — Which team owns the function

This enables per-project and per-team cost attribution in Cost Explorer.

Cost Anomaly Detection

Enable AWS Cost Anomaly Detection for Lambda to get alerts when spending deviates from historical patterns — catching runaway functions, infinite loops, or unexpected traffic spikes before they generate large bills.

Common Lambda Cost Mistakes

Mistake 1: Default Memory Settings

Lambda defaults to 128 MB, which is almost never optimal. Functions at 128 MB have minimal CPU and execute slowly, often costing more than the same function at 256 MB or 512 MB.

Mistake 2: Over-Provisioned Concurrency

Provisioning 100 concurrent environments “just in case” when your peak traffic only uses 20 wastes 80% of your provisioning spend. Use Application Auto Scaling to adjust Provisioned Concurrency based on actual demand.

Mistake 3: Logging Everything

console.log in every function with detailed request/response payloads generates massive CloudWatch Logs volumes. Verbose logging can cost more than the Lambda invocations themselves. Log strategically — errors always, debug only when needed.

2025 update — Tiered CloudWatch log pricing for Lambda (May 2025): AWS introduced tiered pricing for CloudWatch Logs ingestion from Lambda. The first 10 GB/month per function is charged at the standard $0.50/GB, with volume discounts beyond that threshold. More importantly, Lambda now supports S3 and Kinesis Firehose as direct log destinations — allowing you to route logs to S3 at $0.023/GB stored (vs. $0.50/GB ingested into CloudWatch), a 95% cost reduction for high-volume logging. Route structured logs directly to S3 via Firehose for analytics workloads, and send only error/warning logs to CloudWatch for real-time alerting.
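The routing decision above comes down to simple arithmetic (a sketch using the per-GB figures quoted in the update; Firehose delivery and S3 request charges are not modeled):

```python
CLOUDWATCH_INGEST = 0.50  # $ per GB ingested (standard tier)
S3_STORAGE = 0.023        # $ per GB stored

def monthly_log_cost(gb, destination):
    rate = CLOUDWATCH_INGEST if destination == "cloudwatch" else S3_STORAGE
    return gb * rate

# 200 GB/month of structured logs
print(monthly_log_cost(200, "cloudwatch"))    # 100.0
print(round(monthly_log_cost(200, "s3"), 2))  # 4.6
```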

Mistake 4: Not Using the Free Tier

The Lambda free tier (1M requests + 400,000 GB-seconds/month) applies every month, forever. For low-traffic functions, this means Lambda is genuinely free. Ensure your cost analysis accounts for the free tier.

Getting Started

Lambda cost optimization is not a one-time exercise. Workloads change, traffic patterns evolve, and AWS introduces new features and pricing options. We help organizations implement ongoing cost governance for serverless workloads as part of our broader AWS cost optimization services.

For end-to-end serverless architecture design and implementation, see our AWS Serverless Architecture Services.

Contact us to optimize your serverless costs →

Palaniappan P

AWS Cloud Architect & AI Expert

AWS-certified cloud architect and AI expert with deep expertise in cloud migrations, cost optimization, and generative AI on AWS.

