What is a safe cardinality budget per metric?

Keep high-cardinality dimensions (user ID, request ID, URL path with IDs) off Prometheus labels. As a rule: if a label can take more than ~100 unique values in production, it belongs in logs or traces—not metrics.

Should I use AMP or self-managed Prometheus on EKS?

AMP removes ops toil and integrates with managed Grafana; its cost is ingestion plus storage, both driven by active series. Self-managed Prometheus avoids the service fees but does not protect you from cardinality mistakes: you still pay for EC2 and ops time.

Prometheus Cardinality Explosion on AWS: AMP, EMF, and Cost-Aware Metrics

Quick summary: That `user_id` label on every HTTP metric turns Amazon Managed Prometheus into a five-figure line item. This guide explains cardinality mechanics, EMF vs remote write, and Application Signals defaults worth disabling.

Key Takeaways

That `user_id` label on every HTTP metric turns Amazon Managed Prometheus into a five-figure line item
Amazon Managed Service for Prometheus (AMP) pricing (June 2026) scales with metrics ingested and stored, so cardinality is dollars
Benchmark pattern (OTel demo workload on EKS): enabling `http.route` with raw URL paths pushed active series from 12k → 890k in 6 h; AMP estimate +$2,400/mo
Relabeling to template routes (`/users/{id}`) brought active series back down to 14k
See observability beyond CloudWatch for stack wiring

Amazon Managed Service for Prometheus (AMP) pricing (June 2026) scales with metrics ingested and stored, so cardinality translates directly into dollars. A single histogram whose path label includes UUIDs can create millions of active series within hours.

Benchmark pattern (OTel demo workload on EKS): enabling `http.route` with raw URL paths pushed active series from 12k to 890k in 6 h, adding an estimated $2,400/mo in AMP cost. Relabeling to template routes (`/users/{id}`) brought the count back down to 14k series. See observability beyond CloudWatch for stack wiring.
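The route-templating fix can be sketched in a few lines of Python. This is a minimal illustration, assuming IDs show up as purely numeric or UUID path segments; the function name `template_route` and both patterns are hypothetical, not part of ADOT or any OTel SDK. In practice the same effect usually comes from the framework's own route template or a collector-side rewrite.

```python
import re

# Assumption: a path segment is an "ID" if it is all digits or a UUID.
UUID_RE = re.compile(
    r"^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-"
    r"[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$"
)
NUMERIC_RE = re.compile(r"^\d+$")


def template_route(path: str) -> str:
    """Replace ID-like path segments with a fixed {id} placeholder."""
    path = path.split("?", 1)[0]  # the query string never belongs in a label
    segments = path.split("/")
    templated = [
        "{id}" if seg and (UUID_RE.match(seg) or NUMERIC_RE.match(seg)) else seg
        for seg in segments
    ]
    return "/".join(templated)


# Hypothetical raw paths: four label values before templating, three after.
raw_paths = [
    "/users/1001",
    "/users/1002",
    "/users/3f2b8c1e-9d4a-4c6b-8f17-2a5e0b7c9d10/orders/77",
    "/healthz",
]
routes = sorted({template_route(p) for p in raw_paths})
print(routes)
```

The number of distinct label values is then bounded by the number of routes in the application rather than by the number of users or orders.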

Symptom → mechanism → AWS control

Production symptom	Mechanism	AWS control
Observability bill exceeds compute	High-cardinality labels (user_id, pod)	AMP relabel_configs drop, aggregate recording rules
CloudWatch PutMetricData throttled	Custom metric cardinality limits	EMF embedded metrics with bounded dimensions
Dashboard slow to load	Million-series PromQL scan	AMP query logging, cardinality governance policy
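For the EMF row, one way to keep dimensions bounded is to allowlist them and carry high-cardinality values as plain log properties, which stay searchable in CloudWatch Logs without becoming metric dimensions. The sketch below builds one Embedded Metric Format record by hand. The allowlist, the helper name `emf_record`, and the sample namespace and values are assumptions for illustration.

```python
import json
import time

# Hypothetical allowlist: only these keys may become metric dimensions.
ALLOWED_DIMENSIONS = ("Service", "Route", "StatusClass")


def emf_record(namespace, metric_name, value, unit, dimensions, properties):
    """Return one EMF log line; reject dimensions outside the allowlist."""
    unknown = set(dimensions) - set(ALLOWED_DIMENSIONS)
    if unknown:
        raise ValueError(f"unbounded dimension(s) not allowed: {sorted(unknown)}")

    record = dict(properties)          # high-cardinality context, log-only
    record.update(dimensions)          # bounded values referenced as dimensions
    record[metric_name] = value
    record["_aws"] = {
        "Timestamp": int(time.time() * 1000),  # milliseconds since epoch
        "CloudWatchMetrics": [
            {
                "Namespace": namespace,
                "Dimensions": [list(dimensions)],
                "Metrics": [{"Name": metric_name, "Unit": unit}],
            }
        ],
    }
    return json.dumps(record)


line = emf_record(
    namespace="DemoApp",               # hypothetical namespace
    metric_name="Latency",
    value=42.0,
    unit="Milliseconds",
    dimensions={"Service": "checkout", "Route": "/users/{id}"},
    properties={"user_id": "u-123", "request_id": "r-456"},
)
print(line)
```

Here `user_id` and `request_id` travel with the log event but never appear under `Dimensions`, so they do not create new custom metrics.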

Opinionated take: Treat metric labels like database indexes—approve new high-cardinality labels in review, and drop pod-name from application metrics at scrape time.

Mechanism

Prometheus identifies a time series by its metric name plus its label set. Each unique combination is a separate series that adds storage and query cost. High-cardinality labels (IDs) multiply series combinatorially with other labels (status, method, pod).
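The multiplication is easy to underestimate, so here is a worked upper bound. The label cardinalities below are hypothetical round numbers chosen for illustration, not measurements, and the product is a worst case because not every combination occurs in practice.

```python
from math import prod


def worst_case_series(label_cardinalities: dict[str, int]) -> int:
    """Upper bound on series for one metric: product of per-label value counts."""
    return prod(label_cardinalities.values())


# Hypothetical bounded labels on an HTTP request counter.
bounded = {"method": 5, "status": 6, "pod": 20}
print(worst_case_series(bounded))        # 5 * 6 * 20

# Same metric after someone adds a user_id label with 10,000 distinct values.
with_user_id = {**bounded, "user_id": 10_000}
print(worst_case_series(with_user_id))   # 600 * 10,000
```

A classic histogram multiplies this again, since every label combination gets one series per bucket plus `_sum` and `_count`.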

AWS controls

Approach	Service	Use when
Managed backend	AMP + AMG	EKS/ECS metrics at scale
Embedded metrics	CloudWatch EMF	Lambda/custom apps without scrape
SLO-native	Application Signals	Service golden signals—watch auto-discovered ops
Cost guard	Metric filters + alarms on `IncomingLogEvents` / AMP workspace limits	FinOps gate

Opinionated take: Relabel at the collector (ADOT) before remote_write—do not fix cardinality in Grafana dashboards.
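What dropping a label does to the data can be shown outside any collector. The sketch below is a conceptual model in Python, not ADOT configuration: it removes forbidden labels from each sample and sums the values of series that collapse onto the same remaining label set. The function name and sample data are hypothetical, and summing is only meaningful for counter-like values.

```python
from collections import defaultdict

# Label names taken from the guide's examples of forbidden labels.
FORBIDDEN_LABELS = frozenset({"user_id", "trace_id"})


def drop_and_aggregate(samples, forbidden=FORBIDDEN_LABELS):
    """samples: iterable of (labels_dict, value) pairs for one counter metric.

    Returns a dict keyed by the remaining (name, value) label pairs, with
    values summed across series that became identical after the drop.
    """
    totals = defaultdict(float)
    for labels, value in samples:
        kept = tuple(sorted((k, v) for k, v in labels.items() if k not in forbidden))
        totals[kept] += value
    return dict(totals)


samples = [
    ({"method": "GET", "status": "200", "user_id": "u1"}, 3.0),
    ({"method": "GET", "status": "200", "user_id": "u2"}, 4.0),
    ({"method": "POST", "status": "500", "user_id": "u1"}, 1.0),
]
aggregated = drop_and_aggregate(samples)
print(len(samples), "series in,", len(aggregated), "series out")
```

Doing this before remote write means the dropped series are never ingested or stored, whereas a dashboard-side fix still pays for every one of them.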

AWS services map

Need	Service	Skip when
Cardinality control	AMP + relabel/aggregate rules	CloudWatch only with <10 custom metrics
Cost attribution	Cost and Usage Report + AMP usage metrics	Single-service app with default metrics
FinOps guardrails	CloudWatch anomaly detection on AMP spend	Pre-production with negligible ingest

When this advice breaks

Short-lived batch jobs — High churn series may be acceptable if retention is 24h and jobs are few.
Debugging incidents — Temporary high-cardinality scrape OK with documented TTL and owner.

What to do this week

Export top 20 labels by series count from AMP or Prometheus label_values sampling.
Add drop/labelmap processors in ADOT config for forbidden labels (user_id, trace_id).
Set CloudWatch alarm on AMP DiscardedSamples or workspace ingestion rate spike.
Pair with log sampling guide (part 3 of this track).
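The first step above can be approximated offline once you have a sample of series label sets, for example the list of label dictionaries that the Prometheus HTTP API returns under `data` from its series endpoint. The sketch ranks labels by distinct value count, used here as a proxy for series contribution, and flags any label above a budget. The default budget of 100 mirrors the rule of thumb in this guide; the function names and sample data are hypothetical.

```python
from collections import defaultdict


def label_value_counts(series):
    """series: iterable of label dicts. Returns [(label, distinct_values)], largest first."""
    values = defaultdict(set)
    for labels in series:
        for name, value in labels.items():
            values[name].add(value)
    return sorted(
        ((name, len(vals)) for name, vals in values.items()),
        key=lambda item: (-item[1], item[0]),
    )


def over_budget(series, budget=100):
    """Labels whose distinct value count exceeds the cardinality budget."""
    return [name for name, count in label_value_counts(series) if count > budget]


# Hypothetical sample: 150 users across two HTTP methods.
sample = [
    {"__name__": "http_requests_total", "method": m, "user_id": f"u{i}"}
    for i in range(150)
    for m in ("GET", "POST")
]
print(label_value_counts(sample)[:20])   # top 20 labels by distinct values
print(over_budget(sample))
```

Labels that show up in `over_budget` are the candidates for the drop rules in the second step.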

What this guide doesn’t cover

Distributed tracing propagation—see part 1 OTel guide in this track.

Recommended Reading

Log Aggregation and Intelligent Sampling with CloudWatch and OpenTelemetry

Observability Beyond CloudWatch (2026): When to Add Application Signals, ADOT, Managed Prometheus, and Grafana — and When Not To

From One FIS Experiment to a Resilience Program (2026): AWS Fault Injection Service, Stop Conditions, and GameDays That Actually Change Behavior

Designing a Customer-Facing SLA on AWS (2026): SLO Error Budgets and the Composite-Availability Math Most Teams Skip
