AWS Glossary
Amazon Redshift
Fully managed cloud data warehouse for running fast SQL analytics on petabyte-scale datasets.
Definition
Amazon Redshift is AWS's managed columnar data warehouse for analytic SQL at scale. Data is stored column by column with automatic compression, and massively parallel processing (MPP) spreads query fragments across compute nodes coordinated by a leader node. Sort keys determine how much I/O can be pruned, and distribution styles (KEY, ALL, EVEN, AUTO) determine how much data moves between nodes during joins, so physical design tuning matters more than in OLTP engines.
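To make the effect of distribution styles concrete, here is a minimal Python sketch, not Redshift code. It places rows on a hypothetical four slices using a modulo as a stand-in for Redshift's internal hash, then counts how many fact rows sit on a different slice from their join partner. The table names, the slice count, and the helper functions are all assumptions made for illustration.

```python
from collections import defaultdict

NUM_SLICES = 4  # hypothetical slice count, chosen only for illustration


def distribute_key(rows, key):
    """KEY style: rows sharing a key value land on the same slice.

    The modulo is a stand-in for Redshift's internal hash function.
    """
    slices = defaultdict(list)
    for row in rows:
        slices[row[key] % NUM_SLICES].append(row)
    return slices


def distribute_even(rows):
    """EVEN style: round-robin placement that ignores column values."""
    slices = defaultdict(list)
    for i, row in enumerate(rows):
        slices[i % NUM_SLICES].append(row)
    return slices


def rows_needing_movement(fact_slices, dim_slices, key):
    """Count fact rows whose matching dimension row lives on another slice."""
    dim_location = {
        row[key]: slice_id
        for slice_id, rows in dim_slices.items()
        for row in rows
    }
    return sum(
        1
        for slice_id, rows in fact_slices.items()
        for row in rows
        if dim_location[row[key]] != slice_id
    )


# Hypothetical data: 8 customers, 40 orders, 5 orders per customer.
customers = [{"customer_id": c} for c in range(8)]
orders = [{"order_id": o, "customer_id": o // 5} for o in range(40)]

dim = distribute_key(customers, "customer_id")
key_moves = rows_needing_movement(distribute_key(orders, "customer_id"), dim, "customer_id")
even_moves = rows_needing_movement(distribute_even(orders), dim, "customer_id")

print("rows to move with KEY/KEY: ", key_moves)
print("rows to move with EVEN/KEY:", even_moves)
```

When both tables share a distribution key, every join partner is already co-located, so the count is zero. With EVEN on the fact table, some rows have to be redistributed before the join can run, which is the cost the distribution style is meant to avoid.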
Deploy as Redshift Serverless (RPU-hours, auto pause) or provisioned RA3 clusters with Managed Storage in S3 for separated compute/storage scaling. Redshift Spectrum queries external tables on S3 without loading data. Zero-ETL integrations from DynamoDB (and other sources) replicate operational data for analytics without custom CDC. Streaming ingestion from Kinesis or MSK lands near-real-time rows for dashboards without a batch landing zone.
When to use it
- BI and reporting — complex aggregations, window functions, and joins across billions of rows.
- Data lake queries via Spectrum or native tables on S3 Tables / Parquet without duplicating entire datasets locally.
- Intermittent analytics teams that prefer Serverless over 24/7 clusters — pay when queries run.
- Operational analytics fed by DynamoDB zero-ETL when you need warehouse SQL on live app data.
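The "pay when queries run" trade-off can be sketched as simple arithmetic. The snippet below is a rough model under stated assumptions: the workload runs at a fixed RPU level, storage and data transfer are ignored, and the two hourly prices are placeholders rather than AWS list prices. All function and variable names are hypothetical.

```python
HOURS_IN_MONTH = 730  # average hours per month


def serverless_monthly_cost(base_rpus, busy_hours, price_per_rpu_hour):
    """Serverless compute: billed in RPU-hours only while queries run."""
    return base_rpus * busy_hours * price_per_rpu_hour


def provisioned_monthly_cost(nodes, price_per_node_hour):
    """Provisioned compute: every node bills for every hour the cluster is up."""
    return nodes * price_per_node_hour * HOURS_IN_MONTH


def break_even_busy_hours(base_rpus, price_per_rpu_hour, nodes, price_per_node_hour):
    """Busy hours per month at which both options cost the same."""
    return provisioned_monthly_cost(nodes, price_per_node_hour) / (
        base_rpus * price_per_rpu_hour
    )


# Placeholder inputs, NOT AWS list prices; substitute current regional pricing.
RPU_PRICE = 0.50
NODE_PRICE = 3.00

light_use = serverless_monthly_cost(base_rpus=16, busy_hours=100, price_per_rpu_hour=RPU_PRICE)
always_on = provisioned_monthly_cost(nodes=2, price_per_node_hour=NODE_PRICE)
threshold = break_even_busy_hours(16, RPU_PRICE, 2, NODE_PRICE)

print(light_use, always_on, threshold)  # 800.0 4380.0 547.5
```

With these placeholder numbers, a team that queries for about 100 hours a month pays far less on Serverless, and the always-on cluster only wins once busy time passes roughly 547 hours a month. The real threshold depends entirely on the prices and capacity you plug in.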
When not to use it
- Application OLTP — use RDS, Aurora, or DynamoDB; Redshift locks and MVCC behave poorly for high-frequency single-row updates.
- Low-volume monthly reports on modest data — Athena on S3 Tables may be cheaper without cluster baseline cost.
- Sub-second keyed lookups — Redshift is scan/aggregate optimized, not point-read optimized.
Tips
- Set DISTKEY on large fact tables to the column most often joined; avoid EVEN on huge tables that always join on customer_id.
- Run ANALYZE after large loads; enable automatic analyze and vacuum unless you have a reason to micromanage.
- Use Redshift Serverless workgroup base capacity floors for predictable SLAs; let it burst RPUs for ad hoc spikes.
- UNLOAD cold historical partitions to S3 Parquet and query via Spectrum — keeps hot cluster storage lean.
- Connect BI through QuickSight SPICE or aggregate tables to shield Redshift from dashboard-driven query storms.
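The DISTKEY and UNLOAD tips above might look like the following in practice. This is a sketch that renders Redshift SQL from Python; the table name, column names, bucket, and IAM role ARN are hypothetical placeholders, and the helpers use plain string formatting for readability, so they are not safe for untrusted input.

```python
def create_fact_table_sql(table, columns, distkey, sortkey):
    """Render CREATE TABLE with KEY distribution and a single-column sort key."""
    if distkey not in columns or sortkey not in columns:
        raise ValueError("distkey and sortkey must be declared columns")
    cols = ",\n".join(f"  {name} {ctype}" for name, ctype in columns.items())
    return (
        f"CREATE TABLE {table} (\n{cols}\n)\n"
        f"DISTSTYLE KEY\nDISTKEY ({distkey})\nSORTKEY ({sortkey});"
    )


def unload_parquet_sql(select_sql, s3_prefix, iam_role_arn):
    """Render UNLOAD to Parquet; single quotes inside the query are doubled."""
    escaped = select_sql.replace("'", "''")
    return (
        f"UNLOAD ('{escaped}')\n"
        f"TO '{s3_prefix}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        "FORMAT AS PARQUET;"
    )


# Hypothetical fact table that always joins on customer_id.
ddl = create_fact_table_sql(
    "sales",
    {"sale_id": "BIGINT", "customer_id": "BIGINT", "sale_date": "DATE"},
    distkey="customer_id",
    sortkey="sale_date",
)

# Hypothetical cold-data export; bucket and role ARN are placeholders.
unload = unload_parquet_sql(
    "SELECT * FROM sales WHERE sale_date < '2023-01-01'",
    "s3://example-bucket/sales/cold/",
    "arn:aws:iam::123456789012:role/ExampleRedshiftRole",
)

print(ddl)
print("ANALYZE sales;")  # refresh planner statistics after a large load
print(unload)
```

Once the cold rows are in S3 as Parquet, they can be exposed through a Spectrum external table and deleted from the local table, which is the "keep hot storage lean" pattern the tip describes.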
Gotchas
- Serious: Choosing DISTSTYLE ALL lazily for huge dimension tables. Every node stores a full copy, so storage multiplies by node count and loads and updates slow down accordingly.
- Serious: Using Redshift as the system of record for mutable app state. Each update marks the old row deleted and writes a new one, so storage bloat and vacuum debt accumulate fast.
- Regular: Sort key mismatch with common WHERE clauses forces full column scans, because zone maps cannot prune any blocks.
- Regular: Concurrency scaling costs surprise teams during Black Friday; cap the number of concurrency scaling clusters if budget-bound.
- Regular: Zero-ETL lag is not zero; dashboards need freshness SLAs and monitoring on replication delay, not an assumption of instant sync.
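The sort key gotcha is easier to see with a toy model of zone maps. The sketch below is an illustration in plain Python, not Redshift internals: it splits a column into fixed-size blocks, records each block's minimum and maximum, and counts how many blocks a range predicate has to read. The block size, the data, and the function names are assumptions.

```python
import random

BLOCK_SIZE = 100  # rows per block; illustrative, not Redshift's real block size


def build_zone_map(values):
    """Split values into fixed-size blocks and record (min, max) per block."""
    blocks = [values[i:i + BLOCK_SIZE] for i in range(0, len(values), BLOCK_SIZE)]
    return [(min(block), max(block)) for block in blocks]


def blocks_scanned(zone_map, low, high):
    """Count blocks whose [min, max] range overlaps low <= value <= high."""
    return sum(1 for bmin, bmax in zone_map if bmax >= low and bmin <= high)


# Hypothetical column of 10,000 day numbers, stored sorted and unsorted.
day_numbers = list(range(10_000))
shuffled = day_numbers[:]
random.Random(42).shuffle(shuffled)

# Predicate comparable to: WHERE day_number BETWEEN 2000 AND 2099
sorted_scan = blocks_scanned(build_zone_map(day_numbers), 2_000, 2_099)
unsorted_scan = blocks_scanned(build_zone_map(shuffled), 2_000, 2_099)

print("blocks read when sorted on the filter column:", sorted_scan)
print("blocks read when unsorted:", unsorted_scan)
```

When the data is sorted on the filtered column, the predicate touches a single block. When it is not, almost every block's min/max range overlaps the predicate, so the zone map cannot rule anything out and the scan reads nearly the whole column.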
Official references
- Redshift distribution styles — KEY, ALL, EVEN, AUTO guidance.
- Zero-ETL integrations — supported sources and limits.