
AWS Glossary

RAG Pipeline

Retrieval-Augmented Generation: combining document retrieval with AI models to answer questions based on specific data.



Key Facts

  • Generation passes the retrieved chunks plus the user question to an LLM (Claude Sonnet 4.6, Nova Lite, Llama, etc.)
  • On AWS, **Amazon Bedrock Knowledge Bases** is the managed implementation; custom pipelines use S3, embedding models, and stores such as OpenSearch, Aurora pgvector, or Amazon S3 Vectors
  • Good fit: **knowledge that changes frequently**; re-sync embeddings when S3 sources update instead of fine-tuning on every edit
  • Poor fit: **sub-100ms latency requirements**; retrieval plus generation adds hundreds of milliseconds to seconds, so cache or precompute where needed
  • Tip: chunk at **300–500 tokens** with **10–20% overlap**; whole-document embeddings dilute relevance signals
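The chunking tip can be illustrated with a small sliding-window splitter. This is a minimal sketch, assuming whitespace-separated words approximate tokens (production code would count tokens with the embedding model's tokenizer). The function name `chunk_tokens` and its defaults of 400 tokens and 15% overlap are illustrative choices inside the ranges above, not a Bedrock setting.

```python
from typing import List


def chunk_tokens(text: str, chunk_size: int = 400, overlap_ratio: float = 0.15) -> List[str]:
    """Split text into overlapping windows of whitespace-separated "tokens".

    Assumption: words stand in for model tokens in this sketch.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap_ratio < 1:
        raise ValueError("overlap_ratio must be in [0, 1)")
    tokens = text.split()
    overlap = int(chunk_size * overlap_ratio)  # tokens shared by neighboring chunks
    step = chunk_size - overlap                # how far each window advances (always >= 1)
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break  # this window already reached the end of the text
    return chunks


if __name__ == "__main__":
    doc = " ".join(f"w{i}" for i in range(1000))
    chunks = chunk_tokens(doc, chunk_size=400, overlap_ratio=0.15)
    print(len(chunks))  # 3
```

With 400-token windows and 15% overlap, each window advances 340 tokens, so the last 60 tokens of one chunk reappear at the start of the next; a sentence cut at a boundary still appears whole in at least one chunk when it is shorter than the overlap.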

Entity Definitions

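Amazon Bedrock (also "AWS Bedrock" or "Bedrock")
Managed AWS service that provides API access to foundation models; its Knowledge Bases feature is the managed RAG implementation on AWS.
Amazon S3 (S3)
AWS object storage service; commonly holds the source documents a RAG pipeline ingests, and Amazon S3 Vectors adds native vector storage.
Amazon Aurora (Aurora)
Managed relational database; Aurora PostgreSQL with the pgvector extension can serve as the vector store.
OpenSearch
Amazon OpenSearch Service, a managed search and analytics service whose k-NN vector search can serve as the vector store.
RAG
Retrieval-Augmented Generation: retrieving relevant document chunks and passing them to a model as context for answering a question.
fine-tuning
Further training a model's weights on task-specific data; RAG avoids retraining when the underlying knowledge changes.
multi-tenant
An architecture in which one pipeline serves several customers or teams, so retrieval typically has to be scoped to each tenant's documents.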


Definition

A RAG (Retrieval-Augmented Generation) pipeline grounds large language model responses in your private documents instead of model weights alone. The flow: ingest documents → chunk text → embed chunks into vectors → store in a vector index → at query time retrieve the most relevant chunks → pass them with the user question to an LLM (Claude Sonnet 4.6, Nova Lite, Llama, etc.) → return an answer with citations. RAG reduces hallucination on proprietary facts and lets you update knowledge by changing documents, not retraining models. On AWS, Amazon Bedrock Knowledge Bases is the managed implementation; custom pipelines use S3, embedding models, and stores such as OpenSearch, Aurora pgvector, or Amazon S3 Vectors.
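The flow above can be sketched end to end in plain Python. This is a minimal sketch under loudly stated assumptions: `embed` is a toy term-frequency stand-in for a real embedding model, `InMemoryIndex` is a hypothetical in-memory list standing in for OpenSearch, Aurora pgvector, or S3 Vectors, chunking uses fixed word windows without overlap, and the prompt returned by `build_prompt` would be sent to an LLM rather than answered here. None of these names are Bedrock or AWS APIs.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Vector = Dict[str, float]  # sparse vector: term -> weight


def embed(text: str) -> Vector:
    """Toy stand-in for an embedding model: L2-normalized term frequencies.

    Assumption: a real pipeline calls an embedding model here; this keeps
    the sketch self-contained and deterministic.
    """
    counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {term: c / norm for term, c in counts.items()}


def cosine(a: Vector, b: Vector) -> float:
    # Both vectors are unit-length, so their dot product is the cosine similarity.
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller vector
    return sum(w * b.get(term, 0.0) for term, w in a.items())


@dataclass
class Chunk:
    doc_id: str
    text: str
    vector: Vector


@dataclass
class InMemoryIndex:
    """Hypothetical stand-in for a vector store such as OpenSearch or pgvector."""
    chunks: List[Chunk] = field(default_factory=list)

    def upsert_document(self, doc_id: str, text: str, chunk_words: int = 50) -> None:
        # Updating knowledge means replacing a document's chunks; no model retraining.
        self.chunks = [c for c in self.chunks if c.doc_id != doc_id]
        words = text.split()
        for start in range(0, len(words), chunk_words):  # fixed windows, no overlap
            piece = " ".join(words[start:start + chunk_words])
            self.chunks.append(Chunk(doc_id, piece, embed(piece)))

    def retrieve(self, query: str, k: int = 3) -> List[Tuple[float, Chunk]]:
        # Brute-force scoring; production vector stores typically use ANN indexes.
        q = embed(query)
        scored = [(cosine(q, c.vector), c) for c in self.chunks]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]


def build_prompt(question: str, hits: List[Tuple[float, Chunk]]) -> str:
    # Number each retrieved chunk so the model can cite sources as [1], [2], ...
    sources = "\n".join(
        f"[{i}] ({chunk.doc_id}) {chunk.text}" for i, (_, chunk) in enumerate(hits, start=1)
    )
    return (
        "Answer using only the sources below and cite them by number.\n"
        f"Sources:\n{sources}\n\nQuestion: {question}"
    )


if __name__ == "__main__":
    index = InMemoryIndex()
    index.upsert_document("refunds.md", "Refunds are issued within 14 days of a return request.")
    index.upsert_document("shipping.md", "Standard shipping takes 3 to 5 business days.")
    question = "How many days until refunds are issued?"
    # The prompt would then go to an LLM; that call is omitted in this sketch.
    print(build_prompt(question, index.retrieve(question, k=2)))
```

Calling `upsert_document` again with edited text swaps that document's chunks in place, which is the "update knowledge by changing documents" property in miniature. Replacing `embed` with a real embedding model and `InMemoryIndex` with a managed store changes the components, not the shape of the flow.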

When to use it

When not to use it

Tips

Gotchas

Serious

Regular

Official references
