How to Build a RAG Pipeline with Amazon Bedrock Knowledge Bases

Generative AI · Palaniappan P · 6 min read

Quick summary: Amazon Bedrock Knowledge Bases automate the RAG (Retrieval-Augmented Generation) pipeline — semantic search, chunking, embedding, and context injection into Claude or other foundation models. This guide covers setup, data ingestion, cost optimization, and production patterns.


Amazon Bedrock Knowledge Bases simplify building Retrieval-Augmented Generation (RAG) applications. Instead of orchestrating chunking, embedding, vector search, and context injection manually, Bedrock handles the pipeline end-to-end. Upload documents → Bedrock chunks and embeds them → retrieval happens automatically when you invoke Claude or other foundation models with the knowledge base.

This guide covers the full setup: creating and configuring a knowledge base, ingesting documents, querying it from your application, and optimizing for production cost and latency.

Building GenAI on AWS? FactualMinds helps teams architect and deploy Bedrock applications at scale. See our AWS Bedrock consulting services or talk to our team.

Step 1: Create an Amazon Bedrock Knowledge Base

Start in the AWS Console:

  1. Navigate to Amazon Bedrock → Knowledge Bases
  2. Click Create Knowledge Base
  3. Name: my-company-kb (lowercase, no spaces)
  4. Description: Optional but recommended for tracking
  5. Select an embedding model: Choose from:
    • Titan Embeddings (default, $0.08 per 1M tokens, fully managed)
    • Custom embeddings (requires external OpenSearch or vector database)

For most use cases, Titan Embeddings is sufficient. It understands semantic relationships across business documents, code, and technical content.

  6. Click Next
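
The console steps above can also be scripted. Below is a minimal sketch of the equivalent `create_knowledge_base` request for boto3's `bedrock-agent` client; all ARNs, the role, and the collection name are placeholders you would replace with your own:

```python
def build_kb_request(name, role_arn, collection_arn, index_name):
    """Assemble create_knowledge_base parameters: Titan embeddings plus an
    OpenSearch Serverless vector store. All ARNs here are placeholders."""
    return {
        'name': name,
        'roleArn': role_arn,  # IAM role Bedrock assumes to read S3 and write the index
        'knowledgeBaseConfiguration': {
            'type': 'VECTOR',
            'vectorKnowledgeBaseConfiguration': {
                'embeddingModelArn': 'arn:aws:bedrock:us-east-1::foundation-model/amazon.titan-embed-text-v2:0'
            }
        },
        'storageConfiguration': {
            'type': 'OPENSEARCH_SERVERLESS',
            'opensearchServerlessConfiguration': {
                'collectionArn': collection_arn,
                'vectorIndexName': index_name,
                'fieldMapping': {
                    'vectorField': 'embedding',
                    'textField': 'text',
                    'metadataField': 'metadata'
                }
            }
        }
    }

params = build_kb_request(
    'my-company-kb',
    'arn:aws:iam::123456789012:role/BedrockKBRole',
    'arn:aws:aoss:us-east-1:123456789012:collection/abc123',
    'bedrock-kb-index'
)
# boto3.client('bedrock-agent', region_name='us-east-1').create_knowledge_base(**params)
```

Building the request as a plain dict keeps it easy to version-control and review before the (commented-out) API call actually creates billable resources.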

Step 2: Configure Data Source and Vector Store

Bedrock needs two things: a data source (where documents live) and a vector store (where embeddings are stored).

Step 2A: Data Source

Choose where your documents are stored:

  • S3 bucket (recommended for bulk ingestion)

    • Create an S3 bucket or select an existing one: s3://my-company-docs/
    • Bedrock will scan for supported files: PDF, DOCX, TXT, MD, HTML
    • Set an S3 prefix if documents are in a subdirectory: bedrock-documents/
  • Web crawler (less common, for public websites)

    • Useful for ingesting documentation sites, but adds complexity

For this guide, use S3 bucket.

Step 2B: Vector Store

Bedrock creates a managed OpenSearch Serverless collection (no manual configuration needed):

  • Bedrock automatically creates a collection named bedrock-kb-{timestamp}
  • You are charged $0.15/hour for the collection regardless of query volume (minimum: $108/month)
  • Data is encrypted at rest using AWS managed keys

Click Create and Ingest to proceed.

Step 3: Ingest Documents into the Knowledge Base

Once the knowledge base is created, upload documents:

  1. In the Knowledge Base page, select your KB and go to Documents

  2. Upload documents:

    • Drag-and-drop files or click to browse
    • Supported formats: PDF, DOCX, TXT, MD, HTML, JSON, CSV (as semi-structured)
    • Max file size: 50MB per document
  3. Chunking strategy (default is fine for most cases):

    • Chunk size: 2,048 tokens (≈ 1,500 words)
    • Overlap: 512 tokens (improves semantic coherence at chunk boundaries)
    • These settings are NOT customizable in the console; to use custom chunking, use the API
  4. Metadata filtering (optional):

    • Add document metadata like department: engineering, year: 2026
    • Metadata can be used in retrieval queries to filter results
  5. Click Upload and ingest

Expected duration: Ingestion takes ~1 minute per 100MB of documents. A 1GB knowledge base takes ~10 minutes.
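
When new documents land in S3 later, a sync (ingestion job) must run before they become retrievable. A sketch of triggering and polling one with boto3's `bedrock-agent` client — `kb_id` and `ds_id` are placeholders for your knowledge base and data source IDs:

```python
import time

def wait_for_ingestion(client, kb_id, ds_id, poll_seconds=30):
    """Start an ingestion job and poll until Bedrock reports a terminal status."""
    job = client.start_ingestion_job(knowledgeBaseId=kb_id, dataSourceId=ds_id)
    job_id = job['ingestionJob']['ingestionJobId']
    while True:
        status = client.get_ingestion_job(
            knowledgeBaseId=kb_id, dataSourceId=ds_id, ingestionJobId=job_id
        )['ingestionJob']['status']
        if status in ('COMPLETE', 'FAILED'):
            return status
        time.sleep(poll_seconds)

# agent = boto3.client('bedrock-agent', region_name='us-east-1')
# wait_for_ingestion(agent, kb_id='your-kb-id', ds_id='your-data-source-id')
```

In production you would cap the polling loop with a timeout and alarm on `FAILED`.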

Step 4: Query the Knowledge Base from Your Application

Bedrock knowledge bases integrate with the Agents API and regular model invocation. Here’s how to retrieve documents and pass them to Claude:

Using the Agents API (recommended for production):

import boto3
import json

bedrock_agent_client = boto3.client('bedrock-agent-runtime', region_name='us-east-1')
kb_id = 'your-knowledge-base-id'

response = bedrock_agent_client.retrieve_and_generate(
    input={'text': 'How does our cost optimization framework work?'},
    retrieveAndGenerateConfiguration={
        'type': 'KNOWLEDGE_BASE',
        'knowledgeBaseConfiguration': {
            'knowledgeBaseId': kb_id,
            'modelArn': 'arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-5-sonnet-20241022-v2:0',
            'retrievalConfiguration': {
                'vectorSearchConfiguration': {
                    'numberOfResults': 5,
                    'overrideSearchType': 'HYBRID'  # Combines vector + keyword search
                }
            }
        }
    },
    sessionConfiguration={
        'kmsKeyArn': 'arn:aws:kms:us-east-1:123456789012:key/12345678-1234-1234-1234-123456789012'  # Optional encryption
    }
)

print(response['output']['text'])

This does three things:

  1. Retrieves the top 5 most semantically similar documents from the KB
  2. Generates a response using Claude 3.5 Sonnet with those documents as context
  3. Returns a single text response
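
The response also carries citations back to the source documents. A small helper for pulling out the cited S3 URIs — the key names follow the Agents runtime API response shape, but verify them against your SDK version:

```python
def extract_citation_uris(response):
    """Collect the S3 URIs of every passage the generated answer cited."""
    uris = []
    for citation in response.get('citations', []):
        for ref in citation.get('retrievedReferences', []):
            uri = ref.get('location', {}).get('s3Location', {}).get('uri')
            if uri:
                uris.append(uri)
    return uris
```

Surfacing these URIs to end users ("sources: ...") is an easy trust win and makes hallucinated answers quicker to spot.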

Using the Bedrock Runtime API (for custom control):

If you need more control over the prompt or want to inject context into your own LLM calls:

bedrock_runtime_client = boto3.client('bedrock-runtime', region_name='us-east-1')
bedrock_agent_client = boto3.client('bedrock-agent-runtime', region_name='us-east-1')

# Step 1: Retrieve documents from KB
retrieval_results = bedrock_agent_client.retrieve(
    knowledgeBaseId=kb_id,
    retrievalQuery={'text': 'How does our cost optimization framework work?'},
    retrievalConfiguration={
        'vectorSearchConfiguration': {
            'numberOfResults': 5,
            'overrideSearchType': 'HYBRID'
        }
    }
)

# Step 2: Format documents as context (source URI comes from the result's location)
context = '\n\n'.join([
    f"Source: {item.get('location', {}).get('s3Location', {}).get('uri', 'Unknown')}\n{item['content']['text']}"
    for item in retrieval_results['retrievalResults']
])

# Step 3: Invoke Claude with context
response = bedrock_runtime_client.invoke_model(
    modelId='anthropic.claude-3-5-sonnet-20241022-v2:0',
    body=json.dumps({
        'anthropic_version': 'bedrock-2023-05-31',
        'max_tokens': 2048,
        'messages': [
            {
                'role': 'user',
                'content': f"""You are a helpful assistant with access to company documentation.

Here is the relevant documentation:

<company_docs>
{context}
</company_docs>

User question: How does our cost optimization framework work?"""
            }
        ]
    })
)

print(json.loads(response['body'].read())['content'][0]['text'])

The second approach gives you full control over the prompt but requires you to handle chunking and deduplication yourself.
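
One way to handle the deduplication half of that: collapse near-identical chunks (overlapping chunks often repeat boundary text verbatim) before building the context string. A sketch over the `retrievalResults` shape used above:

```python
def dedupe_chunks(results, min_score=0.0):
    """Keep one copy of each distinct chunk, highest relevance score first."""
    seen, unique = set(), []
    for item in sorted(results, key=lambda r: r.get('score', 0.0), reverse=True):
        # Normalize whitespace and case so trivially-different copies collide
        normalized = ' '.join(item['content']['text'].split()).lower()
        if normalized not in seen and item.get('score', 0.0) >= min_score:
            seen.add(normalized)
            unique.append(item)
    return unique
```

Exact-match dedup is crude but cheap; for overlap-heavy corpora you might switch the set key to a shingle hash.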

Step 5: Optimize Cost and Performance

Chunk size tuning:

  • Small chunks (500 tokens): Fast retrieval, better precision on specific Q&A (e.g., “What’s the pricing for service X?”), but requires more API calls if you retrieve multiple chunks
  • Large chunks (4,000+ tokens): Slower retrieval, better for narrative documents (e.g., whitepapers, architectural guides), fewer chunks needed
  • Default (2,048 tokens) is a good balance. Chunking is set per data source at creation time via the bedrock-agent client:
bedrock_agent = boto3.client('bedrock-agent', region_name='us-east-1')

bedrock_agent.create_data_source(
    knowledgeBaseId=kb_id,
    name='company-docs',
    dataSourceConfiguration={
        'type': 'S3',
        's3Configuration': {
            'bucketArn': 'arn:aws:s3:::my-company-docs',
            'inclusionPrefixes': ['bedrock/']
        }
    },
    vectorIngestionConfiguration={
        'chunkingConfiguration': {
            'chunkingStrategy': 'FIXED_SIZE',
            'fixedSizeChunkingConfiguration': {
                'maxTokens': 500,         # smaller chunks for precise Q&A
                'overlapPercentage': 20   # overlap preserves context at boundaries
            }
        }
    }
)

Filtering by metadata:

Use metadata to reduce the number of results returned (faster, cheaper):

response = bedrock_agent_client.retrieve_and_generate(
    input={'text': 'How does our cost optimization framework work?'},
    retrieveAndGenerateConfiguration={
        'type': 'KNOWLEDGE_BASE',
        'knowledgeBaseConfiguration': {
            'knowledgeBaseId': kb_id,
            'modelArn': 'arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-5-sonnet-20241022-v2:0',
            'retrievalConfiguration': {
                'vectorSearchConfiguration': {
                    'numberOfResults': 5,
                    'filter': {
                        'andAll': [
                            {'equals': {'key': 'department', 'value': 'engineering'}},
                            {'equals': {'key': 'year', 'value': '2026'}}
                        ]
                    }
                }
            }
        }
    }
)

Monitoring and cost:

  • Knowledge Base provisioning: $0.15/hour = $108/month (non-negotiable minimum)
  • Retrieval (GET): $0.15 per 1,000 calls
  • Ingestion (PUT): $0.15 per 1,000 uploads
  • Embeddings: $0.08 per 1M tokens (only if using Bedrock embeddings)

At 1,000 queries/day: $108 + ~$4.50/month in retrieval costs (30,000 calls at $0.15 per 1,000). At 10,000+ queries/day, consider OpenSearch Serverless with Titan Embeddings for potentially lower costs.

Step 6: Production-Ready Patterns

Hybrid Search (Vector + Keyword):

The HYBRID search type combines semantic similarity (vectors) with keyword matching, improving precision:

'overrideSearchType': 'HYBRID'  # Recommended for production

Reranking with Claude:

For high-stakes applications (e.g., customer support), retrieve more documents (e.g., 20) and have Claude rerank them:

# Retrieve 20 candidates; a second Claude call then picks the best 3
candidates = bedrock_agent_client.retrieve(
    knowledgeBaseId=kb_id,
    retrievalQuery={'text': 'How does our cost optimization framework work?'},
    retrievalConfiguration={
        'vectorSearchConfiguration': {
            'numberOfResults': 20
        }
    }
)
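
The rerank step itself is a second model call. Here is a minimal sketch of the prompt-building half (the helper name and output format are illustrative, not part of any Bedrock API); parse the returned index list defensively before trusting it:

```python
def build_rerank_prompt(question, chunks, top_k=3):
    """Number the candidate chunks and ask the model for the top_k indices."""
    numbered = '\n\n'.join(f'[{i}] {text}' for i, text in enumerate(chunks))
    return (
        f'Question: {question}\n\n'
        f'Candidate passages:\n\n{numbered}\n\n'
        f'Reply with the indices of the {top_k} passages most relevant to the '
        'question as a comma-separated list (e.g. "2,0,5") and nothing else.'
    )

# prompt = build_rerank_prompt(query, [r['content']['text'] for r in candidates['retrievalResults']])
# ...send prompt to Claude via invoke_model, parse the indices, keep those chunks
```

A cheap, fast model (e.g. Claude Haiku) is usually sufficient for reranking, keeping the extra call inexpensive.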

Caching for repeated queries:

If you ask the same questions repeatedly, use Bedrock’s prompt caching to avoid re-retrieving and re-embedding:

response = bedrock_runtime_client.invoke_model(
    modelId='anthropic.claude-3-5-sonnet-20241022-v2:0',
    body=json.dumps({
        'anthropic_version': 'bedrock-2023-05-31',
        'system': [
            {
                'type': 'text',
                'text': 'You are a helpful assistant with access to company documentation.'
            },
            {
                'type': 'text',
                'text': context,
                'cache_control': {'type': 'ephemeral'}
            }
        ],
        'max_tokens': 2048,
        'messages': [{'role': 'user', 'content': 'How does our cost optimization framework work?'}]
    })
)

This caches the context for about 5 minutes; cached input tokens are billed at a steep discount (roughly 10% of the base input rate), cutting the cost of repeated queries by up to ~90%.

Common Mistakes to Avoid

  1. Uploading unstructured data without preprocessing

    • PDFs with images: Bedrock’s native parser may miss image content. Pre-OCR large documents.
    • Tables: Convert to markdown or JSON for better chunk coherence.
  2. Using too many documents

    • Knowledge bases with 100,000+ documents can hit retrieval latency. Consider partitioning into multiple KBs by domain.
  3. Not setting metadata

    • Without metadata, every query retrieves from the entire KB. Add department, product, or date metadata to narrow results.
  4. Ignoring the $108/month minimum

    • The knowledge base collection runs 24/7. If you don’t query it frequently, disable it or delete it.

Next Steps

  1. Ingest your first batch of documents
  2. Test retrieval with sample queries
  3. Monitor costs in the AWS Bedrock console
  4. Deploy to production with error handling and monitoring (e.g., CloudWatch alarms for failed retrievals)
  5. Talk to FactualMinds if you need help scaling to production volume or custom embedding models
Palaniappan P

AWS Cloud Architect & AI Expert

AWS-certified cloud architect and AI expert with deep expertise in cloud migrations, cost optimization, and generative AI on AWS.

AWS Architecture · Cloud Migration · GenAI on AWS · Cost Optimization · DevOps
