AWS Glossary
Amazon S3 Vectors
S3 Vectors is the AWS-native vector store: purpose-built vector storage on S3 that can cost up to 90% less than dedicated vector databases for RAG workloads.
Definition
Amazon S3 Vectors is a native vector storage tier on S3 for embeddings and similarity search. Vector buckets hold vector indexes that store high-dimensional vectors alongside filterable metadata; each index uses either cosine or Euclidean distance. S3 Vectors reached GA in 2025 and is a Bedrock Knowledge Bases vector store option alongside OpenSearch Serverless, Aurora pgvector, and partner engines. It targets RAG and semantic search workloads where storage is the dominant cost, so paying for OpenSearch OCU-hours or dedicated vector DB pods is hard to justify.
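To make cosine and Euclidean distance concrete, here is a minimal pure-Python sketch of both metrics. It only illustrates the math: S3 Vectors computes distances server-side, and exactly how the service scales or reports a distance value (for example, cosine distance as 1 minus cosine similarity) is an assumption to check against the API reference.

```python
import math
from typing import Sequence

# Illustration only: S3 Vectors computes distances server-side.


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity: 0.0 means same direction, 2.0 means opposite."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_a * norm_b)


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line (L2) distance; sensitive to vector magnitude."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Same direction, different magnitude: cosine treats the vectors as
# (near-)identical, while Euclidean distance does not.
print(cosine_distance([1.0, 2.0], [2.0, 4.0]))
print(euclidean_distance([1.0, 2.0], [2.0, 4.0]))
```

If your embedding model emits unit-normalized vectors, the two metrics rank neighbors identically (squared Euclidean distance equals 2 minus 2 times cosine similarity), so the choice matters mainly for unnormalized embeddings.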
As of June 16, 2026, `QueryVectors` returns up to 10,000 similarity search results per query (100× the prior 100-result limit), with paginated responses via `nextToken`. Query data-processed charges on indexes with more than 10 million vectors were reduced automatically by up to 80%. Large result sets may incur data-returned fees beyond the first 512 KB per query; see the S3 pricing page.
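Paginated results can be consumed with a generator that yields one page at a time, so each page is processed before the next is requested and memory stays bounded. This is a minimal sketch under stated assumptions: `fetch_page` is a hypothetical callable wrapping your SDK's `QueryVectors` call, and the response field names (`vectors`, `nextToken`) should be confirmed against the current S3 Vectors API reference and your SDK version.

```python
from typing import Any, Callable, Dict, Iterator, List, Optional

# Hypothetical page fetcher: takes a nextToken (None for the first page) and
# returns {"vectors": [...], "nextToken": "..."}. Field names are assumed.
PageFetcher = Callable[[Optional[str]], Dict[str, Any]]


def iter_query_pages(
    fetch_page: PageFetcher, max_pages: int = 100
) -> Iterator[List[Dict[str, Any]]]:
    """Yield result pages one at a time instead of buffering the full topK."""
    token: Optional[str] = None
    for _ in range(max_pages):  # hard stop guards against a token that never clears
        response = fetch_page(token)
        yield response.get("vectors", [])
        token = response.get("nextToken")
        if not token:
            return


# Usage with an in-memory fake standing in for the real API.
fake_pages = {
    None: {"vectors": [{"key": "a"}, {"key": "b"}], "nextToken": "t1"},
    "t1": {"vectors": [{"key": "c"}]},
}
for page in iter_query_pages(lambda tok: fake_pages[tok]):
    print([v["key"] for v in page])
```

This version is sequential; overlapping the next fetch with processing of the current page would need a background thread or async task, which is omitted to keep the sketch short.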
The trade-off is latency: expect query times from under 100 ms to the low hundreds of milliseconds. That suits batch retrieval, wide-recall-plus-rerank pipelines, and many chat RAG flows, but not sub-10 ms agent loops at thousands of QPS.
| Store (illustrative) | Cost driver | Latency profile | Max topK (June 2026) |
|---|---|---|---|
| OpenSearch Serverless | OCU-hours + storage | Lower p99 on small indexes | Index-dependent |
| Dedicated vector SaaS | Pod/replica hours | Tunable, vendor-specific | Vendor-specific |
| S3 Vectors | Storage + per-query | Higher tail, lowest storage $ | 10,000 (paginated) |
When to use it
- Bedrock Knowledge Bases RAG with large corpora (10M+ chunks) where OpenSearch baseline OCUs inflate monthly cost — especially after the June 2026 large-index query discount.
- Multi-stage retrieval: wide `topK` recall, client-side rerank, and dedup by `document_id`, now practical without the sharding workarounds the old 100-result cap required.
- Multi-tenant SaaS needing S3-native isolation (an index or vector bucket per tenant) with metadata filters at retrieval.
- Archival or long-tail knowledge sets queried occasionally but stored durably for compliance.
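The multi-stage retrieval pattern in the list above can be sketched as: rescore the wide-recall candidates with a reranker, keep only the best chunk per `document_id`, and pass the top N to the LLM. A minimal sketch, assuming each candidate is a dict whose `metadata` carries a `document_id` field, and treating `rerank_score` as a hypothetical stand-in for your cross-encoder or Bedrock rerank call.

```python
from typing import Any, Callable, Dict, List

# Assumed candidate shape: {"key": ..., "metadata": {"document_id": ...}, ...}


def rerank_and_dedup(
    candidates: List[Dict[str, Any]],
    rerank_score: Callable[[Dict[str, Any]], float],  # hypothetical reranker hook
    top_n: int = 5,
) -> List[Dict[str, Any]]:
    """Rescore candidates, keep the best chunk per document_id, return top_n."""
    best_per_doc: Dict[str, Dict[str, Any]] = {}
    best_score: Dict[str, float] = {}
    for cand in candidates:
        doc_id = cand["metadata"]["document_id"]
        score = rerank_score(cand)
        if doc_id not in best_score or score > best_score[doc_id]:
            best_score[doc_id] = score
            best_per_doc[doc_id] = cand
    ranked = sorted(best_per_doc, key=lambda d: best_score[d], reverse=True)
    return [best_per_doc[d] for d in ranked[:top_n]]


# Toy example: the "reranker" just reads a precomputed score field.
hits = [
    {"key": "c1", "metadata": {"document_id": "doc-A"}, "score": 0.4},
    {"key": "c2", "metadata": {"document_id": "doc-A"}, "score": 0.9},
    {"key": "c3", "metadata": {"document_id": "doc-B"}, "score": 0.7},
]
print([h["key"] for h in rerank_and_dedup(hits, lambda h: h["score"], top_n=2)])
```

Deduplicating after reranking (rather than before) keeps the chunk the reranker actually prefers for each document, at the cost of scoring every candidate.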
When not to use it
- Agentic workflows requiring sub-50ms retrieval inside tight tool-call loops at high QPS — OpenSearch Serverless or in-memory caches win.
- Defaulting to `topK=10,000` for simple chat RAG: if only five chunks reach the LLM, wide recall adds latency and data-returned fees for no gain.
- Hybrid lexical + vector search as a single managed engine — OpenSearch hybrid or Kendra may fit better.
- Graph-heavy relationship traversal — Neptune Analytics combines graph and vector where edges matter.
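The data-returned cost of a large `topK` can be estimated with back-of-the-envelope arithmetic: multiply `topK` by the average bytes returned per result and compare the product with the 512 KB per-query allowance. In this sketch the per-result sizes are hypothetical inputs you would measure from real responses, and reading "512 KB" as 512 × 1024 bytes is an assumption.

```python
# Assumption: "512 KB" is read as 512 KiB; confirm the unit on the S3 pricing page.
FREE_BYTES_PER_QUERY = 512 * 1024


def bytes_over_allowance(top_k: int, avg_bytes_per_result: int) -> int:
    """Estimated response bytes beyond the per-query allowance (0 if within it)."""
    estimated = top_k * avg_bytes_per_result
    return max(0, estimated - FREE_BYTES_PER_QUERY)


def max_top_k_within_allowance(avg_bytes_per_result: int) -> int:
    """Largest topK whose estimated response fits inside the allowance."""
    if avg_bytes_per_result <= 0:
        raise ValueError("avg_bytes_per_result must be positive")
    return FREE_BYTES_PER_QUERY // avg_bytes_per_result


# Hypothetical sizes: ~200 B for key, distance, and small metadata;
# ~2,000 B when chunk text rides along with each result.
print(bytes_over_allowance(10_000, 200))   # 1475712
print(max_top_k_within_allowance(2_000))   # 262
```

Five chunks of a few KB each stay far inside the allowance, which is why a wide `topK` only pays off when a rerank stage actually consumes the extra candidates.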
Tips
- Design metadata fields for mandatory filters (tenant, ACL, doc version) before first ingest — re-indexing billion-vector buckets is painful.
- On wide recall passes, set `returnMetadata=True` and `returnData=False`; fetch chunk text only for the post-rerank top N.
- Paginate `QueryVectors` with `nextToken`: process the first page while fetching the next, and do not buffer thousands of payloads in Lambda memory.
- Upgrade AWS SDKs after June 16, 2026 for pagination support on `QueryVectors`.
- Run recall@k benchmarks before raising `topK`; the cheapest store is worthless if reranked quality does not improve.
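A recall@k benchmark can be as small as the following sketch: for each labeled query, measure what fraction of the hand-labeled relevant keys appears in the first k retrieved keys, then average across queries. The labels and result lists are hypothetical; in practice you would run the same harness at several `topK` values, with and without reranking, before changing production settings.

```python
from typing import Dict, List, Sequence, Set


def recall_at_k(retrieved: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant keys that appear in the first k retrieved keys."""
    if not relevant:
        raise ValueError("each benchmark query needs at least one relevant key")
    hits = len(set(retrieved[:k]) & relevant)
    return hits / len(relevant)


def mean_recall_at_k(
    runs: Dict[str, List[str]], labels: Dict[str, Set[str]], k: int
) -> float:
    """Average recall@k over every labeled query (missing runs count as 0)."""
    scores = [recall_at_k(runs.get(q, []), rel, k) for q, rel in labels.items()]
    return sum(scores) / len(scores)


# Hypothetical labeled queries and retrieved key lists.
labels = {"q1": {"a", "b"}, "q2": {"c"}}
runs = {"q1": ["a", "x", "b", "y"], "q2": ["z", "c"]}
for k in (1, 2, 4):
    print(k, mean_recall_at_k(runs, labels, k))
```

If recall stops improving between two `topK` values, the larger one only adds latency and data-returned bytes.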
Gotchas
- Serious: Raising `topK` to thousands with `returnData=True` and no pagination, which risks OOM in Lambda and unexpected data-returned charges beyond the first 512 KB per query.
- Serious: Using S3 Vectors for real-time agent tool retrieval without load testing; tail latency spikes under concurrent sessions frustrate users.
- Serious: Stale embeddings when source documents change but sync jobs fail silently — pair with document version metadata and health alarms on sync lag.
- Regular: Assuming hybrid keyword search exists natively — you may still need OpenSearch or Athena on structured fields for keyword-heavy queries.
- Regular: Cross-region inference in Bedrock reading vectors in another region adds data transfer — colocate vector buckets with Knowledge Base and model region.
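The stale-embedding gotcha can be caught with a periodic comparison between the source of truth and the document version stored in vector metadata. A minimal sketch, assuming a hypothetical `doc_version` metadata field written at ingest time and a source catalog readable as a `{document_id: version}` mapping; publishing the counts as a metric and alarming on them is left out.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class StalenessReport:
    stale: List[str]     # indexed with an older version than the source
    missing: List[str]   # present in the source, never indexed
    orphaned: List[str]  # indexed, but deleted from the source


def check_staleness(
    source_versions: Dict[str, int], indexed_versions: Dict[str, int]
) -> StalenessReport:
    """Compare source versions with the (hypothetical) doc_version metadata in the index."""
    stale = sorted(
        d for d, v in source_versions.items()
        if d in indexed_versions and indexed_versions[d] < v
    )
    missing = sorted(d for d in source_versions if d not in indexed_versions)
    orphaned = sorted(d for d in indexed_versions if d not in source_versions)
    return StalenessReport(stale, missing, orphaned)


report = check_staleness(
    {"doc-A": 3, "doc-B": 1, "doc-C": 2},  # hypothetical source catalog
    {"doc-A": 2, "doc-B": 1, "doc-D": 5},  # hypothetical indexed doc_version values
)
print(report)
```

Non-empty `stale` or `missing` lists after a sync window has passed are a reasonable signal that a sync job failed silently.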
Official references
- Querying vectors — QueryVectors, filters, recall testing.
- Create a vector index — index types and limits.
- Knowledge Bases data source sync — supported ingestion paths.