Vector Databases for RAG: Storage and Search
Part of the RAG series.
A vector database (or vector store) holds embeddings and supports similarity search: given a query vector, it returns the stored vectors that are closest to it. In RAG you use it to store chunk embeddings at index time and to run top-k retrieval at query time. This post covers what vector DBs do, how similarity search works, and what to consider when choosing one.
What a Vector DB Does
A vector store typically lets you:
- Insert vectors (and usually some metadata, e.g. chunk id, source doc, text).
- Search by vector: you send a query vector and get back the k nearest vectors (and their metadata).
- Optionally filter by metadata (e.g. "only chunks from doc X") and then run similarity search on the filtered set.
The core operation is k-nearest neighbours (k-NN): find the k vectors that are "closest" to the query. Closeness is usually cosine similarity or Euclidean distance. Exact k-NN is expensive on large datasets, so most production systems use approximate k-NN (e.g. HNSW, IVF) to trade a small amount of recall for speed and scale.
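To make the core operations concrete, here is a minimal in-memory sketch of exact (brute-force) k-NN with cosine similarity and an optional metadata pre-filter. The `TinyVectorStore` class and its method names are illustrative assumptions, not any particular product's API; real systems replace the linear scan with an approximate index such as HNSW or IVF.

```python
import math

class TinyVectorStore:
    """Illustrative in-memory store: exact k-NN via a linear scan.
    (Hypothetical API; real vector DBs expose similar operations.)"""

    def __init__(self):
        self._rows = {}  # id -> (vector, metadata)

    def upsert(self, id, vector, metadata=None):
        self._rows[id] = (vector, metadata or {})

    @staticmethod
    def _cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        return dot / (na * nb)

    def search(self, query, k=5, filter=None):
        # Optional metadata pre-filter, then score every remaining vector.
        candidates = (
            (id, vec, meta)
            for id, (vec, meta) in self._rows.items()
            if filter is None
            or all(meta.get(f) == v for f, v in filter.items())
        )
        scored = [
            (self._cosine(query, vec), id, meta)
            for id, vec, meta in candidates
        ]
        scored.sort(key=lambda t: t[0], reverse=True)  # higher cosine = closer
        return scored[:k]
```

The linear scan is O(n) per query, which is exactly why large deployments switch to approximate indexes: they give up a little recall to avoid scoring every stored vector.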
Similarity Metrics
Common choices:
- Cosine similarity. Measures the angle between vectors; ignores length. Often used when embeddings are normalized. Range usually [-1, 1] or [0, 1] depending on implementation.
- Euclidean distance (L2). Straight-line distance. Smaller is closer. Some indexes work in distance space and others in similarity space; check the docs.
- Dot product. Often used when vectors are normalized; then dot product equals cosine similarity. Some APIs expose "dot product" or "inner product" as the metric.
You must use the same metric at index and query time, and it should match how your embedding model was trained or normalized.
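A quick way to see how the metrics relate: once vectors are normalized to unit length, dot product equals cosine similarity, and L2 distance gives the same ranking, because for unit vectors ||a − b||² = 2 − 2(a · b). A small check in plain Python (the helper function names are just for this sketch):

```python
import math

def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def l2(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

a = normalize([3.0, 4.0])  # -> [0.6, 0.8]
b = normalize([4.0, 3.0])  # -> [0.8, 0.6]

# For unit vectors: dot product equals cosine similarity...
assert abs(dot(a, b) - cosine(a, b)) < 1e-12
# ...and squared L2 distance is a monotone function of the dot product.
assert abs(l2(a, b) ** 2 - (2 - 2 * dot(a, b))) < 1e-12
```

This is why some APIs only expose "inner product" and ask you to normalize your embeddings first: on unit vectors, all three metrics agree on which neighbours are closest.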
Representative Vector Stores
Examples (conceptual, not exhaustive):
- Pinecone, Weaviate, Qdrant, Milvus. Managed or self-hosted vector DBs. Good for scale and dedicated vector workloads.
- Chroma, LanceDB. Often used for smaller or local-first setups; Chroma is embedding-store–centric.
- pgvector (Postgres), Elasticsearch/OpenSearch. Add vector support to an existing DB or search engine. Useful if you already use them and want one system for metadata and vectors.
Choice depends on scale, latency, filtering needs, ops and whether you want a dedicated vector DB or something that also does full-text or structured query.
Typical Usage in RAG
```python
# Indexing (once or on update)
for chunk in chunks:
    vec = embed(chunk.text)
    vector_db.upsert(id=chunk.id, vector=vec, metadata={...})

# Query
query_vec = embed(user_question)
hits = vector_db.search(query_vec, k=5, filter={...})
context = [h.metadata["text"] for h in hits]
```

For how those chunks are created, see Chunking Strategies for RAG. For dense vs sparse and hybrid search, see Retrieval Methods in RAG.
Get in touch
Questions about RAG or AI knowledge systems? Tell us about your project.