
Vector Database

A database built for storing and searching AI embeddings at scale.


Definition

A vector database is a specialised data store designed to efficiently store, index, and retrieve high-dimensional vector embeddings — numerical representations of text, images, audio, or other data produced by embedding models. Unlike traditional databases that retrieve rows matching exact conditions, vector databases retrieve items by semantic similarity: finding the vectors most similar to a query vector using approximate nearest neighbour (ANN) search algorithms. They are the core infrastructure of RAG systems, semantic search engines, recommendation systems, and long-term AI agent memory.

Why traditional databases can't do this

A text embedding is a vector of 768 to 3072 floating-point numbers — representing the semantic meaning of a passage. To find the passages most similar to a query, you need to compute the cosine similarity between the query vector and every stored vector, then return the top-k results. For a database of 10 million documents with 1536-dimensional embeddings, a naive brute-force search requires 10 million dot products per query. A PostgreSQL table can store these vectors, but SQL's query engine was not designed for this operation. Approximate nearest neighbour algorithms — HNSW, IVF-Flat, ScaNN — reduce this from O(n·d) to O(log n · d) with controllable accuracy tradeoffs, making billion-scale semantic search feasible.

Cosine similarity: measures the angle between two vectors regardless of magnitude. Returns 1.0 for identical direction (most similar), 0 for orthogonal (unrelated), -1 for opposite.
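Both ideas above can be sketched in a few lines of plain Python — a minimal brute-force retriever with toy 3-dimensional vectors (real embeddings have hundreds or thousands of dimensions; the arithmetic is identical):

```python
import math

def cosine_similarity(a, b):
    # Angle-based similarity: 1.0 = same direction, 0 = orthogonal, -1 = opposite
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def brute_force_top_k(query, vectors, k=2):
    # O(n·d): one full similarity computation per stored vector —
    # exactly the cost ANN indexes like HNSW exist to avoid
    scored = [(cosine_similarity(query, v), i) for i, v in enumerate(vectors)]
    scored.sort(reverse=True)
    return [i for _, i in scored[:k]]

vectors = [
    [0.9, 0.1, 0.0],   # doc 0
    [0.0, 1.0, 0.2],   # doc 1
    [0.8, 0.2, 0.1],   # doc 2
]
query = [1.0, 0.0, 0.0]
print(brute_force_top_k(query, vectors))  # → [0, 2]
```

Docs 0 and 2 point in nearly the same direction as the query, so they rank highest; doc 1 is orthogonal and scores 0.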

| Database | Type | Algorithm | Managed service | Best for |
|---|---|---|---|---|
| Pinecone | Purpose-built vector DB | HNSW + proprietary | Yes (cloud-only) | Production RAG apps; no infra management |
| Weaviate | Purpose-built vector DB | HNSW | Yes + self-host | Multi-tenancy; hybrid BM25+vector search |
| Chroma | Purpose-built vector DB | HNSW (via hnswlib) | No — local/self-host | Development, local testing, small-scale RAG |
| Qdrant | Purpose-built vector DB | HNSW | Yes + self-host | High-performance; advanced filtering |
| pgvector (PostgreSQL) | Extension to existing DB | IVF-Flat / HNSW | Via Supabase, Neon | Teams already on Postgres; simpler stack |
| FAISS (Meta) | Library (not a DB) | IVF-Flat, HNSW, PQ | No — library only | Research; custom applications; maximum control |

Building a RAG system with a vector database

Complete RAG pipeline: embed documents → store in Chroma → retrieve → generate with Claude

from anthropic import Anthropic
import chromadb
from chromadb.utils import embedding_functions

# ── 1. Set up Chroma vector database ──────────────────────────────────────
client = chromadb.Client()  # in-memory; use chromadb.PersistentClient(path="./chroma_db") for disk

# Use OpenAI embeddings (or swap for sentence-transformers for free local embed)
embed_fn = embedding_functions.OpenAIEmbeddingFunction(
    api_key="YOUR_OPENAI_KEY",
    model_name="text-embedding-3-small"  # 1536-dim, $0.02 per 1M tokens
)

collection = client.create_collection("lumichats_docs", embedding_function=embed_fn)

# ── 2. Ingest documents ────────────────────────────────────────────────────
documents = [
    "LumiChats charges ₹69 per active day. You only pay on days you use it.",
    "LumiChats Study Mode locks all AI answers to specific pages of your uploaded PDF.",
    "LumiChats supports 40+ models including Claude Sonnet 4.6, GPT-5.4, and Gemini 3 Pro.",
    "LumiChats Agent Mode enables multi-step autonomous task execution using frontier models.",
]
collection.add(
    documents=documents,
    ids=[f"doc_{i}" for i in range(len(documents))]
)

# ── 3. Retrieve relevant context for a query ──────────────────────────────
query = "How much does LumiChats cost?"
results = collection.query(query_texts=[query], n_results=2)
context = "\n".join(results["documents"][0])

# ── 4. Generate answer with Claude using retrieved context ─────────────────
anthropic = Anthropic()
response = anthropic.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=512,
    system=f"""Answer questions using only the provided context.
Context:
{context}

If the answer isn't in the context, say so explicitly.""",
    messages=[{"role": "user", "content": query}]
)
print(response.content[0].text)
# → "LumiChats charges ₹69 per active day — you only pay on days you actually use it."

Choosing the right vector database

For development and small projects (under 100,000 vectors): use Chroma locally — free, no account needed, 5-minute setup. For production (100K–10M vectors): Pinecone Serverless or Qdrant Cloud — managed, scalable, reasonable pricing. For teams already on Supabase or Neon (PostgreSQL): use pgvector — eliminates a separate service. For billion-scale search with full control: FAISS + custom infrastructure. The most common mistake is over-engineering: most RAG applications serve well under 1M vectors and don't need Pinecone's scale.
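The scale thresholds above translate directly into memory arithmetic. A rough footprint estimate for float32 vectors at 1536 dimensions (raw vector storage only — index structures like HNSW add significant overhead on top, so treat these as lower bounds):

```python
def raw_vector_bytes(n_vectors, dims, bytes_per_float=4):
    # float32 storage for the vectors alone, before any index overhead
    return n_vectors * dims * bytes_per_float

# The three tiers discussed above, at 1536 dims (text-embedding-3-small)
for n in (100_000, 1_000_000, 10_000_000):
    gb = raw_vector_bytes(n, 1536) / 1e9
    print(f"{n:>11,} vectors -> {gb:.1f} GB raw")
# 100K vectors fit comfortably in memory on a laptop;
# 10M vectors (~61 GB raw) is where managed infrastructure starts to pay off.
```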

Practice questions

  1. What is approximate nearest neighbour (ANN) search and why is it used instead of exact nearest neighbour? (Answer: Exact nearest neighbour in 1536-dimensional embedding space requires comparing the query vector against every stored vector — O(n·d) time and impractical for millions of vectors. ANN algorithms (HNSW, IVF, LSH) sacrifice a small amount of accuracy (miss a few true nearest neighbours) for massive speed improvements — typically 100–1000× faster. Production vector databases use ANN: Pinecone, Weaviate, and Qdrant use HNSW as their primary index, while pgvector supports both IVF-Flat and HNSW.)
  2. What is HNSW (Hierarchical Navigable Small World) and why is it the dominant ANN algorithm? (Answer: HNSW builds a multi-layer graph where vectors are connected to their nearest neighbours. The top layers are sparse graphs with long-range connections (for fast approximate traversal). Bottom layers are dense with precise neighbourhood connections. Search: enter at the top layer, greedily navigate toward the query, descend to more detailed layers. Insert/search are both O(log n) — unlike tree structures which degrade in high dimensions. HNSW dominates because it achieves 95%+ recall at 100–1000× speedup over brute force.)
  3. In a RAG system, when would you use hybrid search (vector + keyword) instead of pure vector search? (Answer: Hybrid search combines dense (embedding) and sparse (BM25/TF-IDF) retrieval. Use hybrid when: (1) Queries include exact terms that must be matched (product IDs, proper nouns, technical terms). Pure vector search may retrieve semantically similar but wrong product. (2) Domain vocabulary is specialised — embeddings may not capture domain-specific term similarity. (3) Users mix broad conceptual queries with specific searches. Reciprocal Rank Fusion (RRF) combines the two ranking lists. Weaviate and Qdrant both support hybrid search natively.)
  4. What is the embedding dimensionality trade-off for vector databases? (Answer: Higher dimensions (e.g., 3072 for text-embedding-3-large): more nuanced semantic representation, higher search quality. Costs: more storage per vector (3072 × 4 bytes = 12KB vs 384 × 4 bytes = 1.5KB for small embeddings), slower indexing and search, higher memory usage. Lower dimensions (e.g., Matryoshka embeddings can be truncated to 256 dims): ~12× storage reduction, ~4× faster search, small accuracy loss. Matryoshka Representation Learning (MRL) trains models so early dimensions capture the most important information — enabling dimension selection at retrieval time.)
  5. What is the 'semantic gap' problem in vector search and how does query rewriting address it? (Answer: Semantic gap: user queries are often short, keyword-like, and expressed differently than the documents they target. A query 'Python list comprehension' may not retrieve a document titled 'Compact syntax for creating lists in Python' even if they are semantically equivalent — embedding similarity depends on training distribution. Query rewriting: use an LLM to expand or rephrase the query into multiple forms. HyDE (Hypothetical Document Embeddings): generate a hypothetical answer to the query and embed THAT — the answer embedding is closer in space to the actual answer document.)
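The Reciprocal Rank Fusion mentioned in question 3 is simple enough to sketch in full. Each ranked list contributes 1/(k + rank) per document; the constant k=60 comes from the original RRF paper, and the doc IDs here are purely illustrative:

```python
def reciprocal_rank_fusion(ranked_lists, k=60):
    # Each list is doc IDs ordered best-first; a doc's fused score is
    # the sum of 1/(k + rank) over every list it appears in
    scores = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

vector_hits  = ["doc_a", "doc_b", "doc_c"]   # dense (embedding) ranking
keyword_hits = ["doc_c", "doc_a", "doc_d"]   # sparse (BM25) ranking
print(reciprocal_rank_fusion([vector_hits, keyword_hits]))
# → ['doc_a', 'doc_c', 'doc_b', 'doc_d']
```

doc_a and doc_c appear in both lists, so they outrank documents that only one retriever found — the core intuition behind hybrid search.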

On LumiChats

LumiChats Study Mode uses vector similarity search internally to retrieve the most relevant passages from your uploaded PDFs before generating answers — the same RAG architecture described here, built specifically for exam preparation with zero hallucination risk.

Try it free

Try LumiChats for ₹69

40+ AI models. Study Mode with page-locked answers. Agent Mode with code execution. Pay only on days you use it.

Get Started — ₹69/day
