Embeddings are dense numerical vector representations of text, images, or other data that capture their semantic meaning. Similar concepts produce embeddings that are mathematically close in high-dimensional space, allowing AI systems to perform semantic search, clustering, classification, and retrieval based on meaning rather than keyword matching.
## What an embedding looks like
An embedding is simply a list of floating-point numbers — a vector. The length of this list is called the embedding dimension. Modern text embedding models produce vectors with 768 to 3,072 dimensions.
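As a toy illustration (the numbers below are made up, not produced by any real model), an embedding is just an array of floats whose length is the embedding dimension:

```python
import numpy as np

# A made-up 8-dimensional "embedding" -- real models produce hundreds
# or thousands of dimensions, but the structure is identical.
toy_embedding = np.array([0.12, -0.45, 0.33, 0.08, -0.91, 0.27, 0.05, -0.16])

print(toy_embedding.shape)  # (8,) -- the embedding dimension
print(toy_embedding.dtype)  # float64
```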
Generating a real embedding with OpenAI text-embedding-3-small
```python
from openai import OpenAI
import numpy as np

client = OpenAI()  # uses OPENAI_API_KEY env variable

def embed(text: str) -> np.ndarray:
    response = client.embeddings.create(
        model="text-embedding-3-small",  # 1536 dimensions
        input=text,
    )
    return np.array(response.data[0].embedding)

king = embed("king")
queen = embed("queen")
banana = embed("banana")

print(f"Embedding shape: {king.shape}")  # (1536,)
print(f"First 5 values: {king[:5]}")
# e.g. [ 0.021 -0.083  0.045 -0.012  0.067]
# 1536 numbers, each individually meaningless,
# but together they encode the word's meaning
```

**Shape matters:** These 1,536 numbers have no individually interpretable meaning. What matters is the geometric relationship: words with similar meanings live close together in this 1,536-dimensional space.
## Cosine similarity: measuring semantic distance
To compare how similar two embeddings are, we use cosine similarity — the cosine of the angle between the two vectors. This is preferred over Euclidean distance for embeddings because it measures directional similarity, not magnitude:
Cosine similarity ranges from −1 (opposite meaning) through 0 (unrelated) to 1 (identical meaning).
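A quick sanity check with toy 2-D vectors (illustrative only) shows both the range endpoints and why magnitude does not matter, only direction:

```python
import numpy as np

def cos_sim(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

v = np.array([1.0, 2.0])

print(cos_sim(v, 3 * v))   # ~1.0: same direction, 3x the magnitude -- still "identical"
print(cos_sim(v, -v))      # ~-1.0: opposite direction
print(cos_sim(np.array([1.0, 0.0]),
              np.array([0.0, 1.0])))  # 0.0: orthogonal (unrelated)
```

Note that `v` and `3 * v` are far apart by Euclidean distance, yet their cosine similarity is 1 — exactly the directional behaviour we want for embeddings.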
Computing cosine similarity between embeddings
```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Using the embeddings from the previous example:
print(f"king ↔ queen:  {cosine_similarity(king, queen):.4f}")   # ~0.87
print(f"king ↔ banana: {cosine_similarity(king, banana):.4f}")  # ~0.21
print(f"king ↔ king:   {cosine_similarity(king, king):.4f}")    # 1.0000

# For bulk comparisons, normalize first (cosine = dot product of unit vectors):
def normalize(v: np.ndarray) -> np.ndarray:
    return v / np.linalg.norm(v)

king_n, queen_n = normalize(king), normalize(queen)
# Now: similarity = np.dot(king_n, queen_n)
```

## The famous king − man + woman = queen
A celebrated property of well-trained embeddings, first demonstrated with Word2Vec in 2013, is that semantic relationships correspond to arithmetic in vector space:
The gender relationship is encoded as a consistent geometric direction in embedding space.
Demonstrating word arithmetic with embeddings
```python
# Semantic arithmetic in embedding space
man = embed("man")
woman = embed("woman")

# Compute the "analogy" vector
target = king - man + woman

# Find which word's embedding is closest to the result
candidates = {"king": king, "queen": queen, "woman": woman, "banana": banana}
similarities = {
    word: cosine_similarity(target, vec)
    for word, vec in candidates.items()
}

best = max(similarities, key=similarities.get)
print("Nearest to (king - man + woman):", best)  # queen ✓
# Output: queen (similarity ~0.89)
# This works because the gender direction (man → woman)
# is consistent across semantic spaces
```

Modern large embedding models (like text-embedding-3-large) capture far richer, contextual semantics. The same word "bank" gets completely different embeddings depending on whether the surrounding context is about finance or rivers, because modern embedders process context, not just isolated words.
## Building a semantic search system
Here's a minimal but production-realistic semantic search implementation — the same core logic used in LumiChats Study Mode:
Minimal semantic search with embeddings + cosine similarity
```python
import numpy as np
from openai import OpenAI

client = OpenAI()

def embed_batch(texts: list[str]) -> np.ndarray:
    """Embed multiple texts in one API call (efficient)."""
    response = client.embeddings.create(
        model="text-embedding-3-small",
        input=texts,
    )
    return np.array([r.embedding for r in response.data])

# --- 1. Index your knowledge base ---
documents = [
    "Photosynthesis converts sunlight into glucose in plant cells.",
    "The mitochondria is the powerhouse of the cell.",
    "DNA replication occurs during the S phase of the cell cycle.",
    "The Eiffel Tower was built in 1889 in Paris, France.",
]
doc_embeddings = embed_batch(documents)  # shape: (4, 1536)

# Normalize for fast cosine similarity via dot product
norms = np.linalg.norm(doc_embeddings, axis=1, keepdims=True)
doc_embeddings_norm = doc_embeddings / norms

# --- 2. Query ---
def search(query: str, top_k: int = 3):
    query_emb = embed_batch([query])[0]                # (1536,)
    query_emb = query_emb / np.linalg.norm(query_emb)  # normalize
    # Cosine similarity = dot product when both are normalized
    scores = doc_embeddings_norm @ query_emb           # (4,)
    top_indices = np.argsort(scores)[::-1][:top_k]
    return [(documents[i], float(scores[i])) for i in top_indices]

results = search("how do plants make food?")
for doc, score in results:
    print(f"Score {score:.3f}: {doc}")

# Score 0.847: Photosynthesis converts sunlight into glucose in plant cells. ✓
# Score 0.432: The mitochondria is the powerhouse of the cell.
# Score 0.381: DNA replication occurs during the S phase of the cell cycle.
```

**Production tip:** In production, store embeddings in a vector database (pgvector, Pinecone, Qdrant) instead of NumPy arrays; they handle millions of vectors with millisecond search using HNSW indexing.
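Before reaching for a vector database, an in-memory index can go surprisingly far. Here is a sketch (random unit vectors stand in for real embeddings, and the corpus size and dimension are arbitrary) of a top-k search that avoids a full sort by using `np.argpartition`:

```python
import numpy as np

rng = np.random.default_rng(42)

# Stand-in corpus: 5,000 random unit vectors of dimension 512
corpus = rng.standard_normal((5_000, 512))
corpus /= np.linalg.norm(corpus, axis=1, keepdims=True)

def top_k(query: np.ndarray, k: int = 5) -> np.ndarray:
    """Indices of the k most similar corpus vectors, best first."""
    query = query / np.linalg.norm(query)
    scores = corpus @ query
    # argpartition selects the k largest in O(n); we then sort only those k
    idx = np.argpartition(scores, -k)[-k:]
    return idx[np.argsort(scores[idx])[::-1]]

query = corpus[123] + 0.01 * rng.standard_normal(512)  # near-duplicate of doc 123
print(top_k(query, k=5)[0])  # 123 -- the nearest neighbour is the original
```

This brute-force scan is exact and fast enough for tens of thousands of vectors; HNSW-style approximate indexes become worthwhile once the corpus reaches millions.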
## Embedding model comparison (2025)
| Model | Dimensions | MTEB Score | Best for |
|---|---|---|---|
| text-embedding-3-large (OpenAI) | 3,072 | 64.6 | General purpose, highest quality |
| text-embedding-3-small (OpenAI) | 1,536 | 62.3 | Cost-efficient, fast |
| Cohere Embed v3 | 1,024 | 64.5 | Multilingual, strong retrieval |
| voyage-3 (Voyage AI) | 1,024 | 67.1 | Code, technical retrieval |
| BGE-M3 (open-source) | 1,024 | 63.5 | Self-hosted, multilingual |
| mxbai-embed-large (open-source) | 1,024 | 64.7 | Self-hosted, cost-free |
**Model mismatch:** You must embed queries and documents with the same model. Mixing models produces meaningless comparisons; the vector spaces are completely incompatible even when the dimensions happen to match.
## Use cases beyond RAG
- Recommendation systems — embed user history and items; find items closest to the user's "taste vector"
- Duplicate detection — find near-identical documents in a corpus (cosine similarity > 0.97)
- Classification — train a lightweight classifier (logistic regression, SVM) on top of frozen embeddings — often beats fine-tuning for small datasets
- Clustering — K-Means or HDBSCAN over embeddings to discover semantic groups without labels
- Cross-lingual search — multilingual models embed English and Hindi into the same space; search Hindi docs with an English query
- Anomaly detection — inputs far from the distribution of "normal" embeddings may indicate unusual or adversarial inputs
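The duplicate-detection idea above can be sketched in plain NumPy. Random vectors stand in for real embeddings, and the 0.97 threshold is the one suggested in the list:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in embeddings: doc 3 is a near-duplicate of doc 0, the rest unrelated
base = rng.standard_normal((5, 256))
base[3] = base[0] + 0.01 * rng.standard_normal(256)

emb = base / np.linalg.norm(base, axis=1, keepdims=True)

# Pairwise cosine similarity matrix (unit vectors -> just a matmul)
sim = emb @ emb.T

# Report pairs above the duplicate threshold (upper triangle, i < j)
threshold = 0.97
dupes = [(i, j)
         for i in range(len(emb))
         for j in range(i + 1, len(emb))
         if sim[i, j] > threshold]
print(dupes)  # [(0, 3)]
```

The same pairwise-similarity matrix also feeds clustering: it is exactly what K-Means or HDBSCAN operate on when grouping documents without labels.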
## Practice questions
- What is the dot product between two unit vectors and why does cosine similarity use it? (Answer: Cosine similarity = (A·B)/(||A||·||B||). For unit vectors (||A||=||B||=1): cosine_similarity = A·B directly. Range: -1 (opposite directions) to +1 (same direction), 0 (orthogonal/unrelated). Used for embeddings because length is normalised — similarity reflects angular relationship (semantic closeness) not vector magnitude. In practice: normalise embeddings before cosine comparison; this also makes nearest neighbour search faster (dot product only).)
- What is the difference between sentence embeddings and word embeddings? (Answer: Word embeddings (Word2Vec, GloVe): one vector per word in the vocabulary, context-independent. 'bank' has the same vector whether 'river bank' or 'bank account.' Sentence embeddings (SBERT, OpenAI text-embedding): one vector per sentence/passage, context-aware. The vector for 'I went to the bank to deposit money' reflects the financial sense. Sentence embeddings are generated by averaging token embeddings or using [CLS] token from a transformer — capturing the full contextual meaning of the input.)
- What is semantic search vs keyword search and when is each appropriate? (Answer: Keyword search (BM25/TF-IDF): finds documents containing the exact query terms. Fast, interpretable, handles technical terms and product IDs exactly. Fails when query uses different vocabulary than documents. Semantic search (embedding similarity): finds semantically similar documents even with different vocabulary — 'affordable car' matches documents about 'budget vehicle' and 'cheap automobile.' Slower (requires embedding + ANN lookup). Use keyword for: exact product IDs, medical codes, legal citations. Use semantic for: user intent queries, cross-lingual retrieval, FAQ matching.)
- What is the 'curse of dimensionality' problem for high-dimensional embeddings? (Answer: In high dimensions (768, 1536, 3072), the volume grows exponentially — almost all points become roughly equidistant from each other. Nearest neighbour search loses discriminative power: the difference between the closest and farthest neighbour becomes proportionally small. Practical effect: at very high dimensions, cosine similarities cluster around 0 for all pairs. Mitigations: dimensionality reduction (PCA to 256 dims), Matryoshka embeddings (encode most information in first 256 dims), and ANN algorithms (HNSW) that navigate the manifold structure rather than brute-force comparing all distances.)
- What is fine-tuning embedding models and when is it necessary? (Answer: Pretrained embeddings (text-embedding-3-large, BAAI/bge): trained on general web text. May not capture domain-specific similarity. Fine-tuning: train on (query, positive_document, negative_document) triplets from your domain using contrastive loss. When necessary: (1) Highly specialised vocabulary (legal, medical, chemical). (2) Custom similarity notion (you want 'similar' to mean something specific). (3) When out-of-box retrieval quality is below 70% accuracy. Fine-tuned embedding models often improve RAG retrieval by 10–20 percentage points on domain-specific tasks.)
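The concentration effect described in the curse-of-dimensionality answer is easy to see empirically. With random vectors (a crude stand-in for unrelated texts), pairwise cosine similarities bunch ever more tightly around 0 as the dimension grows:

```python
import numpy as np

rng = np.random.default_rng(7)

def pairwise_cosine_spread(dim: int, n: int = 200) -> float:
    """Standard deviation of pairwise cosine similarities of n random vectors."""
    v = rng.standard_normal((n, dim))
    v /= np.linalg.norm(v, axis=1, keepdims=True)
    sim = v @ v.T
    # Keep only distinct pairs (upper triangle, excluding the diagonal)
    return float(sim[np.triu_indices(n, k=1)].std())

for dim in (2, 64, 1536):
    print(f"dim={dim:>5}: spread ~ {pairwise_cosine_spread(dim):.3f}")
# The spread shrinks roughly like 1/sqrt(dim): random pairs in 1536
# dimensions are all nearly orthogonal (cosine ~ 0)
```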
## On LumiChats
LumiChats uses text-embedding-3-large (OpenAI's best embedding model, 3072 dimensions) for Study Mode and Memory. Document chunks and memories are stored as embeddings in pgvector and retrieved using cosine similarity search.