Embeddings and Vector Search

Embeddings map text (or images) into dense vectors so that semantically similar items sit close together in space. They are the backbone of search and retrieval-augmented generation (RAG).

What an embedding is

A model maps input to a fixed-size float vector — for example, 768 or 1536 dimensions. The more similar two inputs are in meaning, the smaller the distance between their vectors.

Common distance metrics:

  • Cosine similarity — direction similarity; popular for text.
  • Dot product — equals cosine similarity when vectors are L2-normalized, and is cheaper to compute.

You rarely compare raw strings character by character in RAG; you compare embeddings.
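The relationship between the two metrics can be sketched in plain NumPy. The vectors below are toy values standing in for real model outputs:

```python
import numpy as np

def cosine_similarity(a, b):
    # Direction-only similarity: divides out vector lengths.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 4-dimensional "embeddings"; real models emit e.g. 768 or 1536 dims.
a = np.array([0.2, 0.7, 0.1, 0.4])
b = np.array([0.3, 0.6, 0.0, 0.5])

# After L2-normalizing, the dot product and cosine similarity coincide.
a_hat = a / np.linalg.norm(a)
b_hat = b / np.linalg.norm(b)
assert abs(cosine_similarity(a, b) - float(np.dot(a_hat, b_hat))) < 1e-12
```

This is why many vector databases store normalized vectors and use the dot product internally.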

Where embeddings are used

  • Semantic search — "Find docs like this question" across a knowledge base.
  • Clustering and deduplication — group support tickets or near-duplicate articles.
  • RAG — retrieve the most relevant chunks, then pass them to the LLM as context.
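The retrieval step in RAG can be sketched as a brute-force cosine search over pre-computed chunk embeddings. The vectors here are random stand-ins; in practice they would come from an embedding model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Pretend these are embeddings of document chunks from an embedding model.
chunks = ["refund policy", "shipping times", "password reset", "api rate limits"]
doc_vecs = rng.normal(size=(len(chunks), 768))
doc_vecs /= np.linalg.norm(doc_vecs, axis=1, keepdims=True)  # normalize once

def top_k(query_vec, k=2):
    query_vec = query_vec / np.linalg.norm(query_vec)
    scores = doc_vecs @ query_vec          # cosine similarity via dot product
    best = np.argsort(scores)[::-1][:k]    # highest similarity first
    return [(chunks[i], float(scores[i])) for i in best]

# The retrieved chunks would then be passed to the LLM as prompt context.
query = rng.normal(size=768)
results = top_k(query)
```

At small corpus sizes this exhaustive scan is perfectly adequate; the ANN indexes discussed below only matter at scale.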

Choosing a model

Trade-offs include:

  • Domain fit — code vs legal vs support chat.
  • Dimensionality — higher can mean better quality but more storage and compute.
  • Multilingual needs — pick models trained for your languages.
  • Latency and cost — hosted APIs vs self-hosted open weights.

Start with one strong general-purpose model; only split into specialized models when evaluation metrics show a clear win.
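One common metric for that comparison is recall@k over a labeled set of (query, relevant-doc) pairs. The helper below is a minimal sketch using synthetic vectors; with a real model, the doc and query matrices would be its embeddings:

```python
import numpy as np

def recall_at_k(query_vecs, doc_vecs, gold, k=5):
    """Fraction of queries whose gold document appears in the top-k results.

    query_vecs: (n_queries, dim) and doc_vecs: (n_docs, dim), L2-normalized.
    gold[i] is the index of the relevant document for query i.
    """
    scores = query_vecs @ doc_vecs.T                 # cosine similarities
    topk = np.argsort(scores, axis=1)[:, ::-1][:, :k]
    hits = sum(gold[i] in topk[i] for i in range(len(gold)))
    return hits / len(gold)

# Sanity check: queries that are near-copies of their gold docs score 1.0.
rng = np.random.default_rng(1)
docs = rng.normal(size=(20, 64))
docs /= np.linalg.norm(docs, axis=1, keepdims=True)
queries = docs[:5] + 0.01 * rng.normal(size=(5, 64))
queries /= np.linalg.norm(queries, axis=1, keepdims=True)
score = recall_at_k(queries, docs, gold=list(range(5)), k=1)
```

Running the same harness with two candidate models on your own queries gives the "clear win" signal before you commit to a specialized model.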

Storage and ANN indexes

At scale you do not brute-force compare every vector. Use approximate nearest neighbor indexes (HNSW, IVF, product quantization — depends on your vector DB).

Monitor recall as the corpus grows: an approximate index tuned for a small dataset can quietly degrade, so revisit its parameters when recall drops.
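To make the recall trade-off concrete, here is a toy IVF-style index built in NumPy: vectors are bucketed by nearest coarse centroid, and queries probe only the `nprobe` closest buckets. This is a sketch of the idea, not a substitute for a real vector database:

```python
import numpy as np

rng = np.random.default_rng(2)
dim, n_docs = 32, 2000
docs = rng.normal(size=(n_docs, dim)).astype(np.float32)
docs /= np.linalg.norm(docs, axis=1, keepdims=True)

# Coarse quantizer: assign each vector to its nearest of n_lists centroids
# (random docs used as centroids here; real IVF trains them with k-means).
n_lists = 16
centroids = docs[rng.choice(n_docs, n_lists, replace=False)]
assignments = np.argmax(docs @ centroids.T, axis=1)

def ivf_search(q, k=10, nprobe=4):
    # Scan only the nprobe closest buckets instead of the whole corpus.
    probe = np.argsort(q @ centroids.T)[::-1][:nprobe]
    cand = np.flatnonzero(np.isin(assignments, probe))
    scores = docs[cand] @ q
    return cand[np.argsort(scores)[::-1][:k]]

def exact_search(q, k=10):
    return np.argsort(docs @ q)[::-1][:k]

q = rng.normal(size=dim).astype(np.float32)
q /= np.linalg.norm(q)
recall = len(set(ivf_search(q)) & set(exact_search(q))) / 10
```

Raising `nprobe` trades latency for recall; production indexes expose similar knobs (`nprobe` in IVF, `ef_search` in HNSW), and these are the parameters to revisit when monitored recall drops.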

Key takeaways

  • Embeddings enable similarity search that keyword search misses.
  • Pick embedding models with your domain and languages in mind.
  • Plan for approximate search and monitoring as corpus size grows.