Embeddings map text (or images) into dense vectors so that semantically similar items sit close together in space. They are the backbone of search and retrieval-augmented generation (RAG).
What an embedding is
A model maps input to a fixed-size float vector — for example, 768 or 1536 dimensions. Similar meanings → smaller distance between vectors.
Common similarity metrics:
- Cosine similarity — direction similarity; popular for text.
- Dot product — equal to cosine similarity when vectors are L2-normalized; cheaper to compute.
In RAG you rarely compare raw strings; you compare their embeddings.
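A quick sketch of both metrics in plain Python, using toy vectors rather than real embeddings, showing that on L2-normalized vectors the dot product and cosine similarity coincide:

```python
import math

def cosine_similarity(a, b):
    # dot(a, b) / (|a| * |b|)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def normalize(v):
    # Scale v to unit length (L2-normalize)
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

a, b = [1.0, 2.0, 3.0], [2.0, 4.0, 6.5]
cos = cosine_similarity(a, b)
dot_normed = sum(x * y for x, y in zip(normalize(a), normalize(b)))
# After normalization, the plain dot product equals cosine similarity
assert abs(cos - dot_normed) < 1e-12
```

This is why many vector databases store normalized vectors and index on dot product: it yields cosine ranking at lower query cost.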
Where embeddings are used
- Semantic search — "Find docs like this question" across a knowledge base.
- Clustering and deduplication — group support tickets or near-duplicate articles.
- RAG — retrieve the most relevant chunks, then pass them to the LLM as context.
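The retrieval step in all three use cases is the same loop. A minimal brute-force retriever is sketched below; the embedding step is replaced by hand-written 3-d vectors and made-up doc ids, purely for illustration:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def top_k(query_vec, corpus, k=2):
    # corpus: list of (doc_id, vector) pairs; score every doc, keep the best k
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in corpus]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Toy 3-d vectors standing in for real embeddings (hypothetical values)
corpus = [
    ("reset-password", [0.9, 0.1, 0.0]),
    ("billing-faq",    [0.1, 0.9, 0.2]),
    ("api-quickstart", [0.0, 0.2, 0.9]),
]
query = [0.8, 0.2, 0.1]  # pretend embedding of a password question
hits = top_k(query, corpus, k=2)
# The retrieved chunks would then be passed to the LLM as context.
```

In a real pipeline the vectors come from an embedding model and `top_k` is handled by a vector index, but the ranking logic is exactly this.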
Choosing a model
Trade-offs include:
- Domain fit — code vs legal vs support chat.
- Dimensionality — higher can mean better quality but more storage and compute.
- Multilingual needs — pick models trained for your languages.
- Latency and cost — hosted APIs vs self-hosted open weights.
Start with one strong general model; split only when metrics show a clear win.
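A "clear win" needs a number attached to it. One common metric is recall@k over a small labeled eval set; the sketch below uses hypothetical doc ids:

```python
def recall_at_k(retrieved, relevant, k):
    # Fraction of the relevant doc ids that appear in the top-k retrieved list
    hits = sum(1 for doc_id in retrieved[:k] if doc_id in relevant)
    return hits / len(relevant)

# Hypothetical single query: ids one model retrieved, ids a human judged relevant
retrieved = ["d3", "d7", "d1", "d9"]
relevant = {"d3", "d1"}
score = recall_at_k(retrieved, relevant, k=3)  # both relevant ids in the top 3
```

Run this per query for each candidate model, average the scores, and switch only when the difference is large enough to justify the added cost.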
Storage and ANN indexes
At scale you do not brute-force compare every vector. Use approximate nearest neighbor indexes (HNSW, IVF, product quantization — depends on your vector DB).
Re-tune index parameters as the corpus grows: ANN recall can degrade silently, so periodically measure it against a brute-force baseline on a sample of queries.
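An IVF-style index can be sketched in a few lines of plain Python, assuming hand-picked centroids instead of a trained coarse quantizer: vectors are filed into inverted lists by nearest centroid, and a query probes only the closest list(s) instead of scanning everything.

```python
import math

def l2(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Two hand-picked "centroids" standing in for a trained coarse quantizer
centroids = [[1.0, 0.0], [0.0, 1.0]]

def nearest_centroid(v):
    return min(range(len(centroids)), key=lambda i: l2(v, centroids[i]))

# Build inverted lists: each vector is filed under its nearest centroid
vectors = [[0.9, 0.1], [0.8, 0.3], [0.2, 0.9], [0.1, 0.8]]
lists = {i: [] for i in range(len(centroids))}
for idx, v in enumerate(vectors):
    lists[nearest_centroid(v)].append(idx)

def ivf_search(q, nprobe=1):
    # Probe only the nprobe closest lists, then scan just those candidates
    order = sorted(range(len(centroids)), key=lambda i: l2(q, centroids[i]))
    candidates = [idx for i in order[:nprobe] for idx in lists[i]]
    return min(candidates, key=lambda idx: l2(q, vectors[idx]))
```

`nprobe` is the recall/speed knob: probing more lists recovers vectors that fell into a neighboring partition, at the cost of scanning more candidates. Production systems (Faiss IVF, HNSW's `ef`) expose the same kind of trade-off.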
Key takeaways
- Embeddings enable similarity search that keyword search misses.
- Pick embedding models with your domain and languages in mind.
- Plan for approximate search and monitoring as corpus size grows.