Embeddings for RAG: Searching Text by Meaning

How embeddings power RAG — turning text into vectors that capture meaning, so search finds relevant passages even with no shared keywords. A plain-English guide to semantic search and similarity.

In the last chapter we said RAG's hard part is finding the right text. But "find relevant text" hides a deep problem: relevance is about meaning, and computers compare strings, not ideas. Search "how do I get a refund" and the right document might say "returns and reimbursements" — not one shared keyword. Embeddings are how RAG bridges that gap.

Keyword search misses meaning

Classic search matches words. It's fast and exact, and it fails the moment the question and the answer use different vocabulary:

Query	Relevant doc says	Keyword match?
"car insurance"	"auto policy coverage"	No
"reset my password"	"recover account access"	No
"is it waterproof?"	"rated IP68 for water resistance"	No

Each pair means the same thing, yet shares almost no words. To retrieve well, RAG needs to search by meaning, and that requires turning meaning into something a computer can measure.
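A minimal sketch of why the pairs above fail. It models keyword search as plain word-set overlap, which is an assumption: production search engines layer stemming and ranking on top. The helper names `keywords` and `keyword_overlap` are illustrative and do not come from any library.

```python
import re

def keywords(text):
    """Lowercase a string and split it into a set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def keyword_overlap(query, doc):
    """Return the words a query and a document share (a naive keyword match)."""
    return keywords(query) & keywords(doc)

# The query/document pairs from the table above.
pairs = [
    ("car insurance", "auto policy coverage"),
    ("reset my password", "recover account access"),
    ("is it waterproof?", "rated IP68 for water resistance"),
]

for query, doc in pairs:
    shared = keyword_overlap(query, doc)
    print(f"{query!r} vs {doc!r}: shared words = {sorted(shared)}")
```

Every pair prints an empty list, so a search that relies only on shared words would never connect these questions to their answers.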

An embedding is meaning as coordinates

An embedding is a list of numbers, a vector, that an embedding model assigns to a piece of text. The model positions each vector in a high-dimensional space so that texts with similar meaning land near each other.

You met this idea already in the LLM embeddings chapter: the model turns tokens into vectors of meaning. RAG uses the same trick at the level of whole passages. "Refund" and "reimbursement" end up as nearby points; "refund" and "rainfall" end up far apart.

You can't picture a 1,000-dimensional map, and you don't need to. The only property that matters is the one that's true in any number of dimensions: near means similar, far means different.
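To make "near means similar" concrete, here is a toy sketch with three hand-picked 3-dimensional vectors. The numbers are invented for illustration and did not come from a real embedding model. The point is that the same distance formula works whether a vector holds 3 numbers or 1,000.

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hand-picked toy coordinates, NOT output from a real embedding model.
toy_vectors = {
    "refund":        [0.9, 0.1, 0.0],
    "reimbursement": [0.8, 0.2, 0.1],
    "rainfall":      [0.0, 0.1, 0.9],
}

near = euclidean_distance(toy_vectors["refund"], toy_vectors["reimbursement"])
far = euclidean_distance(toy_vectors["refund"], toy_vectors["rainfall"])
print(f"refund -> reimbursement: {near:.3f}")
print(f"refund -> rainfall:      {far:.3f}")
```

"Refund" sits much closer to "reimbursement" than to "rainfall". Nothing in `euclidean_distance` depends on the vector length, which is why the idea carries over to spaces you cannot picture.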

Similarity: searching by distance

Once text is points in space, search becomes geometry. To find what's relevant to a question, RAG:

Question → Embed into a vector → Compare to stored vectors → Return the closest

Figure: Semantic search: embed the query, find the nearest stored vectors.

The "compare" step scores how close each stored vector is to the query vector. The most common measure is cosine similarity, which scores how closely two vectors point in the same direction. A higher score means closer meaning.
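Cosine similarity divides the dot product of two vectors by the product of their lengths, so it depends only on direction, not on length. Below is a minimal pure-Python sketch. The function name `cosine_similarity` is our own, and a real system would normally use a vectorized numeric library rather than Python loops.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b: dot(a, b) / (|a| * |b|).

    Returns a value in [-1, 1]; 1 means the vectors point the same way.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

print(cosine_similarity([1.0, 2.0], [2.0, 4.0]))   # same direction, different length
print(cosine_similarity([1.0, 0.0], [0.0, 3.0]))   # perpendicular directions
print(cosine_similarity([1.0, 0.0], [-2.0, 0.0]))  # opposite directions
```

The first pair scores 1 even though one vector is twice as long as the other. If vectors are already scaled to length 1, the denominator is 1 and cosine similarity reduces to the plain dot product.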

Similarity of stored chunks to the query "how do I get a refund?":

Stored chunk	Similarity score
Returns & refunds	0.91
Shipping times	0.44
Office hours	0.22

The refund passage scores highest even though the question never used the word "returns." That's semantic search: relevance by meaning, not by string overlap.
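Putting the steps together, a brute-force semantic search scores every stored chunk against the query and keeps the top k. This sketch assumes the vectors already exist. The 3-dimensional values are hypothetical stand-ins for real embeddings, so the scores will not match the ones listed above; only the ordering is meant to carry over. `top_k` is an illustrative name, not a library function.

```python
import math

def cosine_similarity(a, b):
    """dot(a, b) / (|a| * |b|); higher means the vectors point more nearly the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def top_k(query_vec, index, k=3):
    """Score every stored chunk against the query and return the k most similar.

    `index` is a list of (chunk_text, vector) pairs; this is a brute-force scan
    that compares the query with every stored vector.
    """
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in index]
    scored.sort(reverse=True)  # highest similarity first
    return [(text, round(score, 2)) for score, text in scored[:k]]

# Hypothetical toy vectors standing in for real chunk embeddings.
index = [
    ("Returns & refunds", [0.9, 0.3, 0.1]),
    ("Shipping times",    [0.4, 0.8, 0.2]),
    ("Office hours",      [0.1, 0.3, 0.9]),
]
query = [0.7, 0.2, 0.3]  # pretend embedding of "how do I get a refund?"

for text, score in top_k(query, index, k=2):
    print(text, score)
```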

Same model on both sides

One rule trips up newcomers and is worth stating plainly: the query and the documents must be embedded by the same model. Embeddings only live in the same space — and are only comparable — if they were produced by the same embedder. Mix two models and the coordinates mean different things, and distances become noise.

In practice this means you pick one embedding model, use it to index every chunk, and use the same model to embed each incoming query.
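One way to enforce the one-model rule is to record the embedder's name alongside the index and check it on every insert and query. `VectorIndex` and the model names below are hypothetical; real vector stores may or may not track this for you.

```python
from dataclasses import dataclass, field

@dataclass
class VectorIndex:
    """Hypothetical in-memory index that remembers which model built it."""
    model_name: str
    dims: int
    entries: list = field(default_factory=list)

    def add(self, text, vector, model_name):
        """Store a chunk only if its vector came from the index's model."""
        self._check(vector, model_name)
        self.entries.append((text, vector))

    def check_query(self, vector, model_name):
        """Raise if a query vector came from a different embedder than the index."""
        self._check(vector, model_name)

    def _check(self, vector, model_name):
        if model_name != self.model_name:
            raise ValueError(
                f"vector from {model_name!r}, but index was built with {self.model_name!r}"
            )
        if len(vector) != self.dims:
            raise ValueError(f"expected {self.dims} dimensions, got {len(vector)}")

index = VectorIndex(model_name="example-embedder-v1", dims=3)
index.add("Returns & refunds", [0.9, 0.3, 0.1], model_name="example-embedder-v1")
index.check_query([0.8, 0.2, 0.1], model_name="example-embedder-v1")  # same model: accepted

try:
    index.check_query([0.8, 0.2, 0.1], model_name="other-embedder-v2")
except ValueError as err:
    print("rejected:", err)
```

The dimension check alone is not enough. Two different models can produce vectors of the same length whose coordinates still mean different things, which is why the sketch compares model names as well.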

More dimensions, more nuance — at a cost

Real embeddings aren't the two or three numbers you could sketch on paper; they typically have hundreds to thousands of dimensions per chunk. More dimensions give the space more room to separate fine shades of meaning, but they cost more to store and more time to search.

Dimensions	Trade-off
Smaller (e.g. ~384)	Cheaper, faster search, slightly coarser meaning
Larger (e.g. ~1,536+)	Richer nuance, more storage and slower search

For most systems the embedding model's default dimension is fine; the choice becomes a real lever only at large scale, where storage and search speed start to bite.
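The storage side of this trade-off is simple arithmetic: chunks × dimensions × bytes per number. This sketch assumes each number is stored as a 32-bit float (4 bytes). It counts the raw vectors only and ignores index overhead, metadata, and the chunk text itself. The function name is illustrative.

```python
def index_size_bytes(num_chunks, dims, bytes_per_value=4):
    """Raw storage for the vectors alone: chunks x dimensions x bytes per number.

    Assumes 32-bit floats (4 bytes each) by default; ignores index overhead,
    metadata, and the stored chunk text.
    """
    return num_chunks * dims * bytes_per_value

for dims in (384, 1536):
    gb = index_size_bytes(1_000_000, dims) / 1e9
    print(f"1M chunks x {dims} dims: {gb:.2f} GB")
```

For a million chunks that works out to about 1.5 GB at 384 dimensions and about 6.1 GB at 1,536: four times the dimensions, four times the raw storage.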

What embeddings can't do

A crucial limit to carry forward: embeddings measure relevance, not truth. Two passages can sit close in the space while one is outdated or simply wrong. Similarity finds text that's about your question; it can't tell you which text is correct. That's why retrieval is only the first half of RAG — ranking and the model's own generation still have to sort signal from noise.

Recap

Keyword search matches words, so it misses relevant text that uses different vocabulary.
An embedding model maps text to a vector, positioned so that similar meanings sit close together.
Search becomes distance: embed the query, find the nearest stored vectors (often via cosine similarity).
The query and documents must share one embedding model, or their coordinates aren't comparable.
Embeddings capture meaning, not truth — they find what's relevant, not what's right.

We can now compare a question to stored text. But what is that stored text — whole documents, or pieces? How you split your data, called chunking, quietly decides how well retrieval works. Continue to Chunking for RAG.