Embeddings Explained: How LLMs Turn Words Into Meaning

What are embeddings in AI? A plain-English guide to how large language models represent words and meaning as vectors — why 'king − man + woman ≈ queen' works, what vector similarity is, and why embeddings power search and RAG.

In the last chapter we turned text into tokens, then into ID numbers like 4923. But an ID is just a label — 4923 is no more "meaningful" than 4924. So how does a model get from arbitrary IDs to something that captures the fact that king and queen are related, or that Paris and France belong together?

The answer is embeddings, and it's one of the most elegant ideas in modern AI.

Turning meaning into numbers

An embedding is a list of numbers (a vector) assigned to each token. The ID doesn't disappear; it becomes an index. Where the previous step gave us dog → 4923, the model uses 4923 to look up a learned vector, dog → [0.21, -0.83, 0.04, ... ], often with hundreds or thousands of numbers in the list.
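In code, an embedding layer is essentially a table with one row per token ID, and embedding a token means using its ID to pick a row. The sketch below is a minimal illustration under stated assumptions: the four-word vocabulary, the 8-number vector size, and the random values are all hypothetical, standing in for a real vocabulary and the numbers training would learn.

```python
import random

# Hypothetical toy vocabulary: token string -> token ID.
vocab = {"dog": 0, "puppy": 1, "cat": 2, "invoice": 3}
embedding_dim = 8  # real models use hundreds or thousands of numbers

# One row of numbers per token ID. Random values stand in for the
# numbers that training would actually learn.
rng = random.Random(0)
embedding_table = [
    [rng.uniform(-1.0, 1.0) for _ in range(embedding_dim)]
    for _ in range(len(vocab))
]


def embed(token: str) -> list[float]:
    """Look up a token's vector: its ID simply selects a row of the table."""
    token_id = vocab[token]
    return embedding_table[token_id]


dog_vector = embed("dog")
print(len(dog_vector))  # 8
```

Nothing about the number 0 says "dog"; the ID only picks a row. Whatever the model knows about the word lives in that row's numbers.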

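The magic is how those numbers are chosen. They start out random. During training, the model keeps adjusting them to get better at its task (for an LLM, predicting text), and as a result tokens that are used in similar ways, and so carry similar meanings, end up with similar vectors. Meaning stops being a label and becomes a position in space.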

Meaning becomes distance

Picture a vast space where every word is a point. Words that mean similar things cluster together; unrelated words sit far apart.

Figure: related words cluster; unrelated words are far away in embedding space (dog, puppy and cat in one cluster; spreadsheet and invoice in another).

In this space, dog sits near puppy and cat, while spreadsheet and invoice form their own cluster far away. How close two vectors are (measured as distance, or very often as cosine similarity, which compares their directions) becomes a measure of how related two ideas are. This is what people mean by semantic similarity: similarity of meaning, not of spelling.
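Here is a small sketch of "meaning becomes distance", using hand-picked 3-number vectors. The values are hypothetical, chosen only for illustration; real embeddings come from a trained model and are far longer. It measures relatedness two ways: ordinary straight-line (Euclidean) distance, and cosine similarity, which looks only at the direction of two vectors and is a common choice for embeddings in practice.

```python
import math

# Hand-picked 3-number toy vectors (hypothetical values for illustration).
vectors = {
    "dog":         [0.90, 0.80, 0.10],
    "puppy":       [0.85, 0.90, 0.15],
    "cat":         [0.80, 0.60, 0.20],
    "spreadsheet": [0.10, 0.20, 0.90],
    "invoice":     [0.15, 0.10, 0.95],
}


def euclidean_distance(a: list[float], b: list[float]) -> float:
    """Straight-line distance: smaller means more related."""
    return math.dist(a, b)


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between the vectors: closer to 1 means more related."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


for word in ["puppy", "cat", "spreadsheet", "invoice"]:
    d = euclidean_distance(vectors["dog"], vectors[word])
    s = cosine_similarity(vectors["dog"], vectors[word])
    print(f"dog vs {word:<12} distance={d:.2f}  cosine={s:.2f}")
```

With these toy values both measures agree: puppy and cat come out close to dog, while spreadsheet and invoice are far from it (and close to each other).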

This is genuinely different from old keyword matching. "How do I fix my car?" and "automobile repair guide" share almost no words but sit close in embedding space, because they mean nearly the same thing.

The trick that proved it works

The famous demonstration that embeddings capture real structure is vector arithmetic:

king − man + woman ≈ queen

Take the vector for king, subtract the vector for man, and add the vector for woman. The difference woman − man behaves like a "male to female" direction, so applying it to king lands you near queen. Relationships like gender, plurality, or capital-of are encoded as roughly consistent directions in the space.

Figure: king − man + woman ≈ queen. Relationships are directions: the same move maps man→woman and king→queen.

That this works at all is strong evidence that the model isn't just storing words as labels; it's storing a structured map of meaning.
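The arithmetic can be sketched with tiny hand-built vectors. This is an illustration under loud assumptions: the 2-number vectors [royalty, gender] are invented so that gender is literally a single axis, whereas a trained model learns such directions implicitly, spread across many dimensions. Following a common convention for analogy tests, the search for the nearest word skips the three input words.

```python
import math

# Hypothetical 2-number toy embeddings: [royalty, gender], with gender
# +1 for male and -1 for female. Invented so the gender direction is a
# single axis; trained models learn such directions implicitly.
vectors = {
    "man":      [0.10, 1.00],
    "woman":    [0.10, -1.00],
    "king":     [0.90, 1.00],
    "queen":    [0.88, -0.95],
    "princess": [0.60, -0.90],
    "apple":    [-0.50, 0.00],
}


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def analogy(start: str, minus: str, plus: str) -> str:
    """Return the word whose vector is most similar to start - minus + plus.

    The three input words are skipped, as is common in analogy tests.
    """
    target = [s - m + p for s, m, p in zip(vectors[start], vectors[minus], vectors[plus])]
    candidates = [w for w in vectors if w not in (start, minus, plus)]
    return max(candidates, key=lambda w: cosine_similarity(target, vectors[w]))


print(analogy("king", "man", "woman"))   # queen
print(analogy("queen", "woman", "man"))  # king
```

Skipping the inputs matters with real embeddings, where the result vector can sit closer to king itself than to queen.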

Where embeddings sit in an LLM

Embeddings are the model's first real step of understanding. Here's the pipeline so far:

Step	Input	Output
Tokenize	Raw text	Tokens
Map to IDs	Tokens	Token IDs
Embed	Token IDs	Meaning vectors
Transformer layers	Meaning vectors	Context-aware meaning
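The first three rows of the table can be chained together in a few lines. This is a toy sketch, not a real tokenizer: whitespace splitting stands in for subword tokenization, and the vocabulary, vector size, and random values are hypothetical placeholders for learned ones.

```python
import random

# Toy pipeline with hypothetical pieces: whitespace splitting stands in for
# a real subword tokenizer, and random numbers stand in for learned embeddings.
vocab = {"the": 0, "dog": 1, "chased": 2, "cat": 3}
EMBEDDING_DIM = 4

rng = random.Random(42)
embedding_table = [[rng.uniform(-1.0, 1.0) for _ in range(EMBEDDING_DIM)] for _ in vocab]


def tokenize(text: str) -> list[str]:
    """Step 1, raw text -> tokens (simplified: lowercase and split on spaces)."""
    return text.lower().split()


def to_ids(tokens: list[str]) -> list[int]:
    """Step 2, tokens -> token IDs."""
    return [vocab[t] for t in tokens]


def embed(ids: list[int]) -> list[list[float]]:
    """Step 3, token IDs -> one vector per token (a row lookup per ID)."""
    return [embedding_table[i] for i in ids]


tokens = tokenize("The dog chased the cat")  # ['the', 'dog', 'chased', 'the', 'cat']
ids = to_ids(tokens)                          # [0, 1, 2, 0, 3]
vectors = embed(ids)                          # 5 vectors of 4 numbers each
# Step 4 (not shown): Transformer layers would turn `vectors` into
# context-aware vectors.
```

Both occurrences of "the" receive exactly the same vector at this stage; nothing yet depends on context.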

Without embeddings, the Transformer would just be shuffling arbitrary IDs. Embeddings give it something with structure to reason over — which is exactly what the attention mechanism and the rest of the Transformer need.

Context changes meaning

There's an important subtlety. A token's starting embedding is the same every time — but words are slippery. Bank by a river and bank that holds your money are spelled identically.

So the initial embedding is best thought of as a draft meaning. As the text flows through the model, attention adjusts each token's vector based on its neighbours, so bank near river drifts toward one meaning and bank near loan drifts toward another. The static embedding is the starting point; context sculpts it into the final, situation-specific meaning.
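The idea of neighbours reshaping a vector can be sketched with a heavily simplified form of attention. Assumptions, stated plainly: the 2-number vectors [nature, finance] are invented, and real attention also uses learned query, key and value projections, multiple heads, residual connections and many stacked layers, all omitted here. What remains is the core move: each token's new vector is a weighted average of the vectors around it.

```python
import math

# Hypothetical 2-number starting embeddings: [nature, finance].
# "bank" starts exactly halfway between its two senses.
EMBEDDINGS = {
    "river": [1.0, 0.0],
    "loan":  [0.0, 1.0],
    "bank":  [0.5, 0.5],
}


def softmax(scores: list[float]) -> list[float]:
    """Turn raw scores into positive weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def simple_self_attention(tokens: list[str]) -> list[list[float]]:
    """Heavily simplified attention: one head, no learned projections.

    Each token scores every token (itself included) by a scaled dot product,
    and its new vector is the weighted average of all the vectors.
    """
    xs = [EMBEDDINGS[t] for t in tokens]
    d = len(xs[0])
    out = []
    for q in xs:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in xs]
        weights = softmax(scores)
        out.append([sum(w * x[i] for w, x in zip(weights, xs)) for i in range(d)])
    return out


bank_by_river = simple_self_attention(["river", "bank"])[1]
bank_with_loan = simple_self_attention(["bank", "loan"])[0]
print(bank_by_river)   # [0.75, 0.25]: pulled toward "nature"
print(bank_with_loan)  # [0.25, 0.75]: pulled toward "finance"
```

Bank starts from the same vector in both sentences and ends up in different places, pulled toward whichever neighbour it appears with.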

Why you'll meet embeddings everywhere

Embeddings aren't just an internal detail — they're a tool you'll use directly:

Semantic search ranks documents by embedding similarity to your query, matching meaning over keywords.
Retrieval-Augmented Generation (RAG) embeds your documents, stores the vectors in a database, and retrieves the closest ones to feed the model. We touch on this in the prompt engineering guide's chapter on RAG prompts.
Recommendations and clustering group items by how close their embeddings are.
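The retrieval step behind semantic search and RAG can be sketched as "score every document by cosine similarity to the query, keep the top k". The document and query vectors below are hypothetical placeholders; in a real system an embedding model would produce them, and a vector database would store and search them, usually with approximate nearest-neighbour indexes rather than the brute-force sort used here.

```python
import math

# Hypothetical pre-computed embeddings (3 numbers each, hand-picked for
# illustration). A real embedding model would produce much longer vectors.
documents = {
    "Automobile repair guide":    [0.90, 0.10, 0.05],
    "Changing a flat tyre":       [0.80, 0.20, 0.10],
    "Quarterly invoice template": [0.05, 0.90, 0.20],
    "Banana bread recipe":        [0.05, 0.10, 0.95],
}

# Hypothetical embedding of the query "How do I fix my car?"
query_vector = [0.85, 0.15, 0.05]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def top_k(query: list[float], docs: dict[str, list[float]], k: int = 2) -> list[str]:
    """Brute-force retrieval: score every document, return the k best titles."""
    ranked = sorted(
        docs,
        key=lambda title: cosine_similarity(query, docs[title]),
        reverse=True,
    )
    return ranked[:k]


for title in top_k(query_vector, documents):
    print(title)
```

Neither top result shares a word with "How do I fix my car?"; they rank first only because their (hypothetical) vectors point in nearly the same direction as the query's. In RAG, the retrieved texts would then be placed into the model's prompt.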

So when you hear "vector database," "semantic search," or "embedding model," they're all built on the idea in this chapter: meaning as a point in space.

Recap

An embedding is a vector — a list of numbers — that represents a token's meaning.
Training arranges them so similar meanings get similar vectors; meaning becomes distance.
Vector arithmetic like king − man + woman ≈ queen shows relationships are encoded as directions.
Embeddings are the model's first step of understanding, turning arbitrary IDs into structured meaning.
The starting embedding is a draft; context (via attention) reshapes it for each situation.
Search and RAG run on embedding similarity, matching meaning rather than keywords.

We now have meaning vectors flowing into the model. Next we meet the architecture that processes them — the breakthrough behind every modern LLM. Continue to Transformers: the architecture behind every LLM.