Vector Databases: Storing and Searching Embeddings

What is a vector database? A plain-English guide to how vector DBs store embeddings and find nearest neighbors fast with approximate nearest neighbor (ANN) search, why brute force doesn't scale, and how metadata filtering works.

We now have a pile of meaningful chunks, each turned into a vector. For a small demo you could keep them in a list and compare one by one. But real systems have millions of chunks, and a question needs an answer in milliseconds. Vector databases exist to make that search fast — they're the storage and search engine at the heart of RAG.

The one question a vector DB answers

A vector database is specialized around a single operation: given a query vector, find the stored vectors closest to it. That's the nearest-neighbor search from the embeddings chapter, done at scale. Everything a vector DB offers is in service of answering that question quickly and at volume.

Figure: Query vector → Search the index → Top-k nearest vectors → Return their chunks. A vector DB's core job: nearest-neighbor search over stored embeddings.

Why brute force doesn't scale

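The obvious approach is to compare the query against every stored vector and keep the closest. This is exact, but the cost is linear in the number of stored vectors: twice the data, twice the work per query. That is fine for thousands of vectors and too slow once every query has to scan many millions.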
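As a rough illustration, brute-force search can be sketched in a few lines of plain Python. This is a minimal sketch, assuming cosine similarity as the measure of closeness; the function names and the tiny 3-dimensional vectors are made up for the example.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def brute_force_search(query, vectors, k=3):
    """Score every stored vector against the query and keep the k best.

    Each stored vector is scored exactly once, so the work grows
    linearly with len(vectors).
    """
    scored = [(cosine_similarity(query, v), i) for i, v in enumerate(vectors)]
    scored.sort(reverse=True)  # highest similarity first
    return [(i, score) for score, i in scored[:k]]

# Toy data: hypothetical 3-dimensional "embeddings".
stored = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
results = brute_force_search([1.0, 0.05, 0.0], stored, k=2)
print([i for i, _ in results])  # nearest ids: [0, 1]
```

Nothing here is wrong for four vectors. The trouble is only that the list comprehension visits every stored vector on every query.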

Figure: Brute-force search time grows with the number of vectors (10K vectors: fast; 1M vectors: slow; 100M vectors: unusable).

A user won't wait while you scan a hundred million vectors for every query. You need a structure that finds the near-neighbors without looking at everything — an index.

Approximate nearest neighbor (ANN)

The breakthrough that makes vector search practical is giving up a little accuracy for a lot of speed. Approximate nearest neighbor (ANN) algorithms organize vectors so a query can jump to the right neighborhood and check only a small fraction of the data.

You don't need the internals of specific index types to use a vector DB well. The mental model is enough: the index is a clever map that lets search skip most of the data while still landing on the right region. The dial you control is accuracy versus speed. Accuracy here is usually measured as recall, the share of the true nearest neighbors that the search actually returns: push for higher recall and queries slow down; relax it and they speed up.
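To make the "clever map" idea concrete, here is a toy sketch of one way an index can skip data: group vectors into buckets around a few centroids, then search only the buckets nearest the query. This illustrates the idea only and is not how any particular vector DB is implemented. The function names, the hand-picked centroids, and the nprobe parameter are assumptions for the example.

```python
import math

def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def build_index(vectors, centroids):
    """Group vector ids into buckets by their nearest centroid."""
    buckets = {c: [] for c in range(len(centroids))}
    for i, v in enumerate(vectors):
        nearest = min(range(len(centroids)), key=lambda c: dist(v, centroids[c]))
        buckets[nearest].append(i)
    return buckets

def ann_search(query, vectors, centroids, buckets, k=2, nprobe=1):
    """Search only the nprobe buckets whose centroids are closest to the query.

    Returns the k best candidate ids and how many vectors were checked.
    """
    order = sorted(range(len(centroids)), key=lambda c: dist(query, centroids[c]))
    candidates = [i for c in order[:nprobe] for i in buckets[c]]
    candidates.sort(key=lambda i: dist(query, vectors[i]))
    return candidates[:k], len(candidates)

# Hypothetical 2-D data. The centroids are hand-picked for this example.
centroids = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
vectors = [
    (1.0, 1.0), (0.0, 2.0), (4.9, 0.0),   # land in bucket 0
    (5.2, 0.0), (9.0, 1.0),               # land in bucket 1
    (1.0, 9.0), (0.0, 11.0),              # land in bucket 2
]
buckets = build_index(vectors, centroids)
query = (5.1, 0.0)

fast = ann_search(query, vectors, centroids, buckets, k=2, nprobe=1)
thorough = ann_search(query, vectors, centroids, buckets, k=2, nprobe=2)
print(fast)      # ([3, 4], 2)
print(thorough)  # ([3, 2], 5)
```

With nprobe=1 the search checks only 2 of the 7 vectors but misses vector 2, which sits just across a bucket boundary. With nprobe=2 it checks 5 and finds it. That is the accuracy-versus-speed dial in miniature.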

Metadata filtering: meaning plus rules

Pure similarity isn't always what you want. Often you need "the most relevant chunk that also belongs to this user," or "from the last 30 days." Vector databases handle this by storing metadata next to each vector — fields like source, author, date, or permissions.

Stored with each chunk	Used for
The embedding	Similarity search
Source / document ID	Citations, grounding
Date	"Only recent" filters
User / tenant ID	Restricting to allowed data

This lets you combine semantic search with hard constraints — find what's relevant by meaning, but only within the slice of data this query is allowed to see. That filtering is essential for both correctness and security: it's how one customer's question never retrieves another customer's documents.
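A minimal sketch of the idea, assuming each chunk is a plain dictionary with hypothetical tenant and date fields: apply the hard rules first, then rank what remains by similarity. Real vector databases differ in when and how they apply the filter, so treat this as the concept rather than an implementation.

```python
import math
from datetime import date

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Hypothetical chunks: an embedding plus metadata stored next to it.
chunks = [
    {"id": "a1", "tenant": "acme",   "date": date(2024, 5, 20), "embedding": [0.9, 0.1]},
    {"id": "a2", "tenant": "acme",   "date": date(2024, 1, 10), "embedding": [1.0, 0.0]},
    {"id": "b1", "tenant": "globex", "date": date(2024, 5, 25), "embedding": [1.0, 0.0]},
    {"id": "a3", "tenant": "acme",   "date": date(2024, 5, 28), "embedding": [0.1, 0.9]},
]

def filtered_search(query, chunks, tenant, since, k=2):
    """Keep only chunks this query may see, then rank them by similarity."""
    allowed = [c for c in chunks if c["tenant"] == tenant and c["date"] >= since]
    allowed.sort(key=lambda c: cosine_similarity(query, c["embedding"]), reverse=True)
    return [c["id"] for c in allowed[:k]]

hits = filtered_search([1.0, 0.0], chunks, tenant="acme", since=date(2024, 5, 1))
print(hits)  # ['a1', 'a3']
```

Chunks a2 and b1 match the query most closely by meaning, yet neither is returned: a2 is too old and b1 belongs to another tenant.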

Top-k: retrieve a few, not one

A subtle but important habit: you don't fetch a single nearest chunk, you fetch the top-k — the k closest matches (often a handful). Why several? Because the single best chunk might lack a detail that a neighbor supplies, and giving the model a few relevant passages produces better answers than betting everything on one.

Choosing k is a balance: too few and you risk missing the needed context; too many and you flood the context window with marginal text and raise cost. We'll return to how those top-k results are used — and improved — in the next two chapters.
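The trade-off can be seen with a small sketch. The chunk texts and similarity scores below are invented for illustration, standing in for what a search might return; the top_k helper is likewise a hypothetical name.

```python
import heapq

# Hypothetical (chunk text, similarity score) pairs from a search.
scored_chunks = [
    ("Refunds are issued within 14 days of a return.", 0.91),
    ("Returns require the original receipt.", 0.88),
    ("Refunds go back to the original payment method.", 0.86),
    ("Our stores are open 9am to 6pm.", 0.42),
    ("Gift cards never expire.", 0.35),
]

def top_k(scored, k):
    """Return the k highest-scoring (text, score) pairs, best first."""
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])

for k in (1, 3, 5):
    picked = top_k(scored_chunks, k)
    total_chars = sum(len(text) for text, _ in picked)
    weakest = min(score for _, score in picked)
    print(f"k={k}: {total_chars} characters of context, weakest score {weakest}")
```

In this made-up data, k=1 leaves out two closely related passages, while k=5 pulls in store hours and gift cards, which add length without adding relevance.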

You don't always need a separate product

"Vector database" sounds like a whole new piece of infrastructure to adopt, and dedicated ones exist. But the capability is spreading: many general-purpose databases now store and search vectors as just another column type. For a modest corpus, that may be all you need; dedicated vector DBs earn their keep at large scale or with demanding query patterns. Focus on the capability — fast nearest-neighbor search with metadata filtering — not on picking a brand.

Recap

A vector database is built around one operation: find the stored vectors closest to a query vector.
Brute-force comparison is exact but scales linearly and breaks down at millions of vectors.
Approximate nearest neighbor (ANN) search trades a little accuracy for huge speed via an index.
Metadata filtering combines semantic search with hard rules — recency, permissions, source.
Retrieve the top-k nearest chunks, and remember many regular databases can now do this too.

We can store and search millions of chunks fast. The next question, faced on every query, is how to actually pull the right context together. That's retrieval. Continue to Retrieval in RAG.