Joins & Query Planning: How SQL Actually Runs

A deep, plain-English look at joins and the query planner — what a join really computes, the join algorithms (nested loop, hash, merge), why SQL is declarative, and how to read an EXPLAIN plan to find slow queries.

Normalized tables split data apart to keep it clean: users here, orders there, linked by keys. The join is how you put it back together at query time. And because a join can be computed several ways, this chapter is also where we meet the database's quiet genius: the query planner.

What a join actually computes

A join combines rows from two tables wherever a condition holds. The everyday case is matching a foreign key to a primary key — attach each order to its user:

SELECT users.display_name, orders.total
FROM orders
JOIN users ON orders.user_id = users.id
WHERE orders.status = 'shipped';

Conceptually the database considers pairs of rows (one from each table) and keeps those where orders.user_id = users.id. The result is a single wide row per match — the user's name sitting next to the order's total.

The type of join decides what happens to rows with no match:

INNER JOIN — keep only matched pairs (orders that have a user).
LEFT JOIN — keep every left row; fill unmatched right columns with NULL (every user, even those with no orders).
RIGHT / FULL — the same idea, mirrored or both-sided.

SQL is declarative — and that's the point

Notice what the query didn't say: which table to read first, whether to use an index, how to match the rows. SQL is declarative — you describe the result, and the database's query planner (optimizer) works out the procedure. For anything non-trivial the planner, armed with statistics about your data, routinely beats a hand-written plan. Your job shifts from "how do I compute this" to "have I given the planner what it needs to choose well" — mainly the right indexes.

The three join algorithms

The same JOIN can execute three different ways. Knowing them turns EXPLAIN output from noise into signal.

Nested loop join — for each row in the outer table, look up matches in the inner table. If the inner side has an index on the join column, each lookup is a fast B-tree probe. Excellent when one side is small or the inner join column is indexed; terrible when it degrades into a scan-inside-a-scan.

Hash join — build a hash table of one side's join keys in memory, then stream the other side through it, probing for matches. Great for joining two large, unindexed tables — roughly linear instead of quadratic.

Merge join — if both inputs are already sorted on the join key (say, from indexes), walk them in lockstep like merging two sorted lists. Very efficient at scale when the sort comes for free.

Parse SQL

Estimate row counts

Compare join plans

Run cheapest

The planner picks a join method from cost estimates

The planner estimates the cost of each and picks the cheapest, using statistics it keeps about table sizes and value distributions. Stale statistics are a real cause of bad plans — which is why databases periodically re-analyze tables.

Reading the plan: EXPLAIN

You never have to guess what the planner chose. EXPLAIN prints the plan; EXPLAIN ANALYZE actually runs it and reports real timings.

EXPLAIN ANALYZE
SELECT users.display_name, orders.total
FROM orders JOIN users ON orders.user_id = users.id
WHERE orders.status = 'shipped';

You're reading it for a few tells:

Seq Scan / Full Table Scan on a big table where you expected an index lookup — usually a missing or unusable index.
Estimated vs actual rows wildly diverging — stale statistics misleading the planner.
The join method chosen, and whether it matches your mental model.

The most common slow-join story

It's almost always the same shape. A join or WHERE column isn't indexed, so the planner can't do fast lookups and falls back to scanning — a nested loop turns into a scan inside a scan, and the query time explodes as tables grow. EXPLAIN shows the sequential scan; you add the index; the plan flips to an index lookup or hash join; the query drops from seconds to milliseconds. This single pattern accounts for a large share of real-world "the database is slow" incidents.

Recap

A join matches rows across tables on a condition — typically a foreign key meeting a primary key — reassembling normalized data.
INNER keeps matches; LEFT/RIGHT/FULL preserve unmatched rows with NULLs.
SQL is declarative: the query planner chooses the procedure, and usually beats hand-tuning.
Joins run as nested loop, hash, or merge — the planner picks by cost estimates and statistics.
EXPLAIN (and EXPLAIN ANALYZE) reveals the real plan — always your first move on a slow query.
Most slow joins are a missing index forcing a scan; the plan makes it obvious.

We can shape data, index it, and join it correctly. Now the guarantee that a group of these operations happens safely — all or nothing. Continue to Transactions & ACID.