Caching at Scale: Distributed Caches and Their Pitfalls

A plain-English guide to caching at scale — the layers of caching in a system, distributed in-memory caches like Redis, cache hit ratio, and the failure modes: stampedes, thundering herds, and cold caches.

A CDN caches static content at the edge. But the dynamic data a CDN can't touch — user feeds, computed results, hot database rows — also gets requested relentlessly, and hitting the database every time doesn't scale. Caching at scale is about a fast, shared cache layer that absorbs those reads, and the subtle failure modes that bite when you run one under heavy load.

Caching happens in layers

First, widen the lens. In a real system, caching isn't one thing — it happens at every layer, and a request ideally gets answered by the fastest layer that can:

Layer	Caches	Speed
Browser	Local copies of assets	Instant
CDN	Static content at the edge	Very fast
Application cache	Dynamic data, computed results	Fast
Database cache	Recent queries, hot pages	Moderate

The art is to answer each request as high up this stack as possible, so it never reaches the slow, expensive layers. This chapter focuses on the application cache — the shared in-memory layer in front of your database.

The distributed cache

With one server, an in-memory cache can just live inside it. But scaling out breaks that: ten stateless servers each with their own private cache means ten copies, inconsistent and mostly empty. The fix is a distributed cache — a separate, shared in-memory store (Redis being the classic example) that all your servers talk to.

Any server

Shared cache (Redis)

Hit: return

Miss: hit database, fill cache

A shared cache gives every server one consistent fast layer

Now every server sees the same cached data, the cache survives any single server restarting, and it scales independently. This shared, in-memory layer is one of the most important components in a high-traffic system — it's frequently what stands between your database and collapse.

Hit ratio: the number that matters

The whole value of a cache is captured in one metric: the hit ratio — the fraction of requests served from cache rather than the slow store behind it.

50% hit ratio

50% to DB

90% hit ratio

10% to DB

99% hit ratio

1% to DB

Database load falls sharply as the cache hit ratio rises

The relationship is dramatic: going from a 90% to a 99% hit ratio cuts database load by tenfold (from 10% of requests to 1%). This is why so much caching effort goes into pushing the hit ratio up — caching the right things, with the right TTLs, so the database sees as little traffic as possible. A cache with a poor hit ratio is just overhead.

Failure mode: the cache stampede

Now the pitfalls, because caching at scale fails in specific, nasty ways. The first is the stampede (or "thundering herd"):

The fixes involve not letting everyone recompute at once — for example, having just one request rebuild the value while others briefly wait or serve slightly-stale data. The key realization is that cache expiry is a load event, and for hot keys it must be managed, not left to chance.

Failure mode: the cold cache

The second pitfall is the cold cache. A cache only protects you when it's full of useful data. Right after a restart, a deploy, or a flush, it's empty — and every single request misses and falls through to the database.

Both failure modes share a lesson: a cache changes your database's load profile, and any moment the cache stops absorbing traffic — expiry or cold start — that hidden load lands on the database at once. Designing for those moments is what separates a cache that helps from one that causes the very outage it was meant to prevent.

Recap

At scale, a distributed in-memory cache (e.g. Redis) shared by all servers absorbs reads from the database.
Caching happens in layers (browser, CDN, app, database) — answer each request as high up the stack as possible.
Hit ratio decides the payoff: 90% to 99% cuts database load tenfold; a low hit ratio is just overhead.
Stampedes happen when a hot key expires and many requests miss at once, slamming the database.
Cold caches offer no protection — after a restart, all traffic falls through; warm and ramp gradually.

Caching protects reads, but your database still has to own the data. To scale the database itself — and survive its failure — you start with replication. Continue to Database Replication.