Chapter 43·Advanced·9 min read
Caching at Scale: Distributed Caches and Their Pitfalls
A plain-English guide to caching at scale — the layers of caching in a system, distributed in-memory caches like Redis, cache hit ratio, and the failure modes: stampedes, thundering herds, and cold caches.
June 30, 2026
A CDN caches static content at the edge. But the dynamic data a CDN can't touch — user feeds, computed results, hot database rows — also gets requested relentlessly, and hitting the database every time doesn't scale. Caching at scale is about a fast, shared cache layer that absorbs those reads, and the subtle failure modes that bite when you run one under heavy load.
Caching happens in layers
First, widen the lens. In a real system, caching isn't one thing — it happens at every layer, and a request ideally gets answered by the fastest layer that can:
| Layer | Caches | Speed |
|---|---|---|
| Browser | Local copies of assets | Instant |
| CDN | Static content at the edge | Very fast |
| Application cache | Dynamic data, computed results | Fast |
| Database cache | Recent queries, hot pages | Moderate |
The art is to answer each request as high up this stack as possible, so it never reaches the slow, expensive layers. This chapter focuses on the application cache — the shared in-memory layer in front of your database.
The distributed cache
With one server, an in-memory cache can just live inside it. But scaling out breaks that: ten stateless servers each with their own private cache means ten copies, inconsistent and mostly empty. The fix is a distributed cache — a separate, shared in-memory store (Redis being the classic example) that all your servers talk to.
Now every server sees the same cached data, the cache survives any single server restarting, and it scales independently. This shared, in-memory layer is one of the most important components in a high-traffic system — it's frequently what stands between your database and collapse.
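The read path through a shared cache can be sketched as follows. This is a minimal illustration of the common "cache-aside" pattern, not a definitive implementation: `SharedCache` is a hypothetical in-process stand-in for a store like Redis, and `get_with_cache`, `load_user`, and the 60-second TTL are assumptions made for the example.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class SharedCache:
    """Hypothetical in-process stand-in for a shared store such as Redis."""

    def __init__(self) -> None:
        # key -> (value, absolute expiry time on the monotonic clock)
        self._data: Dict[str, Tuple[Any, float]] = {}

    def get(self, key: str) -> Optional[Any]:
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            # The entry has outlived its TTL, so treat it as a miss.
            del self._data[key]
            return None
        return value

    def set(self, key: str, value: Any, ttl_seconds: float) -> None:
        self._data[key] = (value, time.monotonic() + ttl_seconds)


def get_with_cache(
    cache: SharedCache,
    key: str,
    load_from_db: Callable[[str], Any],
    ttl_seconds: float = 60.0,
) -> Any:
    """Try the cache first; on a miss, load from the database and store it."""
    value = cache.get(key)
    if value is not None:
        return value
    value = load_from_db(key)
    cache.set(key, value, ttl_seconds)
    return value


# Record every call that actually reaches the (pretend) database.
db_calls = []


def load_user(key: str) -> dict:
    db_calls.append(key)
    return {"id": key, "name": "example"}


# Two "servers" share one cache, so the second read never reaches the database.
cache = SharedCache()
server_a_result = get_with_cache(cache, "user:1", load_user)
server_b_result = get_with_cache(cache, "user:1", load_user)
```

In a real deployment the `SharedCache` object would be replaced by a network client for the shared store; the control flow in `get_with_cache` stays the same.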
Hit ratio: the number that matters
The whole value of a cache is captured in one metric: the hit ratio — the fraction of requests served from cache rather than the slow store behind it.
The relationship is dramatic: going from a 90% to a 99% hit ratio cuts database load tenfold (from 10% of requests to 1%). This is why so much caching effort goes into pushing the hit ratio up: caching the right things, with the right TTLs, so the database sees as little traffic as possible. A cache with a poor hit ratio is just overhead, because every miss pays for a cache lookup and then hits the database anyway.
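The arithmetic behind that tenfold claim is worth seeing once. The sketch below assumes a hypothetical workload of one million requests; `database_requests` is an illustrative helper, not part of any library.

```python
def database_requests(total_requests: int, hit_ratio: float) -> int:
    """Number of requests that miss the cache and reach the database."""
    return round(total_requests * (1.0 - hit_ratio))


at_90 = database_requests(1_000_000, 0.90)  # 100000 requests reach the database
at_99 = database_requests(1_000_000, 0.99)  # 10000 requests reach the database
reduction = at_90 / at_99                   # 10.0
```

What matters to the database is the miss ratio, so a few points of hit ratio near the top of the range are worth far more than the same few points lower down.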
Failure mode: the cache stampede
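Now the pitfalls, because caching at scale fails in specific, nasty ways. The first is the stampede (or "thundering herd"): a hot key expires, many requests miss at once, and every one of them goes to the database to rebuild the same value.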
The fixes involve not letting everyone recompute at once — for example, having just one request rebuild the value while others briefly wait or serve slightly-stale data. The key realization is that cache expiry is a load event, and for hot keys it must be managed, not left to chance.
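The "one request rebuilds, others briefly wait" idea can be sketched with a per-key lock. This is a minimal single-process illustration under stated assumptions: `SingleFlightCache`, `get_or_compute`, and `expensive_rebuild` are hypothetical names, there is no expiry, and across many servers the lock would have to live in the shared cache itself, which this sketch does not show.

```python
import threading
import time
from typing import Any, Callable, Dict

_MISSING = object()  # sentinel so that a cached None is still a hit


class SingleFlightCache:
    """Lets only one caller per key rebuild a missing value."""

    def __init__(self) -> None:
        self._values: Dict[str, Any] = {}
        self._key_locks: Dict[str, threading.Lock] = {}
        self._guard = threading.Lock()  # protects _key_locks

    def _lock_for(self, key: str) -> threading.Lock:
        with self._guard:
            return self._key_locks.setdefault(key, threading.Lock())

    def get_or_compute(self, key: str, compute: Callable[[], Any]) -> Any:
        # Fast path: serve from cache without taking the per-key lock.
        value = self._values.get(key, _MISSING)
        if value is not _MISSING:
            return value
        with self._lock_for(key):
            # Re-check: another caller may have rebuilt it while we waited.
            value = self._values.get(key, _MISSING)
            if value is not _MISSING:
                return value
            value = compute()
            self._values[key] = value
            return value


rebuilds = []


def expensive_rebuild() -> str:
    rebuilds.append(1)   # count how many times the "database" is hit
    time.sleep(0.05)     # simulate a slow query
    return "feed-contents"


cache = SingleFlightCache()
results = []


def worker() -> None:
    results.append(cache.get_or_compute("feed:hot", expensive_rebuild))


# Twenty concurrent requests for the same missing hot key.
threads = [threading.Thread(target=worker) for _ in range(20)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

All twenty callers get the value, but only the first one to take the lock pays for the rebuild; the rest wait briefly and read the fresh entry.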
Failure mode: the cold cache
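The second pitfall is the cold cache. A cache only protects you when it's full of useful data. Right after a restart, a deploy, or a flush, it's empty, and every single request misses and falls through to the database. The remedy is to warm the cache with known-hot data and ramp traffic up gradually, rather than sending full load at an empty cache.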
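One way to soften a cold start is to preload known-hot keys in small, paced batches before the cache takes full traffic. The sketch below is illustrative only: `warm_cache`, the list of hot keys, and the batch and pause parameters are all assumptions, and a plain dict stands in for the shared cache.

```python
import time
from typing import Any, Callable, Dict, List


def warm_cache(
    cache: Dict[str, Any],
    hot_keys: List[str],
    load_from_db: Callable[[str], Any],
    batch_size: int = 2,
    pause_seconds: float = 0.0,
) -> int:
    """Preload hot keys batch by batch; returns how many entries were loaded."""
    loaded = 0
    for start in range(0, len(hot_keys), batch_size):
        for key in hot_keys[start:start + batch_size]:
            if key not in cache:
                cache[key] = load_from_db(key)
                loaded += 1
        # Pause between batches so warming itself does not flood the database.
        time.sleep(pause_seconds)
    return loaded


def load_row(key: str) -> str:
    return "row-for-" + key  # stand-in for a real database read


cold_cache: Dict[str, Any] = {}
hot_keys = ["user:1", "user:2", "feed:home"]  # hypothetical hot-key list
loaded = warm_cache(cold_cache, hot_keys, load_row)
```

Where the list of hot keys comes from (access logs, a snapshot of the previous cache, a fixed list) depends on the system and is left open here.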
Both failure modes share a lesson: a cache hides load from your database, and the moment the cache stops absorbing traffic, whether through expiry or a cold start, that hidden load lands on the database all at once. Designing for those moments is what separates a cache that helps from one that causes the very outage it was meant to prevent.
Recap
- At scale, a distributed in-memory cache (e.g. Redis) shared by all servers absorbs reads from the database.
- Caching happens in layers (browser, CDN, app, database) — answer each request as high up the stack as possible.
- Hit ratio decides the payoff: 90% to 99% cuts database load tenfold; a low hit ratio is just overhead.
- Stampedes happen when a hot key expires and many requests miss at once, slamming the database.
- Cold caches offer no protection — after a restart, all traffic falls through; warm and ramp gradually.
Caching protects reads, but your database still has to own the data. To scale the database itself — and survive its failure — you start with replication. Continue to Database Replication.