Chapter 39 · Intermediate · 8 min read
Caching in the Backend: Faster by Not Doing the Work
A plain-English guide to backend caching — why caching speeds up systems, where caches live, the hard problem of invalidation, TTLs, and cache-aside. How to be fast by avoiding repeated work.
June 30, 2026
A backend that hits the database for every request does a lot of repeated work — recomputing the same answer, re-querying the same popular rows, thousands of times a second. Caching is the art of not doing work you've already done: store an expensive result and serve it instantly next time. It's one of the highest-leverage performance tools in backend engineering — and one of the trickiest to get right.
The idea: store the answer
A cache is fast storage that holds the result of expensive work, keyed so you can find it again. The flow is simple: look up the key; on a hit, return the stored result; on a miss, do the expensive work, store the result under that key, and return it.
Caches are fast because they typically live in memory rather than on disk, and because returning a stored value skips the real cost: a slow query, an external API call, or an expensive computation. A cache hit can turn a response that takes hundreds of milliseconds into one that takes a fraction of a millisecond.
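As a minimal sketch of that idea, the snippet below keeps results in a plain in-memory dictionary. The names `slow_product_lookup` and `get_product`, the key format, and the 50 ms delay are all made up for illustration; a real backend would more likely use a shared cache server than a module-level dict.

```python
import time

_cache = {}   # key -> stored result
calls = 0     # counts how often the slow path actually runs


def slow_product_lookup(product_id):
    """Stand-in for an expensive query or API call (hypothetical)."""
    global calls
    calls += 1
    time.sleep(0.05)  # simulate roughly 50 ms of real work
    return {"id": product_id, "name": f"Product {product_id}"}


def get_product(product_id):
    key = f"product:{product_id}"
    if key in _cache:                          # hit: return the stored answer
        return _cache[key]
    result = slow_product_lookup(product_id)   # miss: do the real work
    _cache[key] = result                       # store it for next time
    return result


first = get_product(42)    # miss, pays for the slow lookup
second = get_product(42)   # hit, served from memory
```

After both calls the slow function has run only once; the second request never touches it.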
Why caching works at all
Caching only helps because real traffic is repetitive. The same popular products, the same trending posts, the same common queries get requested over and over. A cache exploits this: do the work once, then serve the stored answer to everyone who asks next.
The more skewed your traffic toward popular items (and most traffic is), the more a cache pays off — a small cache of the hottest data can absorb the bulk of requests.
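To put a rough number on that, here is a small calculation under one stated assumption: that the item at popularity rank r is requested with weight 1/r (a Zipf-like distribution, chosen purely for illustration; real traffic will differ). The function name `top_k_share` is hypothetical.

```python
def top_k_share(num_items, cache_size):
    """Fraction of requests that target the `cache_size` most popular items,
    assuming the item at popularity rank r is requested with weight 1/r."""
    weights = [1 / rank for rank in range(1, num_items + 1)]
    return sum(weights[:cache_size]) / sum(weights)


# Cache only the hottest 10% of a 1,000-item catalogue.
share = top_k_share(num_items=1000, cache_size=100)

# For comparison: with perfectly uniform traffic, the same cache
# would cover only cache_size / num_items of requests.
uniform_share = 100 / 1000

print(f"skewed: {share:.0%}, uniform: {uniform_share:.0%}")
```

Under this assumed distribution, caching the top 10% of items covers about 69% of requests, against 10% if every item were equally popular.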
The hard problem: invalidation
There's an old joke that there are only two hard problems in computer science, and one of them is cache invalidation. Here's why it's hard: a cache stores a copy, and copies go stale when the original changes. Worse, nothing errors when that happens; the cache keeps serving the old value as if it were current.
So the central question of caching isn't "how do I store data?" — it's "how do I know when cached data is no longer true, and what do I do about it?" There's no universal answer, but two practical tools cover most cases.
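The staleness problem is easy to reproduce. In this small sketch, two dictionaries stand in for the database and the cache (both hypothetical); the source changes, and the cached copy keeps being served without any error.

```python
database = {"user:1:name": "Ada"}   # the source of truth (stand-in)
cache = {}                          # holds copies of database values


def read_name(key):
    if key not in cache:
        cache[key] = database[key]  # copy the value into the cache
    return cache[key]


before = read_name("user:1:name")    # caches "Ada"
database["user:1:name"] = "Grace"    # the original changes...
after = read_name("user:1:name")     # ...but the cache still returns "Ada"
```

Nothing in `read_name` can tell that the copy is out of date, which is why some form of expiry or invalidation is needed.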
TTL: expire on a timer
The simplest approach is a time-to-live (TTL): tag each cached entry with an expiry, and once it passes, the entry is dropped and the next request refetches fresh data. You're not tracking whether the underlying data changed — you're just bounding how stale it can get.
TTL trades precision for simplicity. A 60-second TTL means data can be up to a minute old, but you never have to detect changes. For data that tolerates slight staleness — counts, listings, slowly-changing config — a TTL is often all you need. For data that must be exact the instant it changes, you pair caching with explicit invalidation: when you update the source, you also update or delete the cached copy.
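Both tools fit in a few lines. The class below is a sketch, not a production cache: `TTLCache` and its methods are hypothetical names, and the clock is passed in so that expiry can be demonstrated without actually waiting.

```python
import time


class TTLCache:
    """Tiny in-memory TTL cache; `clock` is injectable so expiry is testable."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._entries = {}  # key -> (value, expires_at)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None                    # miss
        value, expires_at = entry
        if self.clock() >= expires_at:     # expired: drop it, report a miss
            del self._entries[key]
            return None
        return value

    def set(self, key, value):
        self._entries[key] = (value, self.clock() + self.ttl)

    def invalidate(self, key):
        """Explicit invalidation: call this when the source is updated."""
        self._entries.pop(key, None)


now = [0.0]                                 # fake clock, in seconds
cache = TTLCache(ttl_seconds=60, clock=lambda: now[0])

cache.set("follower_count:7", 1200)
fresh = cache.get("follower_count:7")       # within the TTL: served
now[0] = 61.0
expired = cache.get("follower_count:7")     # past the TTL: treated as a miss

cache.set("balance:7", 50)
cache.invalidate("balance:7")               # source changed: drop the copy now
invalidated = cache.get("balance:7")
```

One simplification to be aware of: `get` returns `None` for a miss, so this sketch cannot distinguish a miss from a cached `None` value.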
Cache-aside: the common pattern
The most widely used caching pattern is cache-aside (also called "lazy loading"), and it's exactly the flow from earlier, made concrete: check the cache, fall back to the database on a miss, and store the result for next time.
The cache stays beside the database, populated on demand. On a miss you do the real work, store the result, and serve it; the next identical request hits. It's popular because it's simple, only caches data that's actually requested, and degrades gracefully — if the cache is unavailable, requests just fall through to the database (slower, but correct).
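One way to sketch cache-aside, including the fall-through when the cache is down, is shown below. Everything here is a stand-in: `InMemoryCache`, `CacheUnavailable`, `load_from_db`, and `get_post` are hypothetical names, and a real cache client would raise its own error types.

```python
class CacheUnavailable(Exception):
    """Raised by the hypothetical cache client when it cannot be reached."""


class InMemoryCache:
    def __init__(self):
        self.data = {}
        self.available = True   # set to False to simulate an outage

    def get(self, key):
        if not self.available:
            raise CacheUnavailable()
        return self.data.get(key)

    def set(self, key, value):
        if not self.available:
            raise CacheUnavailable()
        self.data[key] = value


DATABASE = {"post:1": "Hello, caching"}   # stand-in for the real database
db_reads = 0


def load_from_db(key):
    global db_reads
    db_reads += 1
    return DATABASE[key]


def get_post(cache, key):
    try:
        cached = cache.get(key)
        if cached is not None:
            return cached              # hit
    except CacheUnavailable:
        return load_from_db(key)       # cache down: fall through to the DB
    value = load_from_db(key)          # miss: do the real work
    try:
        cache.set(key, value)          # populate for the next request
    except CacheUnavailable:
        pass                           # still correct, just not cached
    return value


cache = InMemoryCache()
a = get_post(cache, "post:1")   # miss: one DB read, then stored
b = get_post(cache, "post:1")   # hit: no DB read
cache.available = False
c = get_post(cache, "post:1")   # outage: second DB read, same answer
```

All three calls return the same value, and the database is read twice: once on the initial miss and once during the simulated outage.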
The trade-off, stated plainly
Every cache is the same bargain: speed now, in exchange for the risk of serving slightly old data. Whether that bargain is acceptable is entirely a product question. A stale follower count for 30 seconds? Fine. A stale bank balance or a stale permission check? Not fine. Cache aggressively where staleness is harmless, and not at all where correctness is non-negotiable.
Recap
- A cache stores the result of expensive work so repeat requests are served instantly.
- It works because traffic is repetitive — a little cached hot data absorbs most requests.
- Invalidation is the hard part: cached copies go stale when the source changes, and fail silently.
- TTLs bound staleness with a timer; explicit invalidation updates the cache when the source changes.
- Cache-aside (check cache, fall back to DB, store) is the common pattern — simple and resilient.
- Caching trades speed for possible staleness — use it where old data is harmless, avoid it where it isn't.
Caching handles repeated reads. But some work is too slow to do during a request at all. The fix is to do it later, in the background — with queues. Continue to Background Jobs and Queues.