Load Balancers: Spreading Traffic Across Servers

A plain-English guide to load balancers — how they distribute traffic across servers, common algorithms, health checks, and why they're the entry point to any horizontally scaled system.

To scale horizontally, you run many servers — but that immediately raises a question: when a request arrives, which server handles it? Users can't be expected to pick. The answer is a load balancer: a component that sits in front of your servers and distributes traffic across them. It's the entry point to essentially every scaled system, so it's the right place to start the practical chapters.

One front door for many servers

A load balancer is a single endpoint that clients connect to, which then forwards each request to one of several backend servers. Clients see one address; behind it, a pool of interchangeable servers shares the work.

[Diagram: Clients -> Load balancer -> Server 1 / Server 2 / Server 3. Clients hit one address; the balancer spreads work across the pool.]

This is what makes stateless, horizontally scaled systems usable. Because no server holds state that a particular client depends on, any server in the pool can handle any request. The servers do the work; the load balancer makes them look like a single, much larger service from the outside. Add a server to the pool and it starts receiving traffic; remove one and it stops, and clients never notice either change.
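To make the "one front door" idea concrete, here is a minimal in-memory sketch. The `LoadBalancer` class, its method names, and the server names are all hypothetical, and the balancer returns a string instead of forwarding a real network request.

```python
from typing import List


class LoadBalancer:
    """A toy front door: clients call handle(); the pool behind it can change."""

    def __init__(self, servers: List[str]) -> None:
        self.servers = list(servers)
        self._counter = 0

    def add_server(self, name: str) -> None:
        # A newly added server becomes eligible for the next requests.
        self.servers.append(name)

    def remove_server(self, name: str) -> None:
        # A removed server simply stops being chosen.
        self.servers.remove(name)

    def handle(self, request: str) -> str:
        if not self.servers:
            raise RuntimeError("no servers in the pool")
        # Simple rotation; any selection strategy could be plugged in here.
        server = self.servers[self._counter % len(self.servers)]
        self._counter += 1
        return f"{server} handled {request}"


lb = LoadBalancer(["server-1", "server-2"])
print(lb.handle("GET /a"))
lb.add_server("server-3")   # the client-facing call below does not change
print(lb.handle("GET /b"))
```

The point of the sketch is that callers only ever use `handle()`; changes to the pool happen entirely behind that one entry point.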

Algorithms: how it picks a server

The load balancer needs a strategy for choosing which server gets each request. The common ones:

Algorithm	How it chooses	Good when
Round robin	Each server in turn, evenly	Servers are similar and requests are uniform
Least connections	The server with the fewest active requests	Request durations vary a lot
Weighted	Proportional to each server's capacity	Servers have different power
IP hash	A hash of the client's IP address selects the server	A client must keep reaching the same server
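The four strategies in the table can each be sketched in a few lines. This is an illustrative sketch rather than how any particular load balancer implements them: the function names, the server names, and the choice of SHA-256 for the IP hash are assumptions made for the example.

```python
import hashlib
import itertools
from typing import Dict, Iterator, List


def round_robin(servers: List[str]) -> Iterator[str]:
    """Yield each server in turn, forever."""
    return itertools.cycle(servers)


def least_connections(active: Dict[str, int]) -> str:
    """Pick the server with the fewest active requests right now."""
    return min(active, key=active.get)


def weighted(weights: Dict[str, int]) -> Iterator[str]:
    """Rotate through servers, repeating each one in proportion to its weight."""
    expanded = [name for name, weight in weights.items() for _ in range(weight)]
    return itertools.cycle(expanded)


def ip_hash(client_ip: str, servers: List[str]) -> str:
    """Map a client IP to a server; the same IP always gets the same server."""
    digest = hashlib.sha256(client_ip.encode("utf-8")).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]


pool = ["server-1", "server-2", "server-3"]
rotation = round_robin(pool)
print([next(rotation) for _ in range(4)])
print(least_connections({"server-1": 5, "server-2": 1, "server-3": 3}))
print(ip_hash("203.0.113.7", pool))
```

Two simplifications are worth noting. The weighted version sends a weight-3 server three requests in a row before moving on, which is proportional but bursty. The IP hash stays stable only while the pool stays the same size, because changing the number of servers changes the result of the modulo.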

For most systems, round robin or least connections is plenty. The choice matters most when your requests are uneven. Round robin ignores how busy each server already is, so if some requests take far longer than others it can pile slow requests onto one server, whereas least connections steers new requests away from servers that are still busy.
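A small simulation can illustrate that difference. It is a sketch under made-up assumptions: one request arrives per tick, every third request is slow (30 ticks) while the rest are fast (1 tick), and there are three servers. The `simulate` helper and both picker functions are hypothetical names.

```python
from typing import Callable, List

Picker = Callable[[int, List[int]], int]


def simulate(durations: List[int], n_servers: int, pick: Picker) -> int:
    """Return the highest number of concurrent requests seen on any one server."""
    finish_times: List[List[int]] = [[] for _ in range(n_servers)]
    peak = 0
    for tick, duration in enumerate(durations):
        # Forget requests that have already finished by this tick.
        finish_times = [[f for f in fs if f > tick] for fs in finish_times]
        active = [len(fs) for fs in finish_times]
        target = pick(tick, active)
        finish_times[target].append(tick + duration)
        peak = max(peak, len(finish_times[target]))
    return peak


def round_robin(tick: int, active: List[int]) -> int:
    # Ignores current load: request number decides the server.
    return tick % len(active)


def least_connections(tick: int, active: List[int]) -> int:
    # Chooses the server with the fewest requests still in flight.
    return active.index(min(active))


durations = [30, 1, 1] * 6  # assumed workload: every third request is slow
print("round robin peak:", simulate(durations, 3, round_robin))
print("least connections peak:", simulate(durations, 3, least_connections))
```

With this particular workload every slow request lands on the same server under round robin, so its peak is higher than under least connections, which spreads the slow requests out. The pattern is deliberately adversarial; with uniform requests the two strategies behave much the same.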

Health checks: routing around failure

A load balancer does more than spread load: it detects failure and routes around it. It continuously sends each server a health check, a small "are you alive?" probe. If a server stops responding (in practice, usually after a few consecutive failed probes rather than a single miss), the balancer takes it out of the pool and sends traffic only to the healthy ones.

When the server recovers and starts passing health checks again, it's quietly returned to the pool. This automatic detect-and-route-around behavior is what lets a system survive individual server failures without human intervention.
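The detect, remove, and return cycle described above can be sketched as follows. The `HealthCheckedPool` class and the `probe` callback are hypothetical, and the two thresholds (consecutive failures before removal, consecutive passes before return) are assumed values chosen for the example, not figures from any specific product.

```python
from typing import Callable, Dict, List


class HealthCheckedPool:
    """Tracks which servers are in rotation based on repeated probes."""

    def __init__(self, servers: List[str], probe: Callable[[str], bool],
                 fail_threshold: int = 2, pass_threshold: int = 2) -> None:
        self.servers = list(servers)
        self.probe = probe                    # hypothetical "are you alive?" call
        self.fail_threshold = fail_threshold  # assumed: failures before removal
        self.pass_threshold = pass_threshold  # assumed: passes before return
        self.healthy: Dict[str, bool] = {s: True for s in self.servers}
        self._streak: Dict[str, int] = {s: 0 for s in self.servers}

    def run_checks(self) -> None:
        """Probe every server once and update its state."""
        for server in self.servers:
            ok = self.probe(server)
            if ok == self.healthy[server]:
                self._streak[server] = 0      # probe agrees with current state
                continue
            self._streak[server] += 1         # probe contradicts current state
            needed = self.pass_threshold if ok else self.fail_threshold
            if self._streak[server] >= needed:
                self.healthy[server] = ok     # flip state after enough evidence
                self._streak[server] = 0

    def in_rotation(self) -> List[str]:
        return [s for s in self.servers if self.healthy[s]]


down = {"server-2"}  # simulate an outage on one server
pool = HealthCheckedPool(["server-1", "server-2", "server-3"],
                         probe=lambda s: s not in down)
pool.run_checks()
pool.run_checks()
print(pool.in_rotation())
```

Requiring more than one failed probe avoids ejecting a server over a single dropped packet, at the cost of reacting a little more slowly to a real failure.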

High availability for free

Put those pieces together and you get high availability almost as a side effect. Because any request can go to any healthy server, the loss of a single server isn't an outage: its share of traffic flows to the others, provided they have enough spare capacity to absorb it.

[Diagram: Server 2 fails -> Health check notices -> Removed from pool -> Traffic flows to 1 and 3. A failed server is dropped from the pool; the rest carry on.]

This is a fundamental shift from a single server, where any failure is total downtime. With a load balancer fronting a pool, failures become routine and survivable — exactly the resilience that horizontal scaling promised but couldn't deliver on its own.
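Surviving a failure does assume the remaining servers can carry the extra load. A quick back-of-the-envelope check, using hypothetical numbers and a hypothetical helper name, might look like this:

```python
def survives_failures(total_rps: float, servers: int,
                      capacity_per_server: float, failures: int = 1) -> bool:
    """Return True if the remaining servers can absorb all traffic.

    Assumes traffic is spread evenly across whichever servers remain.
    """
    remaining = servers - failures
    if remaining < 1:
        return False  # nothing left to serve traffic
    return total_rps / remaining <= capacity_per_server


# Assumed figures: 300 requests/s across 3 servers is 100 each;
# after one failure the two survivors carry 150 each.
print(survives_failures(300, servers=3, capacity_per_server=160))
print(survives_failures(300, servers=3, capacity_per_server=120))
```

The first pool has enough headroom to lose a server; the second does not, so a single failure there would overload the survivors even though the load balancer routes around it correctly.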

Don't create a new single point of failure

There's an obvious objection: if everything goes through the load balancer, isn't it now the single point of failure? Yes — and that's a real concern you must address. If the one load balancer dies, every server behind it becomes unreachable, healthy or not.

The fix is to make the load balancer itself redundant: run more than one, with a mechanism to fail over if one goes down. In practice, managed load balancers from cloud providers typically handle this redundancy for you. The principle to remember is that any component all traffic flows through must itself be made resilient, or you've just moved your single point of failure rather than removing it.
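As a rough illustration of failover between two load balancers, the sketch below tries each balancer in order and uses the first one that answers. Everything here is hypothetical (`send_with_failover`, `primary`, `standby`), and real deployments usually fail over at the network level, for example with a shared address that moves to the standby or with DNS, rather than in application code like this.

```python
from typing import Callable, List

Balancer = Callable[[str], str]


def send_with_failover(request: str, balancers: List[Balancer]) -> str:
    """Try each load balancer in order and return the first successful reply."""
    for balancer in balancers:
        try:
            return balancer(request)
        except ConnectionError:
            continue  # this balancer is down; try the next one
    raise ConnectionError("no load balancer reachable")


def primary(request: str) -> str:
    # Stand-in for a load balancer that has failed.
    raise ConnectionError("primary load balancer is down")


def standby(request: str) -> str:
    # Stand-in for a healthy second load balancer.
    return f"standby routed {request}"


print(send_with_failover("GET /", [primary, standby]))
```

With only `primary` in the list, the same call raises `ConnectionError`, which is the single-point-of-failure case the section warns about.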

Recap

A load balancer sits in front of your servers and distributes incoming requests across them.
It's what makes horizontal scaling usable — clients see one address; a pool shares the work.
Algorithms (round robin, least connections, weighted, IP hash) decide which server handles each request.
Health checks detect failed servers and route around them automatically — the heart of high availability.
The balancer itself must be made redundant, or it becomes the new single point of failure.

A load balancer is one thing that sits in front of your servers. A close relative — the reverse proxy — sits there too, doing a broader set of jobs. Continue to Reverse Proxies.