Load balancing strategies
Round-robin, least connections, IP hash, consistent hash, EWMA. Pick by traffic pattern: uniform requests use round-robin; variable work uses least-loaded.
A load balancer (LB) sits in front of N backend servers and distributes requests across them. The strategy decides which backend gets the next request.
The main strategies
- Round-robin: backend 1, then 2, then 3, then 1. Simple, fair if requests are uniform.
- Weighted round-robin: each backend gets a weight, and bigger servers receive proportionally more requests.
- Least connections: send to the backend with the fewest active connections. Adapts to slow requests.
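- Least response time: send to the backend with the lowest recently measured response time (e.g. average or median latency). Adapts to backend health.
- IP hash: hash the client IP so the same client reaches the same backend every time, as long as the pool does not change. Gives sticky sessions.
- Consistent hash: hash a key (URL, session ID) and route it to a stable backend or set of replicas. Survives backend add/remove with minimal reshuffling.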
- Power of 2 choices: pick 2 random backends, choose the less loaded. Cheap approximation of least-loaded.
- EWMA (exponentially weighted moving average): track each backend's recent latency, prefer the fastest. Linkerd's default, paired with power of 2 choices. (Envoy's default is round-robin, not EWMA.)
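The simpler pickers above fit in a few lines each. Below is a minimal Python sketch, assuming a hypothetical in-memory `Backend` record (not any real load balancer's API) whose `active` count the proxy keeps up to date as connections open and close. The weighted variant uses the "smooth" weighted round-robin scheme popularized by Nginx, which interleaves a heavy backend's turns instead of sending them all in a row.

```python
import itertools
import random
from dataclasses import dataclass


@dataclass
class Backend:
    """Hypothetical in-memory view of one backend (not a real LB's API)."""
    name: str
    weight: int = 1          # relative capacity, used by weighted round-robin
    active: int = 0          # open connections, maintained by the proxy
    current_weight: int = 0  # running score for smooth weighted round-robin


def round_robin(backends):
    """Return an iterator that cycles 1, 2, 3, 1, 2, 3, ..."""
    return itertools.cycle(backends)


def smooth_weighted_round_robin(backends):
    """Pick one backend using smooth weighted round-robin.

    Each pick adds every backend's weight to its current_weight, selects the
    highest score, then subtracts the total weight from the winner. Over a
    full cycle each backend wins `weight` times, interleaved with the others.
    """
    total = sum(b.weight for b in backends)
    for b in backends:
        b.current_weight += b.weight
    best = max(backends, key=lambda b: b.current_weight)  # first wins ties
    best.current_weight -= total
    return best


def least_connections(backends):
    """Pick the backend with the fewest active connections (first wins ties)."""
    return min(backends, key=lambda b: b.active)


def power_of_two_choices(backends, rng=random):
    """Sample two distinct backends at random; keep the one with fewer connections."""
    a, b = rng.sample(backends, 2)
    return a if a.active <= b.active else b


pool = [Backend("a", weight=5), Backend("b", weight=1), Backend("c", weight=1)]
print([smooth_weighted_round_robin(pool).name for _ in range(7)])
```

With weights 5, 1 and 1, the seven picks come out as a, a, b, a, c, a, a: five for `a` and one each for `b` and `c`, spread across the cycle. Note that power of 2 choices never picks the single most loaded backend (unless it ties), which is why it approximates least connections without scanning the whole pool.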
When to use which
- Uniform CPU-bound work: round-robin.
- Variable response times (some queries are 10x slower): least connections or power of 2 choices.
- Cache locality matters (CDN, sharded cache): consistent hash on URL or key.
- WebSocket or session state in memory: cookie or IP hash for stickiness.
- Latency-critical microservices: EWMA.
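To make the EWMA idea concrete, here is a minimal sketch. Assumptions: latency is recorded once per completed request, smoothing uses a fixed per-sample factor `alpha` (the value 0.3 is illustrative), and the class and function names are hypothetical. Production balancers may add refinements such as time-based decay or a penalty for in-flight requests, which this sketch leaves out.

```python
import random


class EwmaBackend:
    """Hypothetical backend record holding an EWMA of observed latency.

    Assumption: a fixed per-sample smoothing factor `alpha`. Production
    balancers may instead decay by elapsed time or penalize in-flight
    requests; this sketch leaves both out.
    """

    def __init__(self, name, alpha=0.3, initial_latency_ms=100.0):
        self.name = name
        self.alpha = alpha
        self.ewma_ms = initial_latency_ms  # starting guess before any samples

    def observe(self, latency_ms):
        # new = alpha * sample + (1 - alpha) * old: older samples fade geometrically.
        self.ewma_ms = self.alpha * latency_ms + (1 - self.alpha) * self.ewma_ms


def pick_fastest(backends):
    """Send to the backend with the lowest recent latency."""
    return min(backends, key=lambda b: b.ewma_ms)


def pick_p2c_ewma(backends, rng=random):
    """Power of 2 choices on EWMA: sample two backends, keep the faster one."""
    a, b = rng.sample(backends, 2)
    return a if a.ewma_ms <= b.ewma_ms else b


fast, slow = EwmaBackend("fast"), EwmaBackend("slow")
for _ in range(5):
    fast.observe(20.0)
    slow.observe(300.0)
print(round(fast.ewma_ms, 1), round(slow.ewma_ms, 1), pick_fastest([fast, slow]).name)
```

After five samples each, `fast` sits near 33 ms and `slow` near 266 ms, so both pickers choose `fast`. Always sending to the single fastest backend can overload it when many balancers share the same view; sampling two at random, as `pick_p2c_ewma` does, spreads load while still favoring fast backends.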
L4 vs L7
- L4 LB: routes by the TCP/UDP 5-tuple (source IP, source port, destination IP, destination port, protocol). Fast (can run in the kernel, e.g. Linux IPVS) and protocol-agnostic. AWS NLB, HAProxy in TCP mode.
- L7 LB: parses HTTP, routes by host, path, header, cookie. Slower but smarter. Nginx, Envoy, AWS ALB.
Most modern systems use both: L4 in front for raw throughput, L7 behind for routing logic.
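A sketch of the difference between the two layers, with hypothetical backend names and a hypothetical route table. The L4 picker sees only the connection's 5-tuple and hashes it, so every packet of one flow reaches the same backend without any payload parsing; the L7 router needs the parsed HTTP host and path. SHA-256 is used here only as a convenient stable hash; a real data-plane LB would likely use a much cheaper one.

```python
import hashlib


def l4_pick(backends, src_ip, src_port, dst_ip, dst_port, proto):
    """L4-style choice: hash the connection 5-tuple, ignore the payload.

    All packets of one TCP/UDP flow share the same 5-tuple, so the whole
    connection lands on one backend.
    """
    key = f"{src_ip}|{src_port}|{dst_ip}|{dst_port}|{proto}".encode()
    digest = hashlib.sha256(key).digest()
    return backends[int.from_bytes(digest[:8], "big") % len(backends)]


# Hypothetical L7 route table: (host, path prefix) -> backend pool.
ROUTES = [
    ("api.example.com", "/v2/", ["api-v2-1", "api-v2-2"]),
    ("api.example.com", "/", ["api-v1-1"]),
    ("www.example.com", "/", ["web-1", "web-2"]),
]


def l7_route(host, path):
    """L7-style choice: match the HTTP Host, then the longest path prefix."""
    matches = [(prefix, pool) for h, prefix, pool in ROUTES
               if h == host and path.startswith(prefix)]
    if not matches:
        return None  # no route: a real LB would typically answer 404
    _, pool = max(matches, key=lambda m: len(m[0]))
    return pool


print(l4_pick(["s1", "s2", "s3"], "10.0.0.1", 51515, "10.0.0.100", 443, "tcp"))
print(l7_route("api.example.com", "/v2/users"))
```

In a layered setup, the L4 tier would pick one of several L7 proxies by 5-tuple, and that proxy would then call something like `l7_route` to choose the application pool.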
Health checks
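The LB monitors each backend and removes unhealthy ones from the pool until they recover. Two kinds:
- Active: LB periodically sends test requests (e.g. to a health endpoint). Detects failures quickly, but costs extra traffic.
- Passive: LB watches real requests for errors and timeouts. Free, but slower to detect and blind when there is no traffic.
Most production LBs use both: passive checks during normal operation, active checks to confirm a failure and to notice when an ejected backend has recovered.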
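One way to combine the two kinds of check, sketched in Python. The class and its thresholds are hypothetical, loosely in the spirit of HAProxy's `rise`/`fall` server settings: consecutive failures on real traffic eject a backend, and only consecutive successful active probes bring it back.

```python
class HealthTracker:
    """Tracks one backend's health from passive and active signals.

    Hypothetical thresholds: eject after `fail_threshold` consecutive
    failures, readmit after `rise_threshold` consecutive successful probes.
    """

    def __init__(self, fail_threshold=3, rise_threshold=2):
        self.fail_threshold = fail_threshold
        self.rise_threshold = rise_threshold
        self.healthy = True
        self.consecutive_failures = 0
        self.consecutive_probe_ok = 0

    def record_request(self, ok):
        """Passive check: called with the outcome of each real proxied request."""
        if ok:
            self.consecutive_failures = 0
            return
        self.consecutive_failures += 1
        if self.consecutive_failures >= self.fail_threshold:
            self.healthy = False          # eject from the pool
            self.consecutive_probe_ok = 0

    def record_probe(self, ok):
        """Active check: called with the outcome of each periodic test request."""
        if not ok:
            self.consecutive_probe_ok = 0
            self.record_request(False)    # a failed probe counts like a failed request
            return
        self.consecutive_probe_ok += 1
        if not self.healthy and self.consecutive_probe_ok >= self.rise_threshold:
            self.healthy = True           # readmit to the pool
            self.consecutive_failures = 0


t = HealthTracker()
for _ in range(3):
    t.record_request(False)
print("after 3 failures:", t.healthy)
t.record_probe(True)
t.record_probe(True)
print("after 2 good probes:", t.healthy)
```

Passive signals alone could never readmit the backend here, because an ejected backend receives no real traffic; that is the gap the active probes fill.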
Numbers to memorize
- Round-robin distribution variance: about 10% under random workload.
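- Power of 2 choices reduces the tail by 2-5x over random. In the classic balls-into-bins model (n requests onto n backends), the busiest backend drops from about ln n / ln ln n requests with one random choice to about ln ln n / ln 2 with two.
- Consistent hash reshuffle on an N-to-N+1 backend change: about 1/(N+1) of keys move, not all. With naive hash mod N, about N/(N+1) of keys move.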
- Typical health check interval: 5-30 seconds.
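The reshuffle figures can be checked empirically. Below is a minimal consistent-hash ring with virtual nodes, compared against naive `hash mod N` routing when growing from 10 to 11 backends. The 100 virtual nodes per backend, the backend names, and the keys are illustrative assumptions. SHA-256 truncated to 64 bits stands in for a stable hash because Python's built-in `hash()` is randomized per process for strings.

```python
import bisect
import hashlib


def stable_hash(key):
    """64-bit hash that is identical across processes."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class HashRing:
    """Consistent hash ring: each backend owns `vnodes` points on the ring,
    and a key goes to the first backend point clockwise from its hash."""

    def __init__(self, backends, vnodes=100):
        self._points = sorted(
            (stable_hash(f"{b}#{i}"), b) for b in backends for i in range(vnodes)
        )
        self._hashes = [h for h, _ in self._points]

    def lookup(self, key):
        # Wrap around to the first point when the key hashes past the last one.
        idx = bisect.bisect(self._hashes, stable_hash(key)) % len(self._points)
        return self._points[idx][1]


def moved_fraction(before, after, keys):
    """Share of keys whose backend changes between two routing functions."""
    return sum(before(k) != after(k) for k in keys) / len(keys)


keys = [f"user:{i}" for i in range(20_000)]
n = 10
old_backends = [f"b{i}" for i in range(n)]
new_backends = old_backends + [f"b{n}"]

ring_old, ring_new = HashRing(old_backends), HashRing(new_backends)
ring_moved = moved_fraction(ring_old.lookup, ring_new.lookup, keys)
modn_moved = moved_fraction(
    lambda k: old_backends[stable_hash(k) % n],
    lambda k: new_backends[stable_hash(k) % (n + 1)],
    keys,
)
print(f"ring:  {ring_moved:.3f}  (1/(N+1) = {1 / (n + 1):.3f})")
print(f"mod N: {modn_moved:.3f}  (N/(N+1) = {n / (n + 1):.3f})")
```

Expect the ring to move roughly 9% of keys, every one of them onto the new backend `b10`, while mod-N routing moves roughly 91%. More virtual nodes bring the ring's figure closer to 1/(N+1).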