Load balancing strategies

Round-robin, least connections, IP hash, consistent hash, EWMA. Pick by traffic pattern: uniform requests use round-robin; variable work uses least-loaded.

A load balancer (LB) sits in front of N backend servers and distributes requests across them. The strategy decides which backend gets the next request.

The main strategies

  • Round-robin: backend 1, then 2, then 3, then 1. Simple, fair if requests are uniform.
  • Weighted round-robin: round-robin in which bigger servers carry a higher weight and get proportionally more turns.
  • Least connections: send to the backend with the fewest active connections. Adapts to slow requests.
  • Least response time: send to the backend with the lowest recent latency (an average or a percentile such as p50, depending on the LB). Adapts to slow or degraded backends.
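  • IP hash: hash the client IP and route to the same backend every time, as long as the backend set does not change. Gives sticky sessions, but clients behind one NAT all land on the same backend.
  • Consistent hash: hash a key (URL, session ID) and route each key to a stable backend. Survives backend add/remove with minimal reshuffling.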
  • Power of 2 choices: pick 2 random backends, choose the less loaded. Cheap approximation of least-loaded.
  • EWMA (exponentially weighted moving average): track each backend's recent latency, prefer the fastest. Linkerd balances this way, combined with power of 2 choices. (Envoy's default is round-robin.)
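
Three of the strategies above fit in a few lines each. The following is a minimal Python sketch, not any particular LB's implementation: the class names, the `pick`/`release` methods, and the in-memory connection counter are all assumptions made for illustration.

```python
import itertools
import random


class RoundRobin:
    """Cycle through the backends in a fixed order."""

    def __init__(self, backends):
        self._cycle = itertools.cycle(backends)

    def pick(self):
        return next(self._cycle)


class LeastConnections:
    """Pick the backend with the fewest in-flight requests."""

    def __init__(self, backends):
        self.active = {b: 0 for b in backends}

    def pick(self):
        backend = min(self.active, key=self.active.get)
        self.active[backend] += 1  # the caller must release() when the request ends
        return backend

    def release(self, backend):
        self.active[backend] -= 1


class PowerOfTwoChoices(LeastConnections):
    """Sample two distinct backends at random and keep the less loaded one."""

    def __init__(self, backends, rng=None):
        super().__init__(backends)
        self._rng = rng or random.Random()

    def pick(self):
        a, b = self._rng.sample(list(self.active), 2)
        backend = a if self.active[a] <= self.active[b] else b
        self.active[backend] += 1
        return backend
```

Least connections scans every backend on each pick. Power of 2 choices compares only two, which is why it stays cheap with large pools or when several LB instances each hold only a partial view of the load.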

When to use which

Picking a load balancing strategy
  • Uniform CPU-bound work: round-robin.
  • Variable response times (some queries are 10x slower): least connections or power of 2 choices.
  • Cache locality matters (CDN, sharded cache): consistent hash on URL or key.
  • WebSocket or session state in memory: cookie or IP hash for stickiness.
  • Latency-critical microservices: EWMA.
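
For the latency-critical case, the EWMA update itself is one line: new = alpha * sample + (1 - alpha) * old. A minimal sketch follows, assuming a hypothetical `EwmaBalancer` class, a smoothing factor `alpha` of 0.5, and latencies in milliseconds. Production balancers usually do not pick the global minimum as this sketch does; they tend to combine the score with in-flight request counts and power of 2 choices so that one fast backend is not flooded.

```python
class EwmaBalancer:
    """Track a smoothed latency per backend and prefer the lowest."""

    def __init__(self, backends, alpha=0.5, initial_ms=0.0):
        self.alpha = alpha  # weight of the newest sample (illustrative value)
        self.latency_ms = {b: initial_ms for b in backends}

    def observe(self, backend, sample_ms):
        # Exponentially weighted moving average: recent samples count more,
        # older ones decay geometrically.
        previous = self.latency_ms[backend]
        self.latency_ms[backend] = (
            self.alpha * sample_ms + (1 - self.alpha) * previous
        )

    def pick(self):
        # Simplification: always take the backend with the lowest score.
        return min(self.latency_ms, key=self.latency_ms.get)


lb = EwmaBalancer(["a", "b"])
lb.observe("a", 100.0)  # a's score becomes 50.0
lb.observe("b", 20.0)   # b's score becomes 10.0
chosen = lb.pick()      # "b", the backend with the lower smoothed latency
```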

L4 vs L7

  • L4 LB: routes by TCP/UDP 5-tuple. Fast (kernel can do it), protocol-agnostic. AWS NLB, HAProxy in TCP mode.
  • L7 LB: parses HTTP, routes by host, path, header, cookie. Slower but smarter. Nginx, Envoy, AWS ALB.

Many large deployments use both: L4 in front for raw throughput, L7 behind for routing logic.
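
The difference comes down to what each layer can see. A rough Python sketch, with hypothetical function names, route table, and pool names: the L4 picker hashes only the connection 5-tuple, while the L7 picker needs the parsed host and path. Real L4 balancers do this per packet in the kernel or in hardware, usually with connection tracking, which this sketch ignores.

```python
import hashlib
from typing import NamedTuple


class FiveTuple(NamedTuple):
    protocol: str
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int


def l4_pick(flow, backends):
    """Hash the 5-tuple so every packet of one flow reaches the same backend."""
    digest = hashlib.sha256(repr(tuple(flow)).encode()).digest()
    return backends[int.from_bytes(digest[:8], "big") % len(backends)]


def l7_pick(host, path, routes, default_pool):
    """Return the pool of the longest matching (host, path prefix) rule."""
    best = None
    for (rule_host, prefix), pool in routes.items():
        if rule_host == host and path.startswith(prefix):
            if best is None or len(prefix) > len(best[0]):
                best = (prefix, pool)
    return best[1] if best else default_pool


# Illustrative data only.
flow = FiveTuple("tcp", "203.0.113.7", 51234, "198.51.100.1", 443)
l4_backend = l4_pick(flow, ["10.0.0.1", "10.0.0.2", "10.0.0.3"])

routes = {
    ("example.com", "/api/"): "api-pool",
    ("example.com", "/api/images/"): "image-pool",
}
l7_pool = l7_pick("example.com", "/api/images/cat.png", routes, "web-pool")
```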

Health checks

The LB tracks the health of each backend and removes unhealthy ones from the pool until they recover. Two kinds of check:

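  • Active: the LB sends periodic test requests, such as a TCP connect or an HTTP request to a health endpoint. Detects failures even on a backend with no user traffic, but adds probe load.
  • Passive: the LB watches the success or failure of real requests. Costs no extra traffic, but it only notices a failure after real requests have hit it, and it learns nothing from an idle backend.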

Many production LBs combine them: passive checks during normal operation, with active probes to confirm a failure and to verify recovery before a backend rejoins the pool.
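
Either kind of check usually feeds a small state machine so that a single failed probe does not eject a backend. A minimal sketch, assuming a hypothetical `BackendHealth` class and illustrative thresholds (3 consecutive failures to eject, 2 consecutive successes to readmit):

```python
class BackendHealth:
    """Consecutive-result thresholds for marking a backend down or up."""

    def __init__(self, fail_threshold=3, rise_threshold=2):
        self.fail_threshold = fail_threshold  # failures in a row before ejection
        self.rise_threshold = rise_threshold  # successes in a row before readmission
        self.healthy = True
        self._fails = 0
        self._successes = 0

    def record(self, ok):
        """Feed one result: an active probe or a real (passive) request outcome."""
        if ok:
            self._fails = 0
            self._successes += 1
            if not self.healthy and self._successes >= self.rise_threshold:
                self.healthy = True
        else:
            self._successes = 0
            self._fails += 1
            if self.healthy and self._fails >= self.fail_threshold:
                self.healthy = False


def healthy_pool(states):
    """Return the names of the backends that may currently receive traffic."""
    return [name for name, state in states.items() if state.healthy]


states = {"a": BackendHealth(), "b": BackendHealth()}
for _ in range(3):
    states["b"].record(False)   # three failures in a row eject b
pool = healthy_pool(states)     # only "a" remains
```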

Numbers to memorize

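  • Round-robin load imbalance: roughly 10% between backends under a random workload. Treat this as a rule of thumb; the real figure depends on how much request cost varies.
  • Power of 2 choices reduces tail load by roughly 2-5x over purely random choice. In the balls-into-bins model the maximum load drops from about log n / log log n to about log log n.
  • Consistent hash reshuffle on N-to-N+1 backend change: about 1/(N+1) of keys on average, not all. Plain hash-mod-N remaps about N/(N+1) of keys.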
  • Typical health check interval: 5-30 seconds.
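
The consistent-hash figure is easy to check empirically. The sketch below builds a simple hash ring with virtual nodes and compares it with plain hash-mod-N when a fifth backend joins four. The `HashRing` class, the 100 virtual nodes per backend, and the key and backend names are assumptions chosen for illustration; the expected outcome is about 1/5 of keys moved for the ring and about 4/5 for modulo.

```python
import bisect
import hashlib


def _hash(value):
    """Stable 64-bit hash of a string."""
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")


class HashRing:
    """Consistent hash ring; each backend owns `vnodes` points on the ring."""

    def __init__(self, backends, vnodes=100):
        self._ring = sorted(
            (_hash(f"{b}#{i}"), b) for b in backends for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def lookup(self, key):
        # First ring point clockwise from the key's hash, wrapping at the end.
        i = bisect.bisect_right(self._points, _hash(key)) % len(self._ring)
        return self._ring[i][1]


def moved_fraction(before, after, keys):
    """Fraction of keys whose backend differs between two lookup functions."""
    return sum(before(k) != after(k) for k in keys) / len(keys)


keys = [f"key-{i}" for i in range(10_000)]
old = [f"backend-{i}" for i in range(4)]
new = old + ["backend-4"]

ring_old, ring_new = HashRing(old), HashRing(new)
ring_moved = moved_fraction(ring_old.lookup, ring_new.lookup, keys)

modulo_moved = moved_fraction(
    lambda k: old[_hash(k) % len(old)],
    lambda k: new[_hash(k) % len(new)],
    keys,
)
print(f"ring: {ring_moved:.2f}, modulo: {modulo_moved:.2f}")
```

On the ring, every key that moves goes to the new backend; no key is shuffled between the existing four.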
