Rate limiting (token, leaky, rolling)
Three algorithms, three tradeoffs: token bucket allows bursts, leaky bucket smooths output, sliding window gives precise counts.
Rate limiting protects your service from a single noisy client eating capacity meant for everyone else. The choice of algorithm decides what kind of traffic shape you tolerate.
The three you must know
Token bucket. A bucket holds up to N tokens. Each request consumes one token, and a request that finds the bucket empty is rejected. Tokens refill at rate R per second, capped at N. This allows bursts up to N and a sustained rate of R. This is what Stripe and AWS use, along with many other public APIs. It is friendly to real clients who batch then idle.
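A minimal in-memory sketch of the token bucket described above. The class name, the `allow` method, and the injectable `clock` parameter are illustrative choices, not taken from any particular library, and the sketch is single-process and not thread-safe.

```python
import time


class TokenBucket:
    """Token bucket: holds up to `capacity` tokens, refilled at `rate` per second."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = float(rate)          # R: sustained requests per second
        self.capacity = float(capacity)  # N: maximum burst size
        self.clock = clock               # injectable so tests can control time
        self.tokens = float(capacity)    # start full so an initial burst is allowed
        self.last = clock()

    def allow(self, cost=1.0):
        """Return True and spend `cost` tokens if enough are available."""
        now = self.clock()
        # Refill lazily from elapsed time instead of running a background timer.
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

Refilling lazily on each call keeps the limiter to a few arithmetic operations per request, with no timer thread to manage.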
Leaky bucket. Requests enter a bounded queue. The queue drains at a fixed rate R. If the queue is full, new requests are dropped. Output is perfectly smooth. Use it when the downstream is fragile and cannot handle bursts at all, like an SMS provider with a strict TPS (transactions per second) limit.
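The queue-based leaky bucket can be sketched as a bounded queue plus a `leak` method that releases one request per call. This is a sketch under stated assumptions: the names `offer` and `leak` are hypothetical, and the fixed drain rate R comes from whoever calls `leak`, for example a worker that calls it once every 1/R seconds.

```python
from collections import deque


class LeakyBucket:
    """Bounded FIFO queue; the caller drains it at a fixed rate."""

    def __init__(self, capacity):
        self.capacity = capacity  # maximum number of queued requests
        self.queue = deque()

    def offer(self, request):
        """Enqueue the request, or return False if the queue is full (dropped)."""
        if len(self.queue) >= self.capacity:
            return False
        self.queue.append(request)
        return True

    def leak(self):
        """Release the oldest queued request, or None if the queue is empty.

        Call this from a timer every 1/R seconds to get a smooth output rate R.
        """
        return self.queue.popleft() if self.queue else None
```

Because the drain side is driven by a timer rather than by arrivals, the downstream sees at most one request per tick no matter how bursty the input is.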
Sliding window (rolling). Count requests in the last 60 seconds, not in the current calendar minute. This prevents the boundary problem of fixed windows: with a limit of 100 per minute, a fixed-window counter lets a client send 100 at 12:00:59 and 100 more at 12:01:00, which is 200 requests in two seconds. Slightly more expensive: you store timestamps or use a weighted approximation.
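A minimal sketch of the timestamp-storing variant (a sliding window log). The class and parameter names are illustrative, and the memory cost is one timestamp per accepted request, which is the extra expense mentioned above.

```python
import time
from collections import deque


class SlidingWindowLog:
    """Allow at most `limit` requests in any trailing `window_seconds` interval."""

    def __init__(self, limit, window_seconds=60.0, clock=time.monotonic):
        self.limit = limit
        self.window = float(window_seconds)
        self.clock = clock      # injectable so tests can control time
        self.stamps = deque()   # timestamps of accepted requests, oldest first

    def allow(self):
        now = self.clock()
        cutoff = now - self.window
        # Evict timestamps that have fallen out of the trailing window.
        while self.stamps and self.stamps[0] <= cutoff:
            self.stamps.popleft()
        if len(self.stamps) < self.limit:
            self.stamps.append(now)
            return True
        return False
```

The weighted approximation trades this per-request storage for two counters: the previous fixed window's count is scaled by how much of it still overlaps the trailing window, then added to the current window's count.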
Where to enforce it
- Edge (CDN, API gateway). Cheap, protects origin. Per-IP, per-API-key.
- Service. Per-tenant, per-endpoint. Use Redis with INCR and EXPIRE, or a Lua script for atomicity.
- Database. Last line of defense. Usually a connection pool cap, not a counter.
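The Redis approach in the Service bullet can be sketched as a small Lua script that runs INCR and EXPIRE as one atomic step. This is a sketch, not a definitive implementation: `allow_request` and the key format are hypothetical, and `client` is assumed to be a redis-py style client exposing `eval(script, numkeys, *keys_and_args)`.

```python
# Lua runs atomically inside Redis, so the counter can never be incremented
# without also receiving a TTL (a risk when INCR and EXPIRE are sent separately
# and the caller dies between the two commands).
FIXED_WINDOW_LUA = """
local count = redis.call('INCR', KEYS[1])
if count == 1 then
  redis.call('EXPIRE', KEYS[1], ARGV[1])
end
return count
"""


def allow_request(client, key, limit, window_seconds):
    """Return True if this request is within `limit` for the current window.

    `client` is assumed to behave like redis-py's Redis object.
    """
    count = client.eval(FIXED_WINDOW_LUA, 1, key, window_seconds)
    return int(count) <= limit
```

A call might look like `allow_request(client, "tenant:42:/search", 100, 60)`. Note that this is a fixed-window counter, so it still has the boundary problem that the sliding window avoids.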
Always return 429 Too Many Requests with Retry-After and X-RateLimit-Remaining. Clients that respect these headers retry with backoff. Clients that do not respect them get blocked.
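As a rough illustration, a rejection can be built as a status, headers, and body triple that any web framework can translate into its own response type. The function name and the JSON body are hypothetical; only the status code and the two header names come from the text above.

```python
import math


def too_many_requests(retry_after_seconds, remaining=0):
    """Build a framework-agnostic (status, headers, body) triple for a rejection."""
    headers = {
        # Retry-After takes whole seconds; round up so clients never retry early.
        "Retry-After": str(max(1, math.ceil(retry_after_seconds))),
        "X-RateLimit-Remaining": str(remaining),
        "Content-Type": "application/json",
    }
    body = '{"error": "rate_limited"}'
    return 429, headers, body
```

With a token bucket, a reasonable value for `retry_after_seconds` is the time until one token is available again, that is, the missing tokens divided by the refill rate R.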
The Spur trap
At the Spur interview I was asked how fast a rate limiter must be. The answer is sub-millisecond, because it runs in the request path of every call. If your limiter takes 5ms, you just added 5ms to p50 latency for every user. That is why production limiters live in Redis with Lua scripts, or in-memory with periodic sync, never in Postgres with a row lock.
Default playbook
Token bucket per API key, 100 req/s sustained, 200 burst. Sliding window per IP at the edge to catch credential stuffing. Leaky bucket only when talking to a strict downstream. Reject with 429 plus headers. Log the rejections, alert when a single key hits the limit for 5 minutes straight.
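The alert rule in the playbook could be tracked with a per-key rejection streak. This sketch rests on an assumption the text does not spell out: "5 minutes straight" is read as rejections that keep arriving for 300 seconds with no quiet gap longer than `max_gap` seconds. The class name and the `max_gap` default are hypothetical.

```python
class RejectionStreaks:
    """Track how long each key has been continuously hitting its limit."""

    def __init__(self, alert_after=300.0, max_gap=10.0):
        self.alert_after = alert_after  # 5 minutes, from the playbook
        self.max_gap = max_gap          # assumed: a longer quiet period ends a streak
        self.streaks = {}               # key -> (streak_start, last_rejection)

    def record_rejection(self, key, now):
        """Record a 429 for `key` at time `now`; return True if an alert is due."""
        start, last = self.streaks.get(key, (now, now))
        if now - last > self.max_gap:
            start = now  # the key went quiet, so a new streak begins
        self.streaks[key] = (start, now)
        return now - start >= self.alert_after
```

Calling `record_rejection` from the same code path that logs each rejection keeps the logging and alerting decisions in one place.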
Learn more
- Article: Stripe, "Scaling your API with rate limiters" (Stripe blog)