Caching strategies
Patterns, invalidation strategies, stampede prevention, multi-tier caches, and how to actually measure if your cache is helping.
A cache is a contract: trade a small probability of returning stale data for a large reduction in latency and load. Like any contract, the failure modes matter more than the happy path. The cache that works fine in dev and crashes prod is the most common postmortem cause I have seen in my career.
The five caching patterns, in depth
Cache-aside (lazy loading)
The default. App is the orchestrator.
Read:
- Check cache.
- If hit, return.
- If miss, read DB, write to cache, return.
Write:
- Write DB.
- Invalidate cache key (or update it).
Pros: simple, only caches what is actually requested, app fully controls TTL.
Cons: cache misses pay the full DB cost. Stampede risk. Easy to forget to invalidate on a write path.
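Here is a minimal cache-aside sketch. Plain dicts stand in for the database and the cache, and `get_user` / `update_user` are hypothetical names. A real service would call its DB driver and a Redis or Memcached client instead.

```python
# Hypothetical in-memory stand-ins for the database and a shared cache.
db = {"user:1": {"name": "Ada"}}
cache = {}
stats = {"db_reads": 0}  # counts origin queries so the miss cost is visible


def get_user(key):
    # Read path: check the cache first; on a hit, return immediately.
    if key in cache:
        return cache[key]
    # Miss: read the DB, populate the cache, return.
    stats["db_reads"] += 1
    value = db.get(key)
    if value is not None:
        cache[key] = value
    return value


def update_user(key, value):
    # Write path: write the DB first, then invalidate the cached copy.
    db[key] = value
    cache.pop(key, None)
```

The first `get_user` call misses and reads the DB. The second is served from the cache. After `update_user`, the next read misses again and picks up the new value.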
Read-through
Same as cache-aside but the cache library handles the miss. App always queries the cache; cache fetches from DB on miss.
Pros: cleaner app code.
Cons: cache library must know how to query your DB. Less flexible.
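Here is a read-through sketch. It assumes the cache is handed a `loader` callable (a hypothetical name) that plays the role of the DB query. That is the sense in which the cache library "must know how to query your DB". Caffeine's `LoadingCache` is one real example of the same idea. Unlike production libraries, this sketch does not deduplicate concurrent loads of the same key.

```python
from typing import Callable, Dict, Generic, Hashable, TypeVar

K = TypeVar("K", bound=Hashable)
V = TypeVar("V")


class ReadThroughCache(Generic[K, V]):
    """The cache owns the miss path: callers only ever call get()."""

    def __init__(self, loader: Callable[[K], V]):
        # The loader is the cache's only knowledge of the origin (hypothetical).
        self._loader = loader
        self._store: Dict[K, V] = {}

    def get(self, key: K) -> V:
        if key not in self._store:
            # On a miss the cache itself queries the origin and stores the result.
            self._store[key] = self._loader(key)
        return self._store[key]
```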
Write-through
App writes to cache; cache synchronously writes to DB.
Pros: cache always fresh. No invalidation needed.
Cons: every write pays both cache and DB latency. Cache must be populated before reads (cold start problem). Writes that never get read still populate the cache.
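Here is a write-through sketch, with a dict as a hypothetical stand-in for the DB. This version persists to the DB before updating the cache, so a failed DB write never leaves the cache ahead of the source of truth. That ordering is a choice made for this sketch. The sketch also shows the cold start problem: a row that was already in the DB before the cache existed is not served from the cache.

```python
class WriteThroughCache:
    """Writes go through the cache, which synchronously persists them to the DB."""

    def __init__(self, db: dict):
        self.db = db      # hypothetical stand-in for the real database
        self.store = {}

    def put(self, key, value):
        # Persist first: if the DB write raises, the cache is left untouched.
        self.db[key] = value
        # The caller waits for both writes, which is the latency cost above.
        self.store[key] = value

    def get(self, key):
        # Reads come from the cache. Keys that were never written through
        # (e.g. rows that predate the cache) are simply absent.
        return self.store.get(key)
```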
Write-back (write-behind)
App writes to cache; cache asynchronously writes to DB.
Pros: very fast writes (cache speed only).
Cons: if cache crashes before flushing, writes are lost. Hard to maintain consistency. Used in high-throughput systems with explicit durability tradeoffs.
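Here is a write-back sketch, again with a dict as a hypothetical DB. Writes only touch memory and mark the key dirty. `flush` (a hypothetical method that a background job would call) persists them later. Two writes to the same key before a flush coalesce into one DB write, which is part of why the pattern is fast. Anything still dirty when the process dies is lost.

```python
class WriteBackCache:
    """Writes land in the cache; dirty entries are flushed to the DB later."""

    def __init__(self, db: dict):
        self.db = db          # hypothetical stand-in for the real database
        self.store = {}
        self.dirty = set()    # keys written to the cache but not yet to the DB

    def put(self, key, value):
        # Fast path: only an in-memory write happens on the request.
        self.store[key] = value
        self.dirty.add(key)

    def get(self, key):
        return self.store.get(key)

    def flush(self) -> int:
        # Persist every dirty key; return how many DB writes were issued.
        flushed = 0
        for key in list(self.dirty):
            self.db[key] = self.store[key]
            self.dirty.discard(key)
            flushed += 1
        return flushed
```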
Refresh-ahead
Cache proactively refreshes entries before TTL expires.
Pros: no miss-induced latency for hot keys. Smooth load on DB.
Cons: refreshes entries that may never be requested again. Best for predictable hot keys.
The HTTP directive Cache-Control: max-age=N, stale-while-revalidate=60 is a close network-layer cousin of refresh-ahead. Once max-age has passed, the browser or CDN serves the stale copy immediately for up to 60 more seconds while fetching a fresh one in the background.
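Here is refresh-ahead at the application layer, sketched with an injectable clock so it can be exercised deterministically. `RefreshAheadCache`, `loader`, `ahead`, and `refresh_expiring` are hypothetical names. A real system would call `refresh_expiring` from a scheduler or background thread rather than inline.

```python
import time
from typing import Callable, Dict, Tuple


class RefreshAheadCache:
    """Reloads entries shortly before they expire instead of after."""

    def __init__(self, loader: Callable[[str], object], ttl: float, ahead: float,
                 clock: Callable[[], float] = time.monotonic):
        self.loader = loader  # hypothetical origin query
        self.ttl = ttl        # seconds an entry stays valid
        self.ahead = ahead    # refresh window before expiry, in seconds
        self.clock = clock
        self.entries: Dict[str, Tuple[object, float]] = {}  # key -> (value, expires_at)

    def _load(self, key: str) -> object:
        value = self.loader(key)
        self.entries[key] = (value, self.clock() + self.ttl)
        return value

    def get(self, key: str) -> object:
        entry = self.entries.get(key)
        if entry is None or entry[1] <= self.clock():
            return self._load(key)  # true miss: the caller pays the origin cost
        return entry[0]

    def refresh_expiring(self) -> int:
        """Reload every live entry that expires within `ahead` seconds."""
        now = self.clock()
        due = [k for k, (_, exp) in self.entries.items() if now < exp <= now + self.ahead]
        for key in due:
            self._load(key)
        return len(due)
```

`refresh_expiring` reloads every key near expiry whether or not anyone will read it again, which is exactly the cost called out above. Restricting it to known hot keys keeps that cost bounded.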
Invalidation: the real problem
The cache itself is a hash table. Invalidation is where bugs live.
TTL-only invalidation
Set a TTL on every key. The cache expires entries automatically.
Pros: simplest. No invalidation logic in the app.
Cons: data is stale for up to TTL after a write. Not acceptable for read-after-write consistency.
Use for: feature flags refreshed every minute, product catalog refreshed every hour, anything where 1-5 min staleness is fine.
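Here is a TTL-only cache, assuming an injectable clock for testing. Nothing in the write path touches it, so after a write the old value is served until it expires. In Redis, the equivalent is setting every key with an expiry (`SET key value EX 60`).

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TTLCache:
    """Every entry expires `ttl` seconds after it was written; no other invalidation."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self._data: Dict[str, Tuple[Any, float]] = {}

    def set(self, key: str, value: Any) -> None:
        self._data[key] = (value, self.clock() + self.ttl)

    def get(self, key: str) -> Optional[Any]:
        item = self._data.get(key)
        if item is None:
            return None
        value, expires_at = item
        if self.clock() >= expires_at:
            # Expired: drop it lazily on read and report a miss.
            del self._data[key]
            return None
        return value
```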
Explicit invalidation on write
The app code that writes to the DB also deletes (or updates) the cache key.
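```python
def update_user(user_id, data):
    # db and cache are assumed client objects; write the source of truth first...
    db.update("users", id=user_id, **data)
    # ...then drop the cached copy so the next read repopulates it from the DB.
    cache.delete(f"user:{user_id}")
```
Pros: tight consistency. Reads after writes see the new value.
Cons: brittle. Every write path must remember to invalidate. Easy to miss when a new write path is added. Race condition: a concurrent reader can miss, read the old row before the write commits, and then put that old value into the cache after the delete. The stale value stays cached until its TTL expires or the next write invalidates it.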
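The race condition described above is easiest to see as an explicit interleaving. This deterministic simulation uses dicts as hypothetical stand-ins for the DB and the cache, and replays one bad ordering step by step. No threads are needed because the schedule is written out by hand.

```python
# Hypothetical stand-ins: a one-row "database" and an empty cache.
db = {"user:7": "old-email"}
cache = {}

# 1. Reader misses the cache and reads the current (old) row.
assert "user:7" not in cache
reader_snapshot = db["user:7"]

# 2. Writer commits the new value and invalidates the key.
db["user:7"] = "new-email"
cache.pop("user:7", None)

# 3. Reader, unaware of the write, populates the cache with what it read.
cache["user:7"] = reader_snapshot

# The cache now disagrees with the DB.
print(db["user:7"], cache["user:7"])  # new-email old-email
```

A TTL on every key bounds how long such a stale entry survives. The "Scaling Memcache at Facebook" paper describes leases as a stronger fix for this stale-set problem.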
Update vs delete
When invalidating, you can either:
- Delete the key. Next read repopulates. Simple, but a thundering herd of misses if the key is hot.
- Update the key. Write the new value to cache too. No miss penalty. But the cache and DB write are not atomic - if cache update succeeds and DB write fails, cache is wrong.
Default to delete. Update only when you have measured a stampede risk and accepted the tradeoff.
Change data capture (CDC)
The DB's write-ahead log is consumed by a tool (Debezium, AWS DMS) that publishes changes to a queue. A cache invalidator reads the queue and invalidates affected keys.
Pros: decoupled. App code does not know about caching. Works for writes from any source (admin scripts, other services).
Cons: more infrastructure. Latency from write to invalidation (typically <1s with Kafka + Debezium).
Use when you have many write paths or external systems writing to your DB.
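Here is a sketch of the invalidator's core step. It assumes change events have already been consumed from the queue and decoded into dicts with `table` and `row` fields. That event shape, the `KEY_BUILDERS` mapping, and `invalidate` are hypothetical, and much simpler than a real Debezium change-event envelope.

```python
from typing import Callable, Dict, Iterable, List

# Hypothetical mapping from table name to the cache keys a changed row affects.
KEY_BUILDERS: Dict[str, Callable[[dict], List[str]]] = {
    "users": lambda row: [f"user:{row['id']}"],
    "products": lambda row: [f"product:{row['id']}", f"category:{row['category_id']}:list"],
}


def invalidate(events: Iterable[dict], cache: dict) -> List[str]:
    """Delete every cache key touched by each change event; return the keys invalidated."""
    invalidated = []
    for event in events:
        builder = KEY_BUILDERS.get(event["table"])
        if builder is None:
            continue  # table is not cached; nothing to do
        for key in builder(event["row"]):
            cache.pop(key, None)  # idempotent, so a redelivered event is harmless
            invalidated.append(key)
    return invalidated
```

Because deletes are idempotent, at-least-once delivery from the queue is fine: replaying an event just deletes keys that are already gone.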
Versioned keys
Append a version to the cache key. To invalidate, bump the version.
cache key: "product:42:v17"
A write to product 42 increments the version (maybe in another cache entry or DB). New reads use v18. Old entries expire naturally via TTL.
Pros: invalidation is just a counter increment. Old data is naturally orphaned.
Cons: leaks old keys until TTL. Slightly more cache memory used.
Useful for aggregations: a key for "leaderboard for league 5, season 12, week 38" becomes obsolete when the week ends.
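Here is a versioned-key sketch with dicts as hypothetical stand-ins; `cache_key` and `bump_version` are illustrative names. In Redis, the bump would typically be an atomic `INCR` on a per-entity counter key.

```python
# Hypothetical stand-ins: the version counter might live in Redis or the DB,
# and the cache would be Redis or Memcached.
versions = {}  # product id -> current version
cache = {}


def cache_key(product_id: int) -> str:
    # Readers always build the key from the current version.
    return f"product:{product_id}:v{versions.get(product_id, 1)}"


def bump_version(product_id: int) -> None:
    # Invalidation is a counter increment. The old key is orphaned rather than
    # deleted; in a real cache its TTL would reclaim the memory.
    versions[product_id] = versions.get(product_id, 1) + 1
```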
The three famous failure modes
Thundering herd / cache stampede
The TTL on a hot key expires. 1000 simultaneous requests all miss. All 1000 hit the DB. The DB might fall over. Even if it survives, the latency spike is brutal.
Solutions:
Locking. Only one request recomputes; others wait or get the old value. Implement with SET NX EX in Redis: first request acquires the lock, others see the lock and either block or return stale.
Probabilistic early expiration. Each request computes a probability of refreshing that rises as the key approaches its expiry. This spreads the recompute out over time instead of concentrating it at one cliff.
Refresh-ahead. Refresh the key before it expires. Hot keys never miss.
Stale-while-revalidate. Return the stale value immediately. Trigger an async refresh. Used by Next.js ISR, browser HTTP cache, CDNs.
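One concrete form of probabilistic early expiration is the XFetch rule from Vattani, Chierichetti, and Lowenstein's paper on optimal probabilistic cache stampede prevention. This sketch assumes the cache stores two things alongside each value: its expiry time, and `delta`, how long the last recomputation took. The function name and parameters are hypothetical. A request that gets `True` recomputes early while the others keep serving the cached value.

```python
import math
import random
from typing import Callable


def should_refresh_early(now: float, expiry: float, delta: float, beta: float = 1.0,
                         rand: Callable[[], float] = lambda: 1.0 - random.random()) -> bool:
    """XFetch-style check: refresh with rising probability as expiry approaches.

    now, expiry: current time and the key's expiry time, in seconds.
    delta: how long the last recomputation took, in seconds.
    beta: values above 1 favor earlier refreshes, below 1 later ones.
    rand: uniform draw in (0, 1]; injectable for testing.
    """
    # -log(u) is exponentially distributed, so each request "looks ahead" by a
    # random amount scaled by the recompute cost. Requests far from expiry almost
    # never trigger a refresh; requests just before it very likely do.
    return now - delta * beta * math.log(rand()) >= expiry
```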
Cache cold start
After deploying or restarting the cache, every read is a miss. DB gets hammered.
Solutions:
Pre-warm. A script populates the cache with known hot keys before the cache is put into rotation.
Gradual rollout. Bring up the cache, send 1% of traffic, increase as cache fills.
Long TTLs. Keys, once populated, stick around. Minimizes the cold window.
Cache penetration
Requests for keys that do not exist in the DB. There is nothing to cache, so every request misses the cache and queries the DB. Attackers can exploit this by requesting random nonexistent IDs.
Solutions:
Cache the negative result. Store a marker like null with short TTL. Subsequent requests return the marker instead of hitting DB.
Bloom filter. A space-efficient probabilistic data structure that tells you "this key definitely doesn't exist" or "it might exist." Check before querying the DB.
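Here is a sketch of both penetration defenses together. The Bloom filter uses SHA-256 with double hashing to pick bit positions. `MISSING` is a hypothetical sentinel for a cached negative result. The filter size, hash count, and names are illustrative assumptions. The filter must be built from every key that exists and updated as rows are inserted. A plain Bloom filter cannot remove keys, so deletes leave stale "maybe" answers behind.

```python
import hashlib
from typing import List


class BloomFilter:
    """Answers 'definitely absent' or 'maybe present'. No false negatives."""

    def __init__(self, size_bits: int = 8192, num_hashes: int = 4):
        self.size = size_bits
        self.k = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, key: str) -> List[int]:
        # Double hashing: derive k bit positions from two slices of one digest.
        digest = hashlib.sha256(key.encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd step size
        return [(h1 + i * h2) % self.size for i in range(self.k)]

    def add(self, key: str) -> None:
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, key: str) -> bool:
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))


MISSING = object()  # hypothetical sentinel meaning "this row does not exist"


def lookup(key: str, bloom: BloomFilter, cache: dict, db: dict):
    if not bloom.might_contain(key):
        return None  # definitely not in the DB: skip the cache and the DB
    if key in cache:
        hit = cache[key]
        return None if hit is MISSING else hit
    value = db.get(key)  # only "maybe present" keys reach the DB
    # Cache the negative result too (with a short TTL in a real cache).
    cache[key] = MISSING if value is None else value
    return value
```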
Multi-tier caching
Production systems layer caches. Each tier catches a different request profile.
- Browser cache. Set by Cache-Control headers. Fastest possible: no network round trip at all. Use for static assets.
- CDN (Cloudflare, Fastly, CloudFront). Geographic edge. Use for static assets, public API responses, public pages.
- Reverse proxy (nginx, Varnish). Per-origin caching for the things CDN cannot do.
- App in-process LRU. Microsecond hit time. Use for hot keys accessed many times per request (config, feature flags). Per pod, can be inconsistent across pods - so only for data that tolerates this.
- Distributed cache (Redis, Memcached). Shared across pods. Millisecond hit time. Use for expensive queries, session data.
- DB query cache / materialized views. If queries are repetitive, let the DB cache.
The closer a tier sits to the user, the more each hit saves: a CDN hit skips the trip to your origin and every tier behind it. A 1% CDN hit-rate improvement can save more than a 10% Redis hit-rate improvement.
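Here is a sketch of the two app-side tiers: an in-process cache in front of a shared one. The shared dict stands in for Redis or Memcached, and the class and parameter names are hypothetical. The browser and CDN tiers are driven by Cache-Control headers rather than app code, so they are not modeled.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TwoTierCache:
    """Per-pod in-process tier in front of a shared distributed tier."""

    def __init__(self, shared: Dict[str, Any], loader: Callable[[str], Any],
                 local_ttl: float = 60.0, clock: Callable[[], float] = time.monotonic):
        self.local: Dict[str, Tuple[Any, float]] = {}  # in-process tier, per pod
        self.shared = shared        # stand-in for Redis/Memcached, shared by all pods
        self.loader = loader        # origin query on a full miss (hypothetical)
        self.local_ttl = local_ttl  # short, because pods can disagree until it expires
        self.clock = clock

    def get(self, key: str) -> Tuple[Any, str]:
        """Return (value, name of the tier that served it)."""
        now = self.clock()
        entry = self.local.get(key)
        if entry is not None and entry[1] > now:
            return entry[0], "local"
        if key in self.shared:
            value, tier = self.shared[key], "shared"
        else:
            value, tier = self.loader(key), "origin"
            self.shared[key] = value
        # Populate the faster tier on the way back out.
        self.local[key] = (value, now + self.local_ttl)
        return value, tier
```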
Eviction policies
Cache memory is finite. When full, something must go.
- LRU (Least Recently Used). Default for most caches. Evicts the entry that has gone longest without being accessed.
- LFU (Least Frequently Used). Evicts least-accessed. Better for skewed access patterns.
- TTL-based. Evicts only on TTL expiry. Add LRU as backstop when memory is full.
- W-TinyLFU. Modern hybrid. Used by Caffeine in Java. State-of-the-art hit rates for typical workloads.
Redis defaults to noeviction: once maxmemory is reached, commands that need more memory fail with an error. Always set maxmemory-policy to allkeys-lru or similar for cache use cases.
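LRU itself is simple enough to sketch with Python's `collections.OrderedDict`, which can move a key to the end on access. This is an illustrative class, not how Redis implements it: Redis approximates LRU by sampling keys. For function results, the standard library's `functools.lru_cache` already does this.

```python
from collections import OrderedDict
from typing import Any, Hashable, Optional


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used entry."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._data: "OrderedDict[Hashable, Any]" = OrderedDict()

    def get(self, key: Hashable) -> Optional[Any]:
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def set(self, key: Hashable, value: Any) -> None:
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = value
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict the least recently used entry
```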
Consistency models
Your cache makes a consistency promise:
Strong consistency. Every read returns the latest write. Requires write-through with cache as source of truth, or distributed transactions. Rare and expensive.
Eventual consistency. Reads may return stale data, eventually catch up. Default for most caches.
Read-after-write consistency. A reader sees their own writes immediately. Common pattern: after write, set a session sticky bit or read directly from DB for N seconds.
Bounded staleness. Reads are at most N seconds stale. TTL gives this for free.
Pick the weakest model that works. Don't pay for strong consistency you don't need.
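Here is a sketch of the read-after-write pattern. It assumes a per-user "last write" timestamp, which in production might live in the session like the sticky bit mentioned above. It also assumes a window at least as long as the cache TTL, so any stale entry has expired by the time the window closes. The class and its names are hypothetical. Other users keep reading the possibly stale cached value.

```python
import time
from typing import Any, Callable, Dict


class ReadYourWrites:
    """After a user writes, serve that user's reads from the DB for `window` seconds."""

    def __init__(self, cache: Dict[str, Any], db: Dict[str, Any], window: float = 5.0,
                 clock: Callable[[], float] = time.monotonic):
        self.cache, self.db = cache, db  # hypothetical stand-ins
        self.window = window             # assumed to be >= the cache TTL
        self.clock = clock
        self.last_write: Dict[str, float] = {}  # user id -> time of their last write

    def write(self, user_id: str, key: str, value: Any) -> None:
        self.db[key] = value
        self.last_write[user_id] = self.clock()
        # No invalidation here: other readers tolerate TTL-bounded staleness.

    def read(self, user_id: str, key: str) -> Any:
        wrote_at = self.last_write.get(user_id)
        if wrote_at is not None and self.clock() - wrote_at < self.window:
            return self.db.get(key)  # the writer sees their own write
        if key in self.cache:
            return self.cache[key]   # everyone else may see the cached value
        value = self.db.get(key)
        self.cache[key] = value
        return value
```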
Measuring whether the cache helps
Track:
- Hit rate. Cache hits / total reads. Want >80% for the cache to be worth operating.
- Miss latency. What does a miss cost? If misses are 5ms, who cares. If they're 500ms, your worst-case latency is brutal.
- Cache size and eviction rate. Frequent evictions = cache too small or too churny.
- Origin load reduction. Before/after deploying the cache, how much load came off the DB?
A common pattern I see: a team adds a cache, doesn't measure it, and eventually finds it has a 30% hit rate. They're paying for Redis and for the extra miss-path latency while gaining almost nothing. Always measure.
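Here is a minimal way to get the first two numbers, assuming a dict cache and a hypothetical `loader`; the names are illustrative. In production you would export these as metrics. Redis itself reports `keyspace_hits`, `keyspace_misses`, and `evicted_keys` in its `INFO stats` output.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class CacheStats:
    hits: int = 0
    misses: int = 0
    miss_latencies: List[float] = field(default_factory=list)  # seconds per miss

    @property
    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


def instrumented_get(key: str, cache: Dict[str, Any], loader: Callable[[str], Any],
                     stats: CacheStats, clock: Callable[[], float] = time.perf_counter) -> Any:
    if key in cache:
        stats.hits += 1
        return cache[key]
    stats.misses += 1
    start = clock()
    value = loader(key)
    stats.miss_latencies.append(clock() - start)  # what a miss actually costs
    cache[key] = value
    return value
```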
When to NOT cache
- Per-user, low-repeat data. If each cache entry is read only once, the cache adds a lookup and a write to every request without ever serving a hit.
- Highly mutable data. Constant invalidation = cache thrash.
- Strong consistency requirements. Bank balance at withdrawal. Just read the DB.
- Cheap origin queries. If the DB query is 1ms, adding a 0.5ms Redis call saves 0.5ms in the best case and costs 0.5ms in the worst. Not worth the complexity.
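The cheap-origin point generalizes into a break-even rule. Assume every read first checks the cache at cost $c$, a miss additionally pays the origin cost $d$, and $h$ is the hit rate. This ignores invalidation work and the load the cache takes off the DB.

$$
\bar{t}_{\text{cached}} = h\,c + (1-h)(c + d) = c + (1-h)\,d
$$

$$
\bar{t}_{\text{cached}} < d \iff c < h\,d \iff h > \frac{c}{d}
$$

With $c = 0.5$ ms and $d = 1$ ms, the break-even hit rate is 50%. Even a perfect hit rate only brings mean latency down to 0.5 ms.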
What I would default to
For a typical SaaS:
- CDN. Cloudflare in front, cache static assets aggressively (1 year with content hashes in filenames), cache public API responses for 5 minutes.
- Redis. Cache-aside, 5-15 minute TTL, explicit invalidation on writes for the hot keys. Use Redis Cluster for HA.
- App in-process LRU. For feature flags and config (60s TTL), per pod.
- No write-back, no exotic patterns. Start boring.
Add complexity only when measurement shows you need it.