
Redis - Deep Dive

Memory model, persistence math, replication and Sentinel, Cluster mode, rate limiter algorithms, and the surprising bottlenecks.

Redis looks like a hash map with extra commands. It is much more than that once you put it in production. This is the operational and architectural view.

The single-threaded execution model

Redis executes commands on a single thread. Before Redis 6 that thread also handled all socket I/O; since Redis 6, optional I/O threads can take over socket reads and writes, but command execution is still serialized. Every command sees the entire dataset in a consistent state because no other command is running at the same time.

This gives you free atomicity for any single command. It also gives you a sharp cliff: any slow command stalls every other client. The list of slow commands is short but real:

KEYS pattern: O(N) scan of every key.
SMEMBERS bigset: O(N) where N can be millions.
LRANGE list 0 -1: O(N) copy.
DEBUG SLEEP: literally blocks.
Lua scripts that loop too long.

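SCAN, SSCAN, HSCAN, and ZSCAN exist precisely to iterate in small, bounded chunks without blocking. The full traversal is still O(N), but each call does only a little work before other clients get their turn.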
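
A minimal sketch of cursor-based iteration. It assumes a client whose scan(cursor, match, count) method returns (next_cursor, keys), which is the shape redis-py uses. FakeClient is a hypothetical in-memory stand-in so the loop runs without a server; its integer cursors are a simplification, since real cursors are opaque.

```python
from fnmatch import fnmatchcase


class FakeClient:
    """Hypothetical stand-in for a Redis client; not a real library class."""

    def __init__(self, keys):
        self._keys = sorted(keys)

    def scan(self, cursor=0, match=None, count=10):
        # Return one bounded batch starting at `cursor`, plus the next cursor.
        # Cursor 0 in the reply means the traversal is complete.
        batch = self._keys[cursor:cursor + count]
        next_cursor = cursor + count
        if next_cursor >= len(self._keys):
            next_cursor = 0
        # The pattern is applied after the batch is picked, so a batch
        # can come back smaller than `count`, or even empty.
        if match is not None:
            batch = [k for k in batch if fnmatchcase(k, match)]
        return next_cursor, batch


def scan_keys(client, match=None, count=100):
    """Yield matching keys one bounded batch at a time."""
    cursor = 0
    while True:
        cursor, batch = client.scan(cursor=cursor, match=match, count=count)
        yield from batch
        if cursor == 0:
            break


client = FakeClient([f"user:{i}" for i in range(25)] + ["session:1"])
found = list(scan_keys(client, match="user:*", count=10))
```

Against a real server, SCAN may return the same key more than once during a traversal, so deduplicate on the caller side if that matters.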

Memory and eviction

RAM is the constraint. Configure maxmemory and maxmemory-policy. The policies:

noeviction: writes fail when full. Safe for source-of-truth data, dangerous for caches.
allkeys-lru: evict least recently used across all keys.
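allkeys-lfu: evict least frequently used across all keys. Better hit rate for most workloads, and a reasonable first choice for new cache deployments. Note that the built-in default policy is noeviction, so you have to set this explicitly.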
volatile-lru/volatile-lfu/volatile-ttl: only evict keys with TTL set.

The LRU and LFU implementations are approximate. Redis samples 5 keys (configurable) and evicts the worst one. Good enough for cache workloads, never trust it for strict ordering.
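
The sampling idea can be sketched in a few lines. This is a simplified model, not Redis's actual code: it assumes a plain dict of last-access timestamps, and evict_one is a hypothetical helper name. Real Redis also carries candidates over between eviction rounds, which this omits.

```python
import random


def evict_one(last_access, samples=5, rng=random):
    """Evict the least recently used key among a random sample.

    last_access maps key -> last access timestamp (larger is more recent)
    and must not be empty.
    """
    candidates = rng.sample(list(last_access), min(samples, len(last_access)))
    victim = min(candidates, key=last_access.__getitem__)
    del last_access[victim]
    return victim


rng = random.Random(42)
cache = {f"key:{i}": i for i in range(1000)}  # key:0 is the oldest
victim = evict_one(cache, samples=5, rng=rng)
```

The victim is the oldest of the five sampled keys, which is rarely the oldest key overall. Raising the sample size moves the result closer to true LRU at the cost of more work per eviction.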

Memory fragmentation is real. The mem_fragmentation_ratio in INFO should be near 1.0. Above 1.5 and you are wasting RAM. The activedefrag setting helps but costs CPU.
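
The ratio is resident memory divided by the memory Redis has allocated for data. A small sketch that derives it from INFO-style text follows; the sample numbers are made up for illustration and fragmentation_ratio is a hypothetical helper name.

```python
def fragmentation_ratio(info_text):
    """Compute used_memory_rss / used_memory from INFO-style text."""
    fields = {}
    for line in info_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and section headers like "# Memory"
        name, _, value = line.partition(":")
        fields[name] = value
    return int(fields["used_memory_rss"]) / int(fields["used_memory"])


# Illustrative values only, not taken from a real server.
sample = """# Memory
used_memory:1000000000
used_memory_rss:1600000000
"""
ratio = fragmentation_ratio(sample)
```

With these numbers the ratio is 1.6: the process holds 60% more resident memory than the dataset needs.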

Persistence: RDB vs AOF vs both

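RDB is a binary snapshot of the entire dataset, written by a forked child process. The child shares pages with the parent through copy-on-write, so writing the snapshot does not block command execution. The fork() call itself does block while the page tables are copied, and on a 32 GB Redis that pause is noticeable. Memory usage also spikes if writes are heavy during the snapshot, because every page the parent modifies gets duplicated.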
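
A back-of-the-envelope sketch of that memory spike. The model is an assumption, not something Redis reports: it treats every dirtied page as distinct (pessimistic) and defaults to 4 KiB pages. The write rate and snapshot duration in the example are made-up inputs.

```python
def cow_extra_bytes(dataset_bytes, pages_dirtied_per_sec, snapshot_secs, page_size=4096):
    """Rough upper bound on extra memory while the forked child is alive.

    Each page the parent writes to during the snapshot is copied once,
    and the total can never exceed the size of the dataset itself.
    """
    copied = pages_dirtied_per_sec * snapshot_secs * page_size
    return min(dataset_bytes, copied)


# 32 GiB dataset, 20,000 pages dirtied per second, 120-second snapshot.
extra = cow_extra_bytes(32 * 1024**3, 20_000, 120)
```

Here the estimate is 9,830,400,000 bytes, a bit over 9 GiB of headroom to leave free on the host.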

AOF logs every write command. On restart, Redis replays the log. You choose the fsync policy:

always: fsync after every command. Safest, slowest, kills throughput.
everysec: fsync once per second. The standard choice. Up to 1 second of data loss on power failure.
no: let the OS decide when to flush. On Linux that typically means up to about 30 seconds of loss.

AOF files grow forever, so Redis periodically rewrites them by walking the dataset and emitting minimum commands to reconstruct it. This is the AOF rewrite, and like RDB it uses a fork.

In production we ran RDB every 5 minutes plus AOF with everysec. On restart Redis loads the AOF because it is more recent; the RDB is for cross-region backup.

Replication

Redis replication is asynchronous primary-to-replica. Writes go to the primary, replicate to N replicas in the background. Reads can hit replicas but you accept staleness. The replication offset tells you how far behind a replica is.

If a replica disconnects for long enough that the writes it missed no longer fit in the replication backlog, it does a full resync: the primary forks, dumps an RDB, ships it, then streams the writes that accumulated in the meantime. During this the primary uses extra memory and disk. Avoid this by sizing the replication backlog (repl-backlog-size) to cover your longest expected disconnect.
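
The sizing rule is simple arithmetic: write rate times longest disconnect, with some headroom. A sketch, where the safety factor of 2 and the example traffic numbers are assumptions:

```python
def backlog_bytes(write_bytes_per_sec, max_disconnect_secs, safety_factor=2.0):
    """Backlog size needed for a replica to resume with a partial resync."""
    return int(write_bytes_per_sec * max_disconnect_secs * safety_factor)


# 2 MiB/s of replication traffic, replicas expected back within 60 seconds.
size = backlog_bytes(2 * 1024 * 1024, 60)
```

That works out to 240 MiB, which is cheap compared with a full resync of a large dataset.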

Sentinel and Cluster

Sentinel is a separate process that watches a primary and its replicas, and triggers failover if the primary is unreachable. It does not shard data. Use Sentinel when one Redis fits your dataset and you need HA.

Cluster is sharded Redis. The keyspace is divided into 16,384 hash slots, distributed across primaries. Each primary has 0+ replicas. Clients use a smart driver that knows the slot map and routes commands directly.

Cluster has a hard rule: multi-key commands (MGET, MSET, transactions, Lua scripts) only work if all keys map to the same slot. You force this with hash tags: {user:123}:profile and {user:123}:settings both hash on user:123 and land on the same slot.
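
The slot calculation is small enough to sketch. Cluster hashes keys with CRC16 (the XMODEM variant) modulo 16,384, and Python's binascii.crc_hqx with an initial value of 0 computes the same checksum. hash_slot is a hypothetical helper name.

```python
import binascii


def hash_slot(key):
    """Map a key to one of the 16,384 Cluster hash slots."""
    data = key.encode() if isinstance(key, str) else key
    start = data.find(b"{")
    if start != -1:
        end = data.find(b"}", start + 1)
        # Only a non-empty tag counts; "{}" falls back to hashing the whole key.
        if end != -1 and end > start + 1:
            data = data[start + 1:end]
    # crc_hqx with initial value 0 is CRC-16/XMODEM (polynomial 0x1021).
    return binascii.crc_hqx(data, 0) % 16384


profile_slot = hash_slot("{user:123}:profile")
settings_slot = hash_slot("{user:123}:settings")
```

Both keys hash only the user:123 tag, so profile_slot and settings_slot are equal and a multi-key command across them is allowed.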

Failover in Cluster takes seconds. Afterwards clients receive MOVED redirections and update their slot map; ASK redirections appear while a slot is being migrated between nodes. Most drivers handle both transparently.

Rate limiting properly

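The naive rate limiter is INCR followed by EXPIRE. It is buggy: if the process dies between the two commands, the counter never expires, and the client stays blocked forever once it reaches the limit. Fix it with SET counter 0 EX 60 NX followed by INCR (INCR leaves the existing TTL untouched), or with a single Lua script.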

The naive limiter is also bursty at window boundaries: with 100 requests per minute, a client can send 100 at the end of one window and another 100 at the start of the next, so 200 land within a few seconds. A sliding window log fixes this:

ZADD rate:user:42 <now> <uuid>
ZREMRANGEBYSCORE rate:user:42 -inf <now - 60000>
ZCARD rate:user:42
EXPIRE rate:user:42 60

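Run all four commands in one Lua script for atomicity. The sorted set stores one entry per request, and the number of members left after trimming is the request count for the last 60 seconds. As written, rejected requests are recorded too; if you add the entry only when the count is under the limit, memory is bounded at one entry per allowed request per user.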
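
An in-memory model of the same logic, to make the steps concrete. It is a sketch, not the Lua script: a deque stands in for the sorted set, timestamps are assumed to be non-decreasing, and the count is checked before the entry is added so rejected requests cost no memory. SlidingWindowLog is a hypothetical class name.

```python
from collections import defaultdict, deque


class SlidingWindowLog:
    """In-memory model of the sorted-set limiter; one timestamp per request."""

    def __init__(self, limit, window_ms):
        self.limit = limit
        self.window_ms = window_ms
        self._log = defaultdict(deque)  # key -> timestamps, oldest first

    def allow(self, key, now_ms):
        log = self._log[key]
        # ZREMRANGEBYSCORE key -inf (now - window): drop expired entries.
        while log and log[0] <= now_ms - self.window_ms:
            log.popleft()
        # ZCARD key: requests still inside the window.
        if len(log) >= self.limit:
            return False  # rejected requests are not recorded
        # ZADD key now member: record the accepted request.
        log.append(now_ms)
        return True


limiter = SlidingWindowLog(limit=3, window_ms=60000)
results = [limiter.allow("user:42", t) for t in (0, 1000, 2000, 3000, 60000, 60500)]
```

The fourth request is rejected because three are already in the window. At 60000 the first entry has aged out, so one more request is allowed, and the next one is rejected again.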

Token bucket is more memory-efficient: one counter, refilled by elapsed time on each request. Implement it in Lua and store last_refill_time and tokens in a hash.
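
The refill logic, sketched in plain Python rather than Lua. The two attributes mirror the last_refill_time and tokens hash fields; the capacity, refill rate, and the TokenBucket class name are assumptions for illustration, and the caller passes in the current time.

```python
class TokenBucket:
    """In-memory model of a token bucket; state is two numbers per client."""

    def __init__(self, capacity, refill_per_sec, now):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.tokens = float(capacity)  # the `tokens` hash field
        self.last_refill_time = now    # the `last_refill_time` hash field

    def allow(self, now, cost=1.0):
        # Refill lazily: credit tokens for the time elapsed since the last call.
        elapsed = max(0.0, now - self.last_refill_time)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.last_refill_time = max(now, self.last_refill_time)
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


bucket = TokenBucket(capacity=2, refill_per_sec=1.0, now=0.0)
results = [bucket.allow(t) for t in (0.0, 0.0, 0.0, 0.5, 1.0)]
```

The capacity sets the largest burst a client can send at once, and the refill rate sets the sustained rate. In Redis the read, refill, and write must happen in one script so two concurrent requests cannot both spend the same token.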

Streams and consumer groups

Redis Streams are an append-only log with consumer groups. They give you Kafka-lite semantics: at-least-once delivery, consumer offset tracking, pending entries lists for failed handlers.

Use them when you need durability across consumer crashes and you do not want to pay Kafka operational cost. Do not use them when you need ordering across partitions or compaction.
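
A toy model of how a consumer group tracks deliveries, to show why a crashed consumer does not lose messages. It is heavily simplified: real entry IDs are timestamp-based rather than small integers, and claiming a pending entry has more options than shown here. StreamGroup is a hypothetical class; the comments name the command each method loosely models.

```python
class StreamGroup:
    """In-memory model of one stream with a single consumer group."""

    def __init__(self):
        self.entries = []        # append-only log of (entry_id, payload)
        self.last_delivered = 0  # group cursor: entries handed out so far
        self.pending = {}        # entry_id -> consumer that has not acked yet

    def add(self, payload):  # models XADD
        entry_id = len(self.entries) + 1
        self.entries.append((entry_id, payload))
        return entry_id

    def read_new(self, consumer, count=1):  # models XREADGROUP with ID ">"
        batch = self.entries[self.last_delivered:self.last_delivered + count]
        self.last_delivered += len(batch)
        for entry_id, _ in batch:
            self.pending[entry_id] = consumer
        return batch

    def ack(self, entry_id):  # models XACK
        return self.pending.pop(entry_id, None) is not None

    def claim(self, entry_id, consumer):  # models XCLAIM
        if entry_id in self.pending:
            self.pending[entry_id] = consumer
            return True
        return False


group = StreamGroup()
for payload in ("a", "b", "c"):
    group.add(payload)

first = group.read_new("worker-1", count=2)
group.ack(1)
# worker-1 crashes before acking entry 2, so it stays in the pending list.
group.claim(2, "worker-2")
second = group.read_new("worker-2", count=2)
```

Entry 2 is never lost: it sits in the pending list until another consumer claims and acknowledges it. That is the at-least-once guarantee, and it is also why handlers need to be idempotent.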

What actually broke in production

Lua script that called redis.call in a loop, blocking for 200ms. Surfaced as random latency spikes on unrelated keys. Lesson: the default script time limit is 5 seconds, and hitting it does not kill the script; Redis only starts answering other clients with BUSY errors so you can run SCRIPT KILL. You do not want scripts anywhere near it.
Memory growth from a forgotten key prefix that never expired. MEMORY USAGE and the redis-cli --bigkeys flag caught it.
Network partition between app and Redis. Connection pool exhausted, app processes hung. Lesson: always set a tight client timeout. We use 200ms.
