Consistency models
The full hierarchy with implementation notes, real database mappings, and how to debug consistency bugs.
The hierarchy in full
Jepsen's consistency map is the canonical visualization. Roughly from strongest to weakest:
- Strict serializable (linearizable + serializable). Real-time transaction order. Spanner.
- Linearizable. Single-object real-time order. etcd; ZooKeeper for writes (its default reads may be stale).
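- Sequential. All processes agree on one order that respects each process's program order, but that order need not match real time. Less common as an advertised guarantee; ZooKeeper's ordered-but-possibly-stale reads are the usual example.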
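- Snapshot isolation. Reads see a consistent point-in-time snapshot, and concurrent writes to the same row cannot both commit, but write skew is possible. Postgres REPEATABLE READ, Oracle SERIALIZABLE; MySQL InnoDB's default REPEATABLE READ is close to, but weaker than, SI.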
- Read committed. Sees only committed writes, but successive reads in one transaction can see different versions. Default in Postgres, Oracle, and SQL Server.
- Causal. Causally related operations are observed in the same order by everyone; concurrent operations may be seen in different orders. COPS, MongoDB causally consistent sessions.
- Session guarantees (RYW, monotonic reads, monotonic writes, writes-follow-reads).
- Eventual. Convergence with no other guarantees.
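Jepsen's map is a partial order, not a line. Transactional models (serializable, snapshot isolation, read committed) and single-object models (linearizable, sequential, causal, session guarantees) form largely separate branches, so some pairs are incomparable: serializable neither implies nor is implied by linearizable. Composition also differs. Linearizability is a local property (Herlihy and Wing): if every object is linearizable, the whole system is. Sequential consistency and serializability do not compose this way.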
Linearizability vs serializability
These are different and people conflate them constantly.
- Linearizability is about single objects and real-time ordering. If write W completes before read R starts (by wall clock), R sees W or a later write.
- Serializability is about transactions. The result of executing transactions concurrently equals some serial execution.
Linearizable + serializable = strict serializable. Spanner does this. It is the gold standard and the slowest.
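A Postgres primary at SERIALIZABLE isolation gives serializable transactions, but a deployment that reads from asynchronous streaming replicas is not linearizable. You can commit a transaction on the primary, read it back on a replica, and see stale data, even though the transaction order on the primary is serializable.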
This matters because "Postgres has serializable mode, so it is fully consistent" is wrong. Replication adds linearizability problems that the isolation level does not solve.
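A toy model makes the replica case concrete. It assumes a primary that acknowledges a commit once it is durable locally and ships its log to one asynchronous replica later; `Primary`, `AsyncReplica`, and `ship_log` are hypothetical stand-ins for streaming replication, not Postgres APIs.

```python
from dataclasses import dataclass, field


@dataclass
class AsyncReplica:
    """Applies the primary's log only when ship_log() runs (simulated lag)."""
    data: dict = field(default_factory=dict)
    applied_lsn: int = 0


@dataclass
class Primary:
    data: dict = field(default_factory=dict)
    log: list = field(default_factory=list)  # (lsn, key, value) records

    def commit(self, key, value):
        # Acknowledged as soon as it is applied locally; the replica has not seen it.
        lsn = len(self.log) + 1
        self.log.append((lsn, key, value))
        self.data[key] = value
        return lsn

    def ship_log(self, replica):
        # Replay every record the replica has not applied yet.
        for lsn, key, value in self.log[replica.applied_lsn:]:
            replica.data[key] = value
            replica.applied_lsn = lsn


primary = Primary()
replica = AsyncReplica()
primary.commit("balance", 100)
primary.ship_log(replica)

primary.commit("balance", 50)            # commit returns successfully...
stale = replica.data.get("balance")      # ...but a replica read still sees 100
primary.ship_log(replica)
fresh = replica.data.get("balance")      # 50 once the log has been shipped
print(stale, fresh)
```

Each transaction ran serially on the primary, yet the history is not linearizable: the replica read began after the commit returned and still saw the old value.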
Causal consistency in detail
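Causal consistency is often described as the strongest model that can stay available under partition (Mahajan, Alvisi, and Dahlin proved this for a variant called real-time causal consistency). The key insight: only enforce ordering between operations that have a happens-before relationship.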
A happens-before B if:
- A and B are in the same session and A comes first.
- B reads a value written by A.
- A happens-before C and C happens-before B (transitive).
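Vector clocks capture this relation mechanically. A minimal sketch, assuming node IDs are strings and clocks are plain dicts where a missing entry means zero; the helper names are illustrative:

```python
def increment(clock: dict[str, int], node: str) -> dict[str, int]:
    """Local event or write on `node`: bump that node's entry."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def merge(local: dict[str, int], received: dict[str, int], node: str) -> dict[str, int]:
    """Receive a message (or read a value written elsewhere): element-wise max, then tick."""
    merged = {n: max(local.get(n, 0), received.get(n, 0))
              for n in local.keys() | received.keys()}
    return increment(merged, node)


def happens_before(a: dict[str, int], b: dict[str, int]) -> bool:
    """a -> b iff no entry of a exceeds b's and at least one is strictly smaller."""
    nodes = a.keys() | b.keys()
    return (all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
            and any(a.get(n, 0) < b.get(n, 0) for n in nodes))


def concurrent(a: dict[str, int], b: dict[str, int]) -> bool:
    """Neither event precedes the other."""
    return not happens_before(a, b) and not happens_before(b, a)


a = increment({}, "n1")       # write A on n1
b = merge({}, a, "n2")        # n2 reads A, then writes B
c = increment({}, "n3")       # unrelated write C on n3
print(happens_before(a, b), concurrent(b, c))  # True True
```

Reading a value is modeled as receiving its writer's clock, which is how the rule "B reads a value written by A" shows up in the code.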
Implementations use:
- Vector clocks: O(N) metadata per write, N = number of nodes.
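- Version vectors: O(N) per data item, N = number of replicas; they track the causal history of an item's versions rather than every event on every process.
- Lamport clocks: a single integer. If A happens-before B then L(A) < L(B), but not the converse, so they give a total order consistent with causality yet cannot detect concurrent writes.
- Hybrid logical clocks (HLC): combine physical time with a Lamport-style counter so timestamps stay close to wall-clock time; used by CockroachDB.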
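A hybrid logical clock fits in a few lines. This sketch follows the published HLC update rules (Kulkarni et al.), assuming physical time is injected as a function returning integer milliseconds so the example is deterministic; it is not CockroachDB's implementation.

```python
from typing import Callable


class HLC:
    """Timestamp (l, c): l is the largest physical time seen, c breaks ties within l."""

    def __init__(self, physical_now: Callable[[], int]):
        self.physical_now = physical_now
        self.l = 0
        self.c = 0

    def now(self) -> tuple[int, int]:
        """Timestamp for a local event or an outgoing message."""
        pt = self.physical_now()
        if pt > self.l:
            self.l, self.c = pt, 0
        else:
            self.c += 1
        return (self.l, self.c)

    def update(self, remote: tuple[int, int]) -> tuple[int, int]:
        """Merge a timestamp received from another node."""
        pt = self.physical_now()
        rl, rc = remote
        new_l = max(self.l, rl, pt)
        if new_l == self.l and new_l == rl:
            self.c = max(self.c, rc) + 1
        elif new_l == self.l:
            self.c += 1
        elif new_l == rl:
            self.c = rc + 1
        else:
            self.c = 0
        self.l = new_l
        return (self.l, self.c)


fast = HLC(lambda: 1_000)
slow = HLC(lambda: 900)        # this node's wall clock runs 100 ms behind
sent = fast.now()              # (1000, 0)
received = slow.update(sent)   # (1000, 1): still ordered after the send
local = slow.now()             # (1000, 2)
print(sent, received, local)
```

Because timestamps compare as (l, c) tuples, the receive stays ordered after the send even though the receiver's wall clock is behind, while l stays close to physical time.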
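The COPS paper (Lloyd et al., 2011) showed you can build a scalable, geo-replicated causally consistent key-value store: each write carries its dependencies, and a datacenter makes a replicated write visible only after those dependencies are visible locally. MongoDB added causally consistent sessions in 3.6 using "cluster time," which is a hybrid logical clock.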
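The dependency-gated visibility idea can be sketched as follows, assuming per-key integer versions and dependencies given as key to minimum version. `Write`, `Datacenter`, and `receive` are hypothetical names; the real system checks dependencies against the nodes that own them and is considerably more involved.

```python
from dataclasses import dataclass, field


@dataclass
class Write:
    key: str
    version: int
    value: object
    deps: dict  # key -> minimum version that must be visible first


@dataclass
class Datacenter:
    visible: dict = field(default_factory=dict)  # key -> (version, value)
    pending: list = field(default_factory=list)  # replicated writes waiting on deps

    def _deps_met(self, w: Write) -> bool:
        return all(self.visible.get(k, (0, None))[0] >= v for k, v in w.deps.items())

    def receive(self, w: Write) -> None:
        self.pending.append(w)
        # Keep applying until no buffered write becomes eligible.
        progress = True
        while progress:
            progress = False
            for p in list(self.pending):
                if self._deps_met(p):
                    current = self.visible.get(p.key, (0, None))[0]
                    if p.version > current:  # last-writer-wins per key
                        self.visible[p.key] = (p.version, p.value)
                    self.pending.remove(p)
                    progress = True


dc = Datacenter()
photo = Write("photo:1", 1, "beach.jpg", deps={})
comment = Write("comment:1", 1, "nice photo!", deps={"photo:1": 1})
dc.receive(comment)                # arrives first: buffered, not visible
print("comment:1" in dc.visible)   # False
dc.receive(photo)                  # dependency arrives; both become visible
print(sorted(dc.visible))          # ['comment:1', 'photo:1']
```

No reader in this datacenter can see the comment without also being able to see the photo it depends on.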
Session guarantees: the practical layer
Bayou's session guarantees (Terry et al, 1994) are the four you actually want most of the time:
- Read-your-writes: a session reads its own writes.
- Monotonic reads: successive reads never return an older version than a previous read.
- Monotonic writes: writes from a session are applied in order.
- Writes-follow-reads: if you read X then write Y, anyone seeing Y also sees X.
Together these give per-session causal consistency. Most apps need exactly this. The user is the session boundary.
Implementation: the client carries a "version token" (a vector or HLC timestamp). The server only serves reads at version >= token. If no replica has caught up, wait or fall back to the primary.
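A minimal sketch of that routing rule, assuming versions are monotonically increasing integers (for example a log position) and each node reports how far it has applied. `Node`, `Session`, and `route_read` are hypothetical names, and this version falls back to the primary rather than waiting.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    applied_version: int                      # highest version applied on this node
    data: dict = field(default_factory=dict)


@dataclass
class Session:
    token: int = 0  # highest version this session has written or read

    def observe(self, version: int) -> None:
        # The token only moves forward: this gives read-your-writes and monotonic reads.
        self.token = max(self.token, version)


def write_to_primary(session: Session, primary: Node, key: str, value) -> None:
    primary.applied_version += 1
    primary.data[key] = value
    session.observe(primary.applied_version)


def route_read(session: Session, replicas: list[Node], primary: Node) -> Node:
    """Serve from any replica caught up to the token; otherwise fall back to the primary."""
    for replica in replicas:
        if replica.applied_version >= session.token:
            return replica
    return primary


def read_with_token(session: Session, replicas: list[Node], primary: Node, key: str):
    node = route_read(session, replicas, primary)
    session.observe(node.applied_version)  # later reads must not go backwards
    return node.data.get(key)


primary = Node("primary", applied_version=10)
replicas = [Node("r1", applied_version=8), Node("r2", applied_version=10)]
session = Session()
write_to_primary(session, primary, "bio", "hello")    # primary at 11, token = 11
first = route_read(session, replicas, primary).name   # no replica has reached 11 yet
replicas[0].applied_version = 11
replicas[0].data["bio"] = "hello"
second = route_read(session, replicas, primary).name  # r1 has caught up
print(first, second, read_with_token(session, replicas, primary, "bio"))
```

Waiting briefly for a replica instead of falling back trades latency for lower primary load; either choice preserves the session guarantees.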
Mapping to real databases
| Database | Default | Strongest available |
|---|---|---|
| Postgres | Read Committed | Serializable (single node) |
| MySQL InnoDB | Repeatable Read | Serializable |
| MongoDB | Read concern "local" (writes default to w: "majority" since 5.0) | Linearizable reads, causal sessions, snapshot transactions (since 4.0) |
| Cassandra | Eventual | Linearizable via LWT (Paxos) |
| DynamoDB | Eventual | Linearizable via ConsistentRead |
| Spanner | Strict Serializable | Strict Serializable |
| CockroachDB | Serializable | Serializable |
| Etcd | Linearizable | Linearizable |
| Redis | Single node; replicas are asynchronous | No linearizable mode (WAIT narrows, but does not close, the lost-write window) |
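Notice MongoDB's default read concern, "local": it is not linearizable and can return data that is later rolled back after a failover. Linearizable and majority reads must be requested explicitly. Mongo has improved a lot since the Jepsen 2013 era, but its strongest guarantees are opt-in.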
Debugging consistency bugs
Most consistency bugs look like this: "sometimes the user sees stale data after they updated."
Diagnosis tree:
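- Are reads going to a replica? Yes -> check replication lag. If the write happened less than one lag interval before the read, that's your bug.
- Are you caching? Cache invalidation order matters. Invalidate after the write commits, not before: deleting first lets a concurrent reader repopulate the cache with the old value. For tighter guarantees, use write-through, versioned entries, or leases.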
- Are you using async writes (fire and forget)? The write might not have happened yet.
- Cross-region? Replication between regions takes at least the network delay between them, so a client that switches regions can read data older than its own writes.
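The two invalidation orders behave differently when a reader runs between the writer's two steps. This toy replay assumes cache-aside reads (populate the cache on a miss) and uses plain dicts for the database and cache, not any particular product's API:

```python
def reader_fill(db, cache):
    # Cache-aside read: on a miss, load from the database and populate the cache.
    if "k" not in cache:
        cache["k"] = db["k"]


def writer_update_db(db, cache):
    db["k"] = "new"


def writer_invalidate(db, cache):
    cache.pop("k", None)


def replay(schedule):
    db, cache = {"k": "old"}, {"k": "old"}
    for step in schedule:
        step(db, cache)
    reader_fill(db, cache)  # what the next reader sees after the schedule
    return cache["k"]


before = replay([writer_invalidate, reader_fill, writer_update_db])
after = replay([writer_update_db, reader_fill, writer_invalidate])
print(before, after)  # old new
```

With invalidate-first, the stale value stays cached until its TTL expires. Deleting after the write is not airtight either: a reader that loaded the old value before the update can still store it after the delete, which is the residual race that write-through caches, versioned entries, and leases (as in Facebook's memcache paper) address.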
Fix patterns:
- Read-your-writes: route reads to primary for N seconds, or use causal token.
- Cache staleness: write-through cache, or short TTL on read-heavy paths.
- Replication lag: monitor lag, route around lagging replicas.
- Cross-region: explicit version checks, or pin user to a region.
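The first fix pattern, pinning a session to the primary for a window after it writes, is a few lines of routing logic. A sketch assuming a monotonic clock and a hypothetical `STICKY_SECONDS` window that you would size above observed replication lag; unlike a causal token it is only a heuristic, because lag can exceed the window.

```python
import time

STICKY_SECONDS = 5.0  # hypothetical: pick a value comfortably above p99 replication lag


class StickyRouter:
    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.last_write: dict[str, float] = {}  # session id -> time of last write

    def record_write(self, session_id: str) -> None:
        self.last_write[session_id] = self.clock()

    def target_for_read(self, session_id: str) -> str:
        last = self.last_write.get(session_id)
        if last is not None and self.clock() - last < STICKY_SECONDS:
            return "primary"  # recent writer: replicas may not have its write yet
        return "replica"


now = [100.0]
router = StickyRouter(clock=lambda: now[0])
router.record_write("alice")
print(router.target_for_read("alice"), router.target_for_read("bob"))  # primary replica
now[0] += 10
print(router.target_for_read("alice"))  # replica
```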
When linearizability is worth the cost
Pay for linearizability when:
- Money or inventory. Double-spend is unacceptable.
- Coordination (leader election, locks, config). One wrong leader is catastrophic.
- Uniqueness constraints (usernames, emails). Two registrations with the same email is a bug.
- Audit and compliance. Real-time order matters for forensics.
Do not pay for linearizability when:
- Social signals (likes, views).
- Recommendations and ranking.
- Analytics and metrics.
- Caches of derived data.
The Spanner trick
Spanner achieves global strict serializability with reasonable latency by using TrueTime. TrueTime is an API that returns an interval [earliest, latest] within which the current time falls, with bounded uncertainty.
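To commit a transaction, Spanner picks a commit timestamp s no earlier than TrueTime.now().latest. It then waits until TrueTime.now().earliest is past s before acknowledging, so every correct clock agrees the commit is in the past. This commit-wait lasts about twice the clock uncertainty; the 2012 Spanner paper reports uncertainty varying from about 1 to 7 ms with GPS and atomic-clock time masters, so commit-wait is a few milliseconds.
This is a hardware-augmented solution. Without tight clock bounds, a system typically either hands out timestamps from a central oracle (as Percolator does) or settles for weaker guarantees. CockroachDB uses HLCs plus an assumed maximum clock offset, restarting reads that fall inside the uncertainty window; it provides serializability and single-key linearizability, but not strict serializability across unrelated keys.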
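The commit-wait rule can be sketched with a simulated TrueTime. Assumptions: `SimulatedTrueTime` is a stand-in, not Google's API; its `now()` returns an (earliest, latest) interval built from a local clock plus a fixed uncertainty `epsilon`; times are in milliseconds; and real Spanner overlaps commit-wait with Paxos replication, which this ignores.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SimulatedTrueTime:
    clock: Callable[[], float]  # local clock reading in ms
    epsilon: float              # assumed bound on clock error in ms

    def now(self) -> tuple[float, float]:
        t = self.clock()
        return (t - self.epsilon, t + self.epsilon)

    def after(self, ts: float) -> bool:
        # True only once ts has definitely passed on every correct clock.
        earliest, _ = self.now()
        return earliest > ts


def commit(tt: SimulatedTrueTime, sleep: Callable[[float], None]) -> float:
    """Pick s >= now().latest, then wait until after(s) before acknowledging."""
    _, s = tt.now()
    while not tt.after(s):
        sleep(1.0)  # commit-wait, in 1 ms steps
    return s


clock = [1_000.0]
tt = SimulatedTrueTime(clock=lambda: clock[0], epsilon=4.0)


def fake_sleep(ms: float) -> None:
    clock[0] += ms  # simulated time passes instantly


start = clock[0]
ts = commit(tt, fake_sleep)
waited = clock[0] - start
print(ts, waited)  # 1004.0 9.0
```

The wait comes out at just over 2 × epsilon (8 ms plus one polling step here), which is why shrinking clock uncertainty directly shrinks commit latency.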
Pitfalls and gotchas
- "We use Postgres so we have ACID." Postgres ACID is per-node. Replication adds eventual reads unless you explicitly read from primary.
- "Strong consistency is always better." A linearizable system cannot serve requests on the minority side of a partition, so some users will see errors or timeouts. That is often worse than slightly stale data.
- "Eventual consistency is fine for most things." Eventual without session guarantees breaks user expectations constantly. Users expect to see their own writes.
- "Linearizable reads are free if I read from the leader." Reads from a leader can still be stale if the leader has been demoted but does not know. Real linearizable reads require a quorum check (etcd's linearizable read does this).
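The quorum check can be sketched in the spirit of Raft's ReadIndex, which etcd uses: before serving a read, the leader confirms with a majority that it is still leader for its term. Assumptions: `Peer.confirm_leader` is a hypothetical stand-in for a heartbeat round-trip, peers are in-process objects, and waiting for the commit index to be applied is reduced to a comment.

```python
from dataclasses import dataclass, field


@dataclass
class Peer:
    current_term: int

    def confirm_leader(self, term: int) -> bool:
        # A peer acknowledges only if it has not moved on to a newer term.
        return self.current_term <= term


@dataclass
class Leader:
    term: int
    commit_index: int
    peers: list
    data: dict = field(default_factory=dict)

    def linearizable_read(self, key):
        # A real implementation first records commit_index as the read index.
        acks = 1 + sum(p.confirm_leader(self.term) for p in self.peers)  # count self
        cluster_size = 1 + len(self.peers)
        if acks <= cluster_size // 2:  # no majority: we may have been deposed
            raise RuntimeError("leadership not confirmed; retry on the current leader")
        # A real implementation waits here until the read index has been applied.
        return self.data.get(key)


stale_leader = Leader(term=3, commit_index=7, peers=[Peer(4), Peer(4)], data={"x": "old"})
refused = False
try:
    stale_leader.linearizable_read("x")
except RuntimeError:
    refused = True  # the deposed leader refuses instead of returning "old"

healthy = Leader(term=4, commit_index=9, peers=[Peer(4), Peer(3)], data={"x": "new"})
value = healthy.linearizable_read("x")
print(refused, value)  # True new
```

The extra round-trip is the cost of a linearizable read; etcd's serializable reads skip it and may return stale data.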
What to read next
- DDIA chapter 9. The best treatment.
- Jepsen analyses, especially MongoDB and Cassandra. Watch real databases get broken.
- Spanner paper for TrueTime.
- COPS paper for causal consistency at scale.
Learn more
- Article: Designing Data-Intensive Applications, Chapter 9 (Martin Kleppmann)
- Docs: Consistency models, Jepsen (Kyle Kingsbury)
- Paper: Linearizability paper (Herlihy and Wing)
- Paper: Bayou paper (Xerox PARC)