Deep dive · 15 min read

Consistency models

The full hierarchy with implementation notes, real database mappings, and how to debug consistency bugs.

The hierarchy in full

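Jepsen's consistency map is the canonical visualization. Roughly from strongest to weakest:

Strict serializable (linearizable + serializable). Real-time transaction order. Spanner.
Linearizable. Single-object real-time order. Etcd, ZooKeeper writes (ZooKeeper reads are sequentially consistent unless preceded by a sync).
Sequential. All nodes agree on one order that respects each client's program order, but the order need not match wall-clock time. Rare in practice.
Snapshot isolation. Reads see a consistent point-in-time snapshot. Postgres at REPEATABLE READ; MySQL InnoDB's REPEATABLE READ behaves similarly for plain reads.
Read committed. Sees only committed writes, but can read different versions within one transaction. Default in Postgres, Oracle, and SQL Server.
Causal. Causally related operations are observed in the same order everywhere. COPS; Riak uses vector clocks to track causality but is eventually consistent overall.
Session guarantees (RYW, monotonic reads, monotonic writes, writes-follow-reads).
Eventual. Convergence with no other guarantees.

The list is a flattened view of a partial order, not a strict chain. The transactional models (serializable, snapshot isolation, read committed) and the single-object models (linearizable, sequential, causal, session guarantees) sit on separate branches that meet at strict serializable, so causal and read committed are incomparable rather than one being weaker than the other. Composition also differs by model: linearizability composes (a system built from linearizable objects is itself linearizable), while sequential consistency and serializability do not.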

[Figure: consistency lattice (simplified from Jepsen)]
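
One way to make the lattice concrete is to encode "implies" edges and ask whether two models are comparable. The sketch below is a simplified reading of Jepsen's map, not a copy of it: the edge list is an assumption that skips intermediate models, and the helper names are made up for illustration.

```python
# Hypothetical, simplified "A implies B" edges between consistency models.
IMPLIES = {
    "strict serializable": ["serializable", "linearizable"],
    "serializable": ["snapshot isolation"],
    "snapshot isolation": ["read committed"],
    "linearizable": ["sequential"],
    "sequential": ["causal"],
    "causal": [
        "read your writes",
        "monotonic reads",
        "monotonic writes",
        "writes follow reads",
    ],
}


def weaker_models(model):
    """Return every model implied by `model` (transitive closure of the edges)."""
    seen = set()
    stack = [model]
    while stack:
        for nxt in IMPLIES.get(stack.pop(), []):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def at_least_as_strong(a, b):
    """True if a system providing `a` also provides `b`."""
    return a == b or b in weaker_models(a)


def comparable(a, b):
    """True if one of the two models implies the other."""
    return at_least_as_strong(a, b) or at_least_as_strong(b, a)
```

Under these edges, comparable("causal", "read committed") is false: neither model implies the other, which is why a single strongest-to-weakest list is only an approximation.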

Linearizability vs serializability

These are different and people conflate them constantly.

Linearizability is about single objects and real-time ordering. If write W completes before read R starts (in real time), R sees W or a later write.
Serializability is about transactions. The result of executing transactions concurrently equals some serial execution.

Linearizable + serializable = strict serializable. Spanner does this. It is the gold standard and the slowest.

Postgres at SERIALIZABLE isolation is serializable but not linearizable once replicas are involved. You can commit a transaction on the primary, read from an asynchronous replica, and see stale data, even though the transaction order is serializable.

This matters because "Postgres has serializable mode, so it is fully consistent" is wrong. Replication adds linearizability problems that the isolation level does not solve.
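
A toy model makes the gap visible. This is a sketch under heavy assumptions: the Primary and AsyncReplica classes are hypothetical, the primary applies writes one at a time (so its history is trivially serial), and replication is an explicit pull rather than a real streaming protocol.

```python
class Primary:
    """Applies writes one at a time, so the commit log is a serial order."""

    def __init__(self):
        self.log = []   # committed writes, in commit order
        self.data = {}

    def commit(self, key, value):
        self.data[key] = value
        self.log.append((key, value))
        return len(self.log)  # position of this write in the log


class AsyncReplica:
    """Replays the primary's log, but only when pull() is called."""

    def __init__(self):
        self.applied = 0  # number of log entries applied so far
        self.data = {}

    def pull(self, primary):
        for key, value in primary.log[self.applied:]:
            self.data[key] = value
            self.applied += 1

    def read(self, key):
        return self.data.get(key)


primary = Primary()
replica = AsyncReplica()

primary.commit("balance", 100)
replica.pull(primary)
primary.commit("balance", 50)        # committed, but not yet replicated

stale = replica.read("balance")      # replica still serves the older value
replica.pull(primary)
fresh = replica.read("balance")
```

The replica's state is always a prefix of the serial order, so nothing violates serializability. The read that returns the older balance after the second commit finished is exactly the real-time violation that linearizability forbids.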

Causal consistency in detail

Causal consistency is, roughly, the strongest model that can remain available under partition. The key insight: only enforce ordering for operations that have a happens-before relationship.

A happens-before B if:

A and B are in the same session and A comes first.
B reads a value written by A.
A happens-before C and C happens-before B (transitive).

Implementations use:

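Vector clocks: O(N) metadata per write, N = number of nodes.
Version vectors: also O(N), but N counts replicas rather than every node or client.
Lamport clocks: a single integer. Timestamps respect causality (if A happens-before B, A's timestamp is smaller), but the converse does not hold, so they cannot tell causally ordered events from concurrent ones.
Hybrid logical clocks (HLC): physical time combined with a Lamport-style counter, used by CockroachDB.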
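
The happens-before rules map directly onto vector clock comparison. A minimal sketch, assuming clocks are plain dicts from node name to counter; the function names are illustrative and not taken from any particular database.

```python
def vc_increment(clock, node):
    """Return a copy of `clock` with `node`'s counter advanced by one."""
    new = dict(clock)
    new[node] = new.get(node, 0) + 1
    return new


def vc_merge(a, b):
    """Element-wise maximum: the clock of an event that has seen both a and b."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in a.keys() | b.keys()}


def _less_or_equal(a, b):
    return all(a.get(n, 0) <= b.get(n, 0) for n in a.keys() | b.keys())


def happens_before(a, b):
    """True if a <= b on every entry and a != b on at least one."""
    return _less_or_equal(a, b) and not _less_or_equal(b, a)


def concurrent(a, b):
    """True if neither event causally precedes the other."""
    return not _less_or_equal(a, b) and not _less_or_equal(b, a)


# Two independent writes on different nodes, then a write that has read both.
w1 = vc_increment({}, "n1")                 # {"n1": 1}
w2 = vc_increment({}, "n2")                 # {"n2": 1}
w3 = vc_increment(vc_merge(w1, w2), "n1")   # {"n1": 2, "n2": 1}
```

Here w1 and w2 are concurrent, while both happen-before w3. A Lamport clock would assign w1 and w2 comparable integers and lose the fact that they are concurrent.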

The COPS paper (Lloyd et al., 2011) showed you can build a causally consistent key-value store with bounded metadata. MongoDB added causal consistency in 3.6 using "cluster time," which is essentially an HLC.
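
For a feel of how an HLC behaves, here is a sketch of the update rules as I understand the published HLC algorithm. It is not MongoDB's or CockroachDB's code: the HLC class, its method names, and the injected physical_clock callable are assumptions for illustration.

```python
class HLC:
    """Hybrid logical clock: timestamps are (l, c) pairs compared as tuples."""

    def __init__(self, physical_clock):
        self._now = physical_clock  # callable returning an integer physical time
        self.l = 0                  # highest physical time seen so far
        self.c = 0                  # logical counter that breaks ties within one l

    def tick(self):
        """Timestamp a local or send event."""
        pt = self._now()
        if pt > self.l:
            self.l, self.c = pt, 0
        else:
            self.c += 1
        return (self.l, self.c)

    def update(self, remote):
        """Timestamp a receive event, given the sender's (l, c)."""
        rl, rc = remote
        pt = self._now()
        new_l = max(self.l, rl, pt)
        if new_l == self.l and new_l == rl:
            self.c = max(self.c, rc) + 1
        elif new_l == self.l:
            self.c += 1
        elif new_l == rl:
            self.c = rc + 1
        else:
            self.c = 0
        self.l = new_l
        return (self.l, self.c)


# Node b's physical clock is behind node a's, yet causality is preserved.
a = HLC(lambda: 100)
b = HLC(lambda: 90)

sent = a.tick()            # (100, 0)
received = b.update(sent)  # (100, 1): after `sent`, despite b's slow clock
later = b.tick()           # (100, 2)
```

The timestamp stays close to physical time when clocks are healthy, and the counter keeps ordering correct when a receiver's clock lags the sender's.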

Session guarantees: the practical layer

Bayou's session guarantees (Terry et al, 1994) are the four you actually want most of the time:

Read-your-writes: a session reads its own writes.
Monotonic reads: successive reads never see an older version than a previous read did.
Monotonic writes: writes from a session are applied in order.
Writes-follow-reads: if you read X then write Y, anyone seeing Y also sees X.

Together these give per-session causal consistency. Most apps need exactly this. The user is the session boundary.

Implementation: the client carries a "version token" (vector or HLC timestamp). A server only serves a read if it has applied everything up to that token (version >= token). If no replica is caught up, wait or fall back to the primary.
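
The token scheme can be sketched in a few lines. Assumptions: versions are plain integers instead of vectors or HLC timestamps, replication is simulated by calling apply() by hand, and the Node and Session classes are hypothetical.

```python
class Node:
    """A primary or replica that knows the highest version it has applied."""

    def __init__(self, name):
        self.name = name
        self.version = 0
        self.data = {}

    def apply(self, version, key, value):
        self.data[key] = value
        self.version = version


class Session:
    """Client-side session that carries a version token between requests."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self.replicas = replicas
        self.token = 0  # highest version this session has written or observed

    def write(self, key, value):
        version = self.primary.version + 1
        self.primary.apply(version, key, value)
        self.token = max(self.token, version)   # enables read-your-writes
        return version

    def read(self, key):
        # Prefer any replica that has caught up to the token, else the primary.
        node = next(
            (r for r in self.replicas if r.version >= self.token),
            self.primary,
        )
        self.token = max(self.token, node.version)  # enables monotonic reads
        return node.data.get(key), node.name


primary = Node("primary")
replica = Node("replica")
session = Session(primary, [replica])

session.write("name", "Ada")
first = session.read("name")     # replica is behind, so the primary answers

replica.apply(1, "name", "Ada")  # replication catches up
second = session.read("name")    # now the replica is allowed to answer
```

Falling back to the primary trades load for latency. Waiting for a replica to catch up is the other option and fits better when the primary is the bottleneck.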

Mapping to real databases

Database	Default	Strongest available
Postgres	Read Committed	Serializable (single node)
MySQL InnoDB	Repeatable Read	Serializable
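MongoDB	Read concern "local", reads from the primary	Linearizable reads, causal sessions, snapshot transactions (since 4.0)
Cassandra	Eventual	Linearizable via LWT (Paxos)
DynamoDB	Eventual	Linearizable via ConsistentRead
Spanner	Strict Serializable	Strict Serializable
CockroachDB	Serializable	Serializable
Etcd	Linearizable	Linearizable
Redis	Asynchronous replication (a single node is trivially consistent)	No linearizable mode with replicas; WAIT narrows the write-loss window but does not give strong consistency

Notice that MongoDB is not simply "eventual." By default reads go to the primary, and snapshot transactions, causal sessions, and linearizable reads are all available. Mongo has improved a lot since the Jepsen 2013 era.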

Debugging consistency bugs

Most consistency bugs look like this: "sometimes the user sees stale data after they updated."

Diagnosis tree:

Are reads going to a replica? Yes -> check replication lag. If the write happened less than one lag interval before the read, that's your bug.
Are you caching? Cache invalidation order matters. Invalidate after the database write commits, not before (otherwise a concurrent reader can repopulate the cache with the old value), or use write-through.
Are you using async writes (fire and forget)? The write might not have happened yet.
Cross-region? If the client switched regions, its write may not have replicated to the new region yet, so reads look "stale."
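
To see why cache invalidation order matters, the sketch below forces one specific interleaving: a reader runs in the middle of a write. The CacheAside class and its interleaved hook are hypothetical test scaffolding, not a real cache client.

```python
class CacheAside:
    """Read-through cache in front of a dict standing in for the database."""

    def __init__(self, db):
        self.db = db
        self.cache = {}

    def read(self, key):
        if key not in self.cache:
            self.cache[key] = self.db[key]  # miss: populate from the database
        return self.cache[key]

    def write_invalidate_first(self, key, value, interleaved=lambda: None):
        self.cache.pop(key, None)
        interleaved()            # a concurrent reader runs here
        self.db[key] = value

    def write_invalidate_last(self, key, value, interleaved=lambda: None):
        self.db[key] = value
        interleaved()            # a concurrent reader runs here
        self.cache.pop(key, None)


# Invalidate first: the reader re-caches the old value before the write lands.
a = CacheAside({"k": "old"})
a.write_invalidate_first("k", "new", interleaved=lambda: a.read("k"))
after_invalidate_first = a.read("k")

# Invalidate last: the reader may see the old value once, then the entry is dropped.
b = CacheAside({"k": "old"})
b.read("k")
b.write_invalidate_last("k", "new", interleaved=lambda: b.read("k"))
after_invalidate_last = b.read("k")
```

With invalidate-first the stale value stays cached until something else evicts it. Invalidate-last is not race-free either (a reader that fetched the old row before the write can still populate the cache afterwards), which is why a short TTL is a common backstop.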

Fix patterns:

Read-your-writes: route the user's reads to the primary for N seconds after their write, or use a causal token.
Cache staleness: write-through cache, or short TTL on read-heavy paths.
Replication lag: monitor lag, route around lagging replicas.
Cross-region: explicit version checks, or pin user to a region.
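
The "route reads to primary for N seconds" pattern is small enough to sketch. Assumptions: the StickyPrimaryRouter name, the per-user bookkeeping, and the injectable clock are all illustrative, and the window has to be chosen larger than the worst replication lag you expect.

```python
import time


class StickyPrimaryRouter:
    """Send a user's reads to the primary for a short window after they write."""

    def __init__(self, window_seconds, clock=time.monotonic):
        self.window = window_seconds
        self.clock = clock
        self.last_write = {}  # user_id -> time of that user's most recent write

    def record_write(self, user_id):
        self.last_write[user_id] = self.clock()

    def target_for_read(self, user_id):
        last = self.last_write.get(user_id)
        if last is not None and self.clock() - last < self.window:
            return "primary"
        return "replica"


# Usage with a fake clock so the behavior is deterministic.
now = [0.0]
router = StickyPrimaryRouter(window_seconds=5, clock=lambda: now[0])

router.record_write("u1")
right_after = router.target_for_read("u1")   # inside the window
now[0] = 6.0
later_on = router.target_for_read("u1")      # window has expired
other_user = router.target_for_read("u2")    # never wrote
```

This is a heuristic, not a guarantee: if lag exceeds the window, the user can still see stale data. A causal token avoids guessing at the cost of plumbing the token through every request.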

When linearizability is worth the cost

Pay for linearizability when:

Money or inventory. Double-spend is unacceptable.
Coordination (leader election, locks, config). One wrong leader is catastrophic.
Uniqueness constraints (usernames, emails). Two registrations with the same email is a bug.
Audit and compliance. Real-time order matters for forensics.

Do not pay for linearizability when:

Social signals (likes, views).
Recommendations and ranking.
Analytics and metrics.
Caches of derived data.

The Spanner trick

Spanner achieves global strict serializability with reasonable latency by using TrueTime. TrueTime is an API that returns an interval [earliest, latest] within which the current time falls, with bounded uncertainty.

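To commit a transaction, Spanner picks a timestamp t no earlier than TrueTime.now().latest. It then waits until TrueTime.now().earliest has passed t before acknowledging, so t is guaranteed to be in the past everywhere by the time the client hears about the commit. The Spanner paper reports clock uncertainty of a few milliseconds (generally under 7 ms) thanks to GPS and atomic-clock time sources, so this commit-wait costs on the order of milliseconds.

This is a hardware-augmented solution. Spanner still replicates writes with Paxos; what TrueTime removes is the extra cross-region coordination that would otherwise be needed to agree on a real-time order of transactions, which is much slower. CockroachDB applies a similar idea with hybrid logical clocks and a configured maximum clock offset. It gets close, providing serializable isolation and handling clock skew with uncertainty windows, but without tightly bounded clocks it does not guarantee strict serializability.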
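
Commit-wait is easy to simulate. This is a sketch with a fake clock, not Spanner's API: FakeTrueTime, Interval, and commit are hypothetical names, time is in integer microseconds, and the "wait" advances the simulated clock instead of sleeping.

```python
from dataclasses import dataclass


@dataclass
class Interval:
    earliest: int
    latest: int


class FakeTrueTime:
    """Simulated clock whose readings are off by at most `epsilon` microseconds."""

    def __init__(self, epsilon):
        self.epsilon = epsilon
        self.true_time = 0  # the real time, unknown to callers

    def now(self):
        return Interval(self.true_time - self.epsilon,
                        self.true_time + self.epsilon)

    def advance(self, dt):
        self.true_time += dt


def commit(tt, step=1000):
    """Pick a commit timestamp, then wait until it is definitely in the past."""
    t = tt.now().latest               # no clock in the system can read later than t
    waited = 0
    while tt.now().earliest <= t:     # commit-wait: t might still be in the future
        tt.advance(step)
        waited += step
    return t, waited


tt = FakeTrueTime(epsilon=4000)       # 4 ms of uncertainty
timestamp, waited = commit(tt)
```

In this model the wait comes out to roughly twice the uncertainty, which shows why shrinking the clock error bound directly shrinks commit latency.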

Pitfalls and gotchas

"We use Postgres so we have ACID." Postgres ACID is per-node. Asynchronous replicas serve eventually consistent reads unless you explicitly read from the primary.
"Strong consistency is always better." A strongly consistent system gives up availability under partition: nodes cut off from a quorum must reject requests. Your users will see errors or timeouts. That is often worse than slightly stale data.
"Eventual consistency is fine for most things." Eventual without session guarantees breaks user expectations constantly. Users expect to see their own writes.
"Linearizable reads are free if I read from the leader." Reads from a leader can still be stale if the leader has been demoted but does not know. Real linearizable reads require a quorum check (etcd's linearizable read does this).

Learn more

Article: Designing Data-Intensive Applications, Chapter 9 (Martin Kleppmann)
Docs: Consistency models, Jepsen (Kyle Kingsbury)
Paper: Linearizability paper (Herlihy and Wing)
Paper: Session guarantees for weakly consistent replicated data (Terry et al.)
Paper: Bayou paper (Xerox PARC)

