Replication strategies
Three patterns: single-leader, multi-leader, leaderless. Choose by how much write conflict you can tolerate and by your latency budget.
Replication exists for three reasons: latency (data near users), availability (survive node loss), throughput (scale reads). The strategy determines which you optimize for and what conflicts you create.
Single-leader replication
All writes go to one leader. The leader streams its changes to followers, which serve reads. This is the default setup for most production relational databases.
- Pros: no write conflicts ever. Simple. Strong consistency on the leader.
- Cons: leader is a write bottleneck and a SPOF for writes until failover completes. Failover takes seconds or longer and may lose acknowledged writes with async replication.
- Used by: Postgres (default), MySQL primary/replica, MongoDB replica sets.
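Sync vs async: sync waits for at least one follower ack before returning success. No acknowledged write is lost on leader failure as long as that follower survives, at the cost of write latency. Async returns as soon as the leader has the write, so recent writes can be lost on failover.
Postgres has a middle option: set synchronous_commit = on and list standbys in synchronous_standby_names. Only the standbys selected there (for example FIRST 1 or ANY 1 of the list) are synchronous; the rest stay async. This is what you want for finance.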
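The difference is easiest to see in a toy model. The sketch below is an in-memory simulation, not how any real database ships its log; Leader, Follower, sync_count and acked_writes_lost_on_failover are hypothetical names invented for illustration. The first sync_count followers are written before write() returns; the rest are queued, standing in for background replication.

```python
from dataclasses import dataclass, field


@dataclass
class Follower:
    """Hypothetical follower: stores every entry it has received."""
    log: list = field(default_factory=list)

    def apply(self, entry):
        self.log.append(entry)
        return True  # stands in for the ack sent back to the leader


@dataclass
class Leader:
    """Hypothetical leader: the first sync_count followers are synchronous."""
    followers: list
    sync_count: int = 0  # 0 means fully asynchronous
    log: list = field(default_factory=list)
    pending: list = field(default_factory=list)  # (follower, entry) not yet shipped

    def write(self, entry):
        self.log.append(entry)
        for i, follower in enumerate(self.followers):
            if i < self.sync_count:
                follower.apply(entry)  # wait for the ack before returning
            else:
                self.pending.append((follower, entry))  # ship later
        return "ok"  # the client now believes the write is durable

    def flush(self):
        """Deliver queued entries, as background replication eventually would."""
        for follower, entry in self.pending:
            follower.apply(entry)
        self.pending.clear()


def acked_writes_lost_on_failover(leader):
    """Entries the client saw acknowledged that no follower holds."""
    replicated = set()
    for follower in leader.followers:
        replicated.update(follower.log)
    return [e for e in leader.log if e not in replicated]


async_leader = Leader([Follower(), Follower()], sync_count=0)
async_leader.write("debit 100")
print(acked_writes_lost_on_failover(async_leader))  # ['debit 100']

semi_sync = Leader([Follower(), Follower()], sync_count=1)
semi_sync.write("debit 100")
print(acked_writes_lost_on_failover(semi_sync))  # []
```

If the leader dies before flush() runs, the async setup has acknowledged a write that no follower holds. With one sync follower, the acknowledged write survives on it.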
Multi-leader replication
Two or more leaders, each accepting writes, replicating to each other. Used for multi-region active-active.
- Pros: write latency is local. Survives entire region loss.
- Cons: write conflicts are now your problem. Last-write-wins silently loses data. CRDTs or an app-level merge are usually the safer answer.
- Used by: Cassandra in multi-DC mode (strictly a leaderless design, but it accepts writes in every DC), CouchDB, BDR for Postgres.
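To make the data-loss point concrete, here is a minimal sketch of two regions updating the same record concurrently. The record layout (ts, node, items) and the functions lww_merge and union_merge are assumptions made up for this example, not any database's API.

```python
def lww_merge(a, b):
    """Last-write-wins: keep the higher timestamp, break ties by node id."""
    return max(a, b, key=lambda v: (v["ts"], v["node"]))


def union_merge(a, b):
    """App-level merge for a set-like value: keep items from both sides."""
    return sorted(set(a["items"]) | set(b["items"]))


# Both writes were accepted locally and acknowledged to their clients.
us = {"ts": 1000, "node": "us", "items": ["book"]}
eu = {"ts": 1001, "node": "eu", "items": ["pen"]}

print(lww_merge(us, eu)["items"])  # ['pen']  (the 'book' write is silently gone)
print(union_merge(us, eu))         # ['book', 'pen']
```

The union merge only works because the value behaves like a set. Deletions, counters and free-form fields each need their own merge rule.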
Conflict resolution patterns:
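- Last-write-wins (LWW): timestamp-based. Keeps the write with the highest timestamp and silently discards the others, so it is sensitive to clock skew.
- Vector clocks: detect concurrent (conflicting) writes and surface them to the application. They do not resolve anything themselves.
- CRDTs: data structures whose merge is deterministic regardless of the order in which replicas exchange updates.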
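A rough sketch of the last two patterns, assuming version vectors are plain dicts mapping node id to a counter. The function names are hypothetical. compare tells you whether two versions are ordered or concurrent. gcounter_merge is the merge for a grow-only counter, one of the simplest CRDTs.

```python
def compare(vc_a, vc_b):
    """Return 'before', 'after', 'equal' or 'concurrent' for two version vectors."""
    nodes = set(vc_a) | set(vc_b)
    a_le_b = all(vc_a.get(n, 0) <= vc_b.get(n, 0) for n in nodes)
    b_le_a = all(vc_b.get(n, 0) <= vc_a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"      # a happened before b, so b can safely overwrite a
    if b_le_a:
        return "after"
    return "concurrent"      # neither saw the other: a real conflict


def gcounter_merge(a, b):
    """Merge two grow-only counters by taking the per-node maximum."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in set(a) | set(b)}


def gcounter_value(counter):
    """The counter's value is the sum of every node's contribution."""
    return sum(counter.values())


print(compare({"us": 1}, {"us": 2, "eu": 1}))           # before
print(compare({"us": 2, "eu": 1}, {"us": 1, "eu": 2}))  # concurrent

us = {"us": 3}            # 'us' has counted 3 increments
eu = {"us": 1, "eu": 2}   # 'eu' saw 1 of those and added 2 of its own
print(gcounter_value(gcounter_merge(us, eu)))           # 5
```

Per-node maximum is commutative, associative and idempotent, so replicas converge whatever order the merges arrive in and however often they repeat.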
Leaderless replication
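No leader. Each key lives on N replicas. The client (or a coordinator node) sends every write to all N and waits for W acks; a read waits for R responses and takes the newest version. If W + R > N, every read quorum overlaps every write quorum, so a read reaches at least one replica holding the latest acknowledged write. That is close to strong consistency but not linearizable: concurrent writes and sloppy quorums can still return stale or conflicting values.
- Pros: no failover. A node failure is invisible to clients as long as W and R replicas stay reachable.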
- Cons: client complexity, read-repair logic, anti-entropy needed.
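- Used by: Amazon's original Dynamo, Cassandra, Riak. (AWS DynamoDB, despite the name, uses leader-based replication internally.)
This is the Dynamo paper model. The math: N=3, W=2, R=2 gives W + R = 4 > 3, so read and write quorums always share a replica, and both reads and writes keep working with 1 node down.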
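The quorum arithmetic can be checked by brute force. This sketch enumerates every possible write set and read set and confirms they intersect. The function names are hypothetical, and it models strict quorums only (no sloppy quorums or hinted handoff).

```python
from itertools import combinations


def quorums_always_overlap(n, w, r):
    """True if every W-sized write set shares a replica with every R-sized read set."""
    replicas = range(n)
    return all(
        set(write_set) & set(read_set)
        for write_set in combinations(replicas, w)
        for read_set in combinations(replicas, r)
    )


def tolerated_failures(n, w, r):
    """Nodes that can be down while both reads and writes still reach quorum."""
    return n - max(w, r)


print(quorums_always_overlap(3, 2, 2))  # True
print(tolerated_failures(3, 2, 2))      # 1
print(quorums_always_overlap(3, 1, 1))  # False (a read can miss the only replica written)
```

Lowering W or R buys latency and availability at the price of possibly stale reads. Raising either one lowers the number of node failures the cluster can absorb.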
The interview answer
Default to single-leader with sync replication to one follower for HA. Add async followers for read scaling. Move to multi-leader only when local write latency in every region is a hard requirement, and only with CRDTs or app-level merge. Use leaderless when you need writes to keep succeeding during node failures and can tolerate the client complexity.
Learn more
- Article: Designing Data-Intensive Applications, Chapter 5 (Martin Kleppmann)
- Docs: PostgreSQL replication docs (PostgreSQL)
- Paper