
Replication strategies

Three patterns: single-leader, multi-leader, leaderless. Pick by write conflict tolerance and latency budget.

Replication exists for three reasons: latency (data near users), availability (survive node loss), throughput (scale reads). The strategy determines which you optimize for and what conflicts you create.

Single-leader replication

All writes go to one leader. The leader streams changes to followers, which serve reads. This is by far the most common setup in production databases.

  • Pros: no write conflicts ever. Simple. Strong consistency on the leader.
  • Cons: leader is a write bottleneck and a SPOF until failover completes. Failover takes seconds at best and may lose recent writes with async replication.
  • Used by: Postgres (built-in streaming replication), MySQL primary/replica, MongoDB replica sets.

Sync vs async: sync waits for at least one follower ack before returning success. It loses no acknowledged writes on leader failure but adds latency to every write. Async returns as soon as the leader has the write, so recent writes can be lost on failover.

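Postgres supports a middle option: list one or more standbys in synchronous_standby_names and leave synchronous_commit = on. Commits then wait only for the listed standbys (until they have flushed the WAL to disk), while all other replicas stay asynchronous. This is what you want for finance.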
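The sync/async trade-off above can be made concrete with a toy in-memory model. This is only a sketch: the `Node` and `Leader` classes and the `acknowledged_writes_lost` helper are hypothetical names invented for illustration, and no real database or network is involved.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    log: list = field(default_factory=list)


class Leader:
    """Toy single-leader setup with one follower (hypothetical, in-memory only)."""

    def __init__(self, follower: Node, synchronous: bool):
        self.node = Node()
        self.follower = follower
        self.synchronous = synchronous
        self.pending = []  # writes not yet shipped to the follower (async mode)

    def write(self, value) -> None:
        self.node.log.append(value)
        if self.synchronous:
            # Sync: the follower has the write before the client sees success.
            self.follower.log.append(value)
        else:
            # Async: acknowledge now, ship later.
            self.pending.append(value)

    def ship_pending(self) -> None:
        """Deliver queued writes to the follower (replication catching up)."""
        self.follower.log.extend(self.pending)
        self.pending.clear()


def acknowledged_writes_lost(synchronous: bool) -> list:
    """Write three values, crash the leader before the last one ships, fail over."""
    follower = Node()
    leader = Leader(follower, synchronous)
    leader.write("a")
    leader.write("b")
    leader.ship_pending()  # follower is caught up at this point
    leader.write("c")      # acknowledged to the client
    # The leader crashes here; the follower is promoted with whatever it has.
    return [v for v in leader.node.log if v not in follower.log]


print(acknowledged_writes_lost(synchronous=True))   # []
print(acknowledged_writes_lost(synchronous=False))  # ['c']
```

In the async run the client was told that "c" succeeded, yet the promoted follower never received it. That gap is exactly what synchronous replication pays latency to close.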

Multi-leader replication

Two or more leaders, each accepting writes, replicating to each other. Used for multi-region active-active.

  • Pros: write latency is local. Survives entire region loss.
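  • Cons: write conflicts are now your problem. Last-write-wins silently discards one side of each conflict; CRDTs or app-level merge preserve both.
  • Used by: CouchDB, BDR for Postgres. Cassandra in multi-DC mode is also active-active, but its replication is leaderless rather than multi-leader.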

Conflict resolution patterns:

  • Last-write-wins (LWW): timestamp-based, silently drops all but one of the conflicting writes.
  • Vector clocks: detect concurrent writes and surface the conflict to the application.
  • CRDTs: data structures that merge deterministically.
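A minimal sketch of the three patterns, assuming timestamps are plain floats, vector clocks are `{node_id: counter}` dicts, and the CRDT is a grow-only counter (G-Counter). All function names and sample values here are hypothetical and chosen for illustration only.

```python
def lww_merge(a, b):
    """Last-write-wins: keep the (timestamp, value) pair with the larger timestamp."""
    return a if a[0] >= b[0] else b


def compare_clocks(x: dict, y: dict) -> str:
    """Compare two vector clocks given as {node_id: counter} dicts."""
    nodes = x.keys() | y.keys()
    x_le_y = all(x.get(n, 0) <= y.get(n, 0) for n in nodes)
    y_le_x = all(y.get(n, 0) <= x.get(n, 0) for n in nodes)
    if x_le_y and y_le_x:
        return "equal"
    if x_le_y:
        return "before"      # x happened before y; y can safely overwrite x
    if y_le_x:
        return "after"
    return "concurrent"      # neither saw the other: a real conflict


def gcounter_merge(x: dict, y: dict) -> dict:
    """Merge two grow-only counters by taking the per-node maximum."""
    return {n: max(x.get(n, 0), y.get(n, 0)) for n in x.keys() | y.keys()}


def gcounter_value(c: dict) -> int:
    """Total count is the sum of every node's own counter."""
    return sum(c.values())


# Two leaders accept a write to the same key at nearly the same time.
write_eu = (1700000000.120, "alice@new.example")
write_us = (1700000000.125, "alice@other.example")
print(lww_merge(write_eu, write_us)[1])  # alice@other.example (the EU write is dropped)

print(compare_clocks({"eu": 2, "us": 1}, {"eu": 1, "us": 2}))  # concurrent
print(compare_clocks({"eu": 1, "us": 1}, {"eu": 2, "us": 1}))  # before

# Each region counted page views locally; merging loses nothing.
eu = {"eu": 5, "us": 2}
us = {"eu": 3, "us": 4}
merged = gcounter_merge(eu, us)
print(gcounter_value(merged))  # 9
```

LWW always produces an answer but cannot tell you that it discarded something. The vector-clock comparison only flags the conflict, and the G-Counter avoids it because its merge is commutative, associative, and idempotent.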

Leaderless replication

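No leader. Each key is stored on N replicas. The client (or a coordinator node) sends every write to all N and treats it as successful once W of them acknowledge; a read queries R replicas and keeps the newest version. If W + R > N, every read set overlaps every write set, so a read sees at least one copy of the latest acknowledged write. This is usually described as strong consistency, though edge cases such as concurrent writes can still return stale data.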

  • Pros: no failover. A node failure is invisible to clients as long as W and R replicas remain reachable.
  • Cons: client complexity, read-repair logic, anti-entropy needed.
  • Used by: Amazon's original Dynamo, Cassandra, Riak. (The DynamoDB service, despite the name, replicates through a per-partition leader.)

This is the Dynamo paper model. The math: N=3, W=2, R=2 gives W + R = 4 > 3, so read and write quorums always overlap, and both reads and writes still succeed with 1 node down.
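The quorum arithmetic can be checked by brute force. The sketch below uses two hypothetical helpers: `quorums_overlap` enumerates every possible write set and read set, and `tolerated_failures` counts how many nodes can be down while both reads and writes still reach their quorum.

```python
from itertools import combinations


def quorums_overlap(n: int, w: int, r: int) -> bool:
    """True if every write set of size w shares a node with every read set of size r."""
    nodes = range(n)
    return all(
        set(ws) & set(rs)
        for ws in combinations(nodes, w)
        for rs in combinations(nodes, r)
    )


def tolerated_failures(n: int, w: int, r: int) -> int:
    """Nodes that can be down while both reads and writes still reach their quorum."""
    return n - max(w, r)


for n, w, r in [(3, 2, 2), (3, 1, 1), (5, 3, 3)]:
    print(n, w, r, quorums_overlap(n, w, r), w + r > n, tolerated_failures(n, w, r))
# 3 2 2 True True 1
# 3 1 1 False False 2
# 5 3 3 True True 2
```

The enumeration agrees with the W + R > N rule in every case. The N=3, W=1, R=1 row shows the trade: it tolerates more failures, but a read can miss the latest write entirely.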


The interview answer

Default to single-leader with sync replication to one follower for HA. Add async followers for read scaling. Move to multi-leader only when low cross-region write latency is a hard requirement, and only with CRDTs or app-level merge. Use leaderless when writes must keep succeeding through node failures and you can tolerate the client complexity.
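As a rough sketch, the decision rule above can be written as a function. The function name and its four boolean parameters are hypothetical simplifications; real choices depend on more than four flags.

```python
def pick_replication(
    cross_region_write_latency_is_hard_requirement: bool,
    has_crdt_or_app_merge: bool,
    writes_must_survive_node_failures: bool,
    can_tolerate_client_complexity: bool,
) -> str:
    """Encode the rule of thumb: single-leader unless a hard requirement says otherwise."""
    if cross_region_write_latency_is_hard_requirement and has_crdt_or_app_merge:
        return "multi-leader"
    if writes_must_survive_node_failures and can_tolerate_client_complexity:
        return "leaderless"
    # Default: one sync follower for HA, async followers for read scaling.
    return "single-leader"


print(pick_replication(False, False, False, False))  # single-leader
print(pick_replication(True, True, False, False))    # multi-leader
print(pick_replication(True, False, False, False))   # single-leader
print(pick_replication(False, False, True, True))    # leaderless
```

The third call is the important one: needing local writes in every region is not enough on its own, because without a merge strategy the rule falls back to single-leader.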
