Consensus and Raft basics
Consensus = N nodes agree on a value despite failures. Raft is Paxos you can understand: leader election, log replication, safety.
Consensus is N nodes agreeing on a single value despite failures. It is the building block under everything that needs strong consistency: leader election, distributed locks, configuration management, replicated state machines.
The FLP impossibility result (1985) says no deterministic consensus algorithm works in a fully asynchronous network with even one faulty node. Real systems work around this with timeouts (turning crashes into "I think you're dead" decisions).
Raft in three concepts
- Leader election: one node is leader at a time. If leader dies, followers time out and run an election.
- Log replication: clients send commands to leader. Leader appends to log, replicates to followers. Once majority ack, the command is committed.
- Safety: a committed entry never gets lost or overwritten, even across leader changes.
Raft uses terms (logical time). Each term has at most one leader. Higher term wins.
Election
Followers wait randomly between 150-300ms for a leader heartbeat. If no heartbeat, become candidate, increment term, vote for self, request votes from others.
A candidate wins if it gets majority votes. Becomes leader, sends heartbeats. If split vote, time out again and retry with new randomized timeout.
Randomized timeouts prevent split votes from looping forever.
Replication
Leader appends client commands to its log. Sends AppendEntries RPC to followers. Once majority have the entry, leader marks it committed and applies to state machine. Followers apply when they learn it is committed.
If a follower is behind, leader retransmits. If a follower has conflicting log entries (from a deposed leader), leader overwrites them. Leader's log is the source of truth.
Safety guarantees
- Election safety: at most one leader per term.
- Leader append-only: leaders never overwrite or delete log entries.
- Log matching: if two logs have an entry with the same index and term, all preceding entries are identical.
- Leader completeness: a committed entry is in every future leader's log.
- State machine safety: if a server applies an entry at index i, no other server applies a different entry at i.
These five properties together give linearizability.
Used everywhere
Etcd, Consul, CockroachDB, TiKV, MongoDB (modified), Kafka KRaft mode, RethinkDB. The reason: Raft is correct, understandable, and has reference implementations.
Learn more
- PaperRaft paperDiego Ongaro and John Ousterhout
- DocsRaft visualizationraft.github.io
- ArticleDesigning Data-Intensive Applications, Chapter 9Martin Kleppmann