Gossip protocols

Nodes share state with random peers. State spreads exponentially. Used for membership, failure detection, and cluster state.

Gossip protocols spread information through a cluster by having each node periodically share state with a few random peers. After O(log N) rounds, every node has the update with high probability. They are used for cluster membership, failure detection, and propagating cluster state.

Why gossip

Broadcast doesn't scale (sender does N work). Centralized coordination doesn't scale (one server bottleneck). Gossip is decentralized and probabilistic: each round, each node does O(1) work, and information propagates in O(log N) rounds.
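A rough sketch of where the O(log N) figure comes from, assuming the ideal case in which every contact reaches a node that has not heard the update yet. The function name ideal_rounds and its fanout parameter are illustrative, not taken from any library. Real runs waste some contacts on already-informed nodes, so they take somewhat longer, but growth stays logarithmic.

```python
def ideal_rounds(n, fanout=1):
    """Rounds until n nodes are informed, assuming no contact is wasted."""
    informed, rounds = 1, 0
    while informed < n:
        # Every informed node informs `fanout` new nodes this round.
        informed *= 1 + fanout
        rounds += 1
    return rounds


print(ideal_rounds(1000))            # 10
print(ideal_rounds(1000, fanout=3))  # 5
```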

Spreads fast even on lossy networks. Tolerates random node failures. No single point of failure.

The basic loop

Each node, every T seconds:

  1. Pick K random peers (typically K=1-3, T=1 second).
  2. Exchange state with each.
  3. Merge state per key (highest version wins, or last-write-wins by timestamp).

After O(log N) rounds, every node has the latest state with high probability. For a 1000-node cluster, log2(1000) is about 10, so that's on the order of 10 seconds at T=1 second.
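The loop above can be sketched as a small in-memory simulation. This is a minimal sketch under assumptions, not a production design: state is a dict of key -> (version, value), the exchange is push-pull, nodes act one after another within a round, and the names merge, gossip_round, and rounds_to_converge are made up for illustration.

```python
import random


def merge(local, remote):
    """Merge remote state into local: per key, the higher version wins."""
    for key, (version, value) in remote.items():
        if key not in local or local[key][0] < version:
            local[key] = (version, value)


def gossip_round(states, fanout, rng):
    """One round: every node exchanges state with `fanout` random peers."""
    n = len(states)
    for node in range(n):
        others = [p for p in range(n) if p != node]
        for peer in rng.sample(others, fanout):
            # Push-pull: both sides end up with the merged state.
            merge(states[peer], states[node])
            merge(states[node], states[peer])


def rounds_to_converge(n, fanout=1, seed=0, max_rounds=1000):
    """Count rounds until an update that starts on node 0 reaches all nodes."""
    rng = random.Random(seed)
    states = [{} for _ in range(n)]
    states[0]["config"] = (1, "v1")  # only node 0 knows the update
    target = states[0]["config"]
    for rounds in range(1, max_rounds + 1):
        gossip_round(states, fanout, rng)
        if all(s.get("config") == target for s in states):
            return rounds
    raise RuntimeError("did not converge within max_rounds")


print(rounds_to_converge(200))
```

The exact round count depends on the seed and on the push-pull and ordering assumptions, so treat the printed number as indicative only.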

SWIM: the canonical failure detector

SWIM stands for Scalable Weakly-consistent Infection-style Process Group Membership protocol. Used by HashiCorp's Memberlist library, which underpins Serf and Consul.

Each round:

  1. Node A picks random node B, sends ping.
  2. If B replies within timeout, B is alive.
  3. If no reply, A asks K random nodes to ping B on its behalf.
  4. If still no reply, A marks B as suspect, gossips this.
  5. If the suspicion is not refuted before a suspicion timeout expires, B is marked dead.

Indirect probing reduces false positives caused by network blips between A and B: B is suspected only if none of the K helpers can reach it either.
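The probe round can be sketched as follows. This is an illustrative sketch, not the Memberlist API: reachable(src, dst) is a hypothetical stand-in for "ping sent and ack received within the timeout", and probe and expire_suspects are made-up names.

```python
import random


def probe(prober, target, members, reachable, k, rng):
    """Run one SWIM-style probe of target; return 'alive' or 'suspect'."""
    # Steps 1-2: direct ping.
    if reachable(prober, target):
        return "alive"
    # Step 3: ask up to k other members to ping the target on our behalf.
    helpers = [m for m in members if m not in (prober, target)]
    for helper in rng.sample(helpers, min(k, len(helpers))):
        if reachable(prober, helper) and reachable(helper, target):
            return "alive"
    # Step 4: no ack by any path, so the target becomes suspect.
    return "suspect"


def expire_suspects(suspected_at, now, suspect_timeout):
    """Step 5: list members whose suspicion has outlived the timeout."""
    return sorted(m for m, t in suspected_at.items()
                  if now - t >= suspect_timeout)


members = ["A", "B", "C", "D"]
rng = random.Random(0)

# Only the A-B link is broken: the indirect probe finds B alive.
def link_down(src, dst):
    return {src, dst} != {"A", "B"}

print(probe("A", "B", members, link_down, 2, rng))  # alive

# B is really down: no path gets an ack.
def b_down(src, dst):
    return "B" not in (src, dst)

print(probe("A", "B", members, b_down, 2, rng))  # suspect
```

The first case is the one indirect probing is meant to handle: a direct ping alone would have wrongly suspected B.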

What gossip is good for

  • Cluster membership: who's in the cluster, who joined, who left.
  • Failure detection: who's alive, who's dead, who's suspect.
  • Configuration state: feature flags, version info.
  • Metrics and telemetry: spread sampled stats.

What gossip is NOT good for

  • Strong consistency. Convergence is eventual, takes seconds.
  • Sequencing. No total order.
  • High-frequency state. Bandwidth grows with state size.
  • Mission-critical coordination. Use Raft/Paxos.

Figure: gossip spreads exponentially, from 1 informed node in round 0 to about 2^r in round r.
