Two-phase commit and limits
2PC mechanics, why it blocks, 3PC and its issues, sagas, the outbox pattern, and how production systems actually handle cross-service atomicity.
The atomic commit problem
Multiple participants must agree to either all commit or all abort a distributed transaction. Sounds simple. It is hard:
- Participants can fail independently.
- Messages can be lost.
- Coordinator can fail.
- Decisions must be durable.
The naive "send commit to all" fails the moment any participant can't commit. The naive "ask everyone first, then send commit" fails when the coordinator dies between phases.
2PC tries to solve this. It almost does.
2PC in detail
Two roles: coordinator (orchestrator) and participants (the resource managers).
Phase 1: PREPARE
- Coordinator writes "begin commit" to its log.
- Sends PREPARE message to each participant.
- Each participant: checks if it can commit (constraints, locks, etc). If yes, writes PREPARED to its log durably. Replies YES. If no, writes ABORT, replies NO.
After PREPARED is logged, the participant is "in doubt." It cannot abort unilaterally. It must wait for the coordinator's decision.
Phase 2: COMMIT or ABORT
- Coordinator examines votes. If all YES, writes COMMIT to log. Else writes ABORT.
- Sends decision to all participants.
- Each participant applies the decision, writes COMMITTED or ABORTED to log, releases locks, sends ack.
- Coordinator writes "end of transaction" once all acks received.
The crucial invariant: after a participant logs PREPARED and the coordinator logs COMMIT, the transaction WILL commit, even if everyone crashes. Recovery code replays the logs.
Why 2PC blocks
The killer scenario: coordinator dies right after sending PREPARE and getting all YES votes, but before sending COMMIT.
Participants are in doubt. They have logged PREPARED. They hold locks. They cannot:
- Commit: maybe the coordinator decided abort.
- Abort: maybe the coordinator decided commit.
- Ask each other: they don't know the coordinator's decision either.
They must wait for the coordinator to recover and tell them. Locks remain held. Throughput stalls.
This is fundamental to 2PC. Provably, no protocol with only 2 message rounds can avoid this blocking in the presence of coordinator failure.
Recovery
When the coordinator restarts, it reads its log:
- If COMMIT was logged, send COMMIT to all participants again.
- If COMMIT not logged but PREPARE was, this is where it gets weird. Some participants might have voted YES, some NO, but coordinator never decided. Coordinator must contact participants to find out the votes and then decide.
When a participant restarts:
- If PREPARED is logged but no decision: it is in doubt. Must contact coordinator.
- If COMMITTED or ABORTED is logged: it knows the outcome, can recover state.
The in-doubt state is the hell of 2PC. Operators run scripts to manually resolve doubted transactions. XA has tools for this. It is a known pain.
3PC and its limits
Three-phase commit adds a pre-commit phase between PREPARE and COMMIT:
- Coordinator sends PREPARE. Participants vote.
- If all YES, coordinator sends PRE-COMMIT. Participants ack.
- Once all acked PRE-COMMIT, coordinator sends COMMIT.
The extra round lets participants know that everyone is ready, so if the coordinator dies after PRE-COMMIT, participants can decide on their own (everyone in PRE-COMMIT state will commit).
3PC handles coordinator failure better. But it does NOT handle network partitions. Under partition, you can still get split decisions.
Also adds a round of latency. In practice, 3PC is rarely used. Either you accept 2PC's blocking nature or you use consensus-based protocols (Paxos Commit, Raft-based transactions).
XA transactions
XA is the standard for distributed transactions across resource managers (databases, message queues, etc). It is 2PC with a defined API:
- xa_start: begin a transaction branch.
- xa_end: end the branch.
- xa_prepare: prepare phase.
- xa_commit / xa_rollback: phase 2.
- xa_recover: list in-doubt transactions.
Postgres supports XA. Many JEE app servers do. Used in classic enterprise systems.
Issues:
- All resources must support XA. Many modern databases (Mongo before 4, most NoSQL, message queues like Kafka) don't.
- Performance is much worse than local transactions.
- The in-doubt problem is real and operationally painful.
- Doesn't compose with eventually consistent systems.
XA is fading. Modern systems use sagas or event-driven patterns.
Sagas
Garcia-Molina and Salem proposed sagas in 1987 as an alternative to long-running transactions. A saga is a sequence of local transactions T1, T2, ..., Tn, each with a compensating transaction C1, C2, ..., Cn-1.
If all Ti succeed, the saga is complete. If Tk fails, execute Ck-1, Ck-2, ..., C1 to compensate.
Two flavors:
- Choreography: each service emits events, others react. Decentralized.
- Orchestration: a coordinator drives the saga, sending commands to each service.
Example: book trip saga.
- T1: reserve flight. C1: cancel flight.
- T2: reserve hotel. C2: cancel hotel.
- T3: charge card. C3: refund card.
If T3 fails, run C2 then C1.
Saga properties
- No global atomicity. Intermediate states are visible to others.
- No isolation. Another transaction can see partial state.
- Compensation must be commutative and idempotent.
- Some operations cannot be compensated (sending email, calling external API). Use Pivot transaction pattern: do uncompensable steps last, after all reversible steps succeed.
When compensation is hard
- Sending notifications. Can't unsend an email. Use "intent to send" + delay.
- External API calls. Often no undo. Build idempotency keys, retry instead.
- Real-world actions (shipping a package). Hard or impossible to reverse.
The pivot point in a saga is the last reversible step. Place irreversible steps after it.
Outbox pattern
The pragmatic solution for "I need to update my database AND publish an event atomically."
Naive: write to DB, then publish to Kafka. If publish fails, you've updated DB but not notified anyone. Inconsistent.
Outbox:
- In the same database transaction: write the business change AND insert a row in an "outbox" table.
- A separate process polls the outbox table and publishes to Kafka.
- After publishing, mark the outbox row as published.
The local transaction is atomic. The publish is at-least-once. Consumers must be idempotent.
This is the modern alternative to XA for "DB + queue" atomicity. Used by Stripe, Shopify, basically every event-driven SaaS.
CDC (change data capture) tools like Debezium can read the outbox automatically via the transaction log, no polling needed.
Spanner: distributed transactions done right
Spanner uses Paxos-coordinated 2PC. Each shard is a Paxos group. Cross-shard transactions use 2PC where the coordinator is itself a Paxos group, so coordinator failure is handled by Paxos failover instead of blocking.
This gives:
- Atomic distributed transactions.
- No coordinator SPOF (Paxos handles failover).
- TrueTime for external consistency.
Cost: latency. Each transaction pays cross-region RTT for Paxos round and 2PC round. Spanner trades latency for correctness.
CockroachDB does the same idea with Raft groups per range plus 2PC across ranges.
This is "2PC done right": consensus replaces single-coordinator design. The blocking problem is solved by making the coordinator itself fault-tolerant.
When to use what
| Scenario | Solution |
|---|---|
| Single-database transaction | Local ACID |
| Multiple databases in same enterprise | XA (with operational pain) or sagas |
| Microservices, cross-service consistency | Sagas |
| DB write + event publish | Outbox pattern |
| Globally distributed strong consistency | Spanner / CockroachDB style |
| Eventually consistent OK | Event-driven + compensation |
For 99% of microservice architectures: sagas + outbox pattern. Accept eventual consistency, design for compensation, make everything idempotent.
What to read next
- DDIA chapter 9 for the formal treatment.
- Sagas paper for the original framework.
- Chris Richardson's microservices.io for production patterns (sagas, outbox).
- Spanner paper for how cross-shard transactions work at Google scale.
Learn more
- ArticleDesigning Data-Intensive Applications, Chapter 9Martin Kleppmann
- PaperSagas paperGarcia-Molina and Salem
- DocsPattern: SagaChris Richardson
- DocsTransactional OutboxChris Richardson
- Paper