Deep dive15 min read← Back to crisp

Two-phase commit and limits

2PC mechanics, why it blocks, 3PC and its issues, sagas, the outbox pattern, and how production systems actually handle cross-service atomicity.

The atomic commit problem

Multiple participants must agree to either all commit or all abort a distributed transaction. Sounds simple. It is hard:

Participants can fail independently.
Messages can be lost.
Coordinator can fail.
Decisions must be durable.

The naive "send commit to all" fails the moment any participant can't commit. The naive "ask everyone first, then send commit" fails when the coordinator dies between phases.

2PC tries to solve this. It almost does.

2PC in detail

Two roles: coordinator (orchestrator) and participants (the resource managers).

Phase 1: PREPARE

Coordinator writes "begin commit" to its log.
Sends PREPARE message to each participant.
Each participant: checks if it can commit (constraints, locks, etc). If yes, writes PREPARED to its log durably. Replies YES. If no, writes ABORT, replies NO.

After PREPARED is logged, the participant is "in doubt." It cannot abort unilaterally. It must wait for the coordinator's decision.

Phase 2: COMMIT or ABORT

Coordinator examines votes. If all YES, writes COMMIT to log. Else writes ABORT.
Sends decision to all participants.
Each participant applies the decision, writes COMMITTED or ABORTED to log, releases locks, sends ack.
Coordinator writes "end of transaction" once all acks received.

The crucial invariant: after a participant logs PREPARED and the coordinator logs COMMIT, the transaction WILL commit, even if everyone crashes. Recovery code replays the logs.

2PC with durable logging at each step

Why 2PC blocks

The killer scenario: coordinator dies right after sending PREPARE and getting all YES votes, but before sending COMMIT.

Participants are in doubt. They have logged PREPARED. They hold locks. They cannot:

Commit: maybe the coordinator decided abort.
Abort: maybe the coordinator decided commit.
Ask each other: they don't know the coordinator's decision either.

They must wait for the coordinator to recover and tell them. Locks remain held. Throughput stalls.

This is fundamental to 2PC. Provably, no protocol with only 2 message rounds can avoid this blocking in the presence of coordinator failure.

Recovery

When the coordinator restarts, it reads its log:

If COMMIT was logged, send COMMIT to all participants again.
If COMMIT not logged but PREPARE was, this is where it gets weird. Some participants might have voted YES, some NO, but coordinator never decided. Coordinator must contact participants to find out the votes and then decide.

When a participant restarts:

If PREPARED is logged but no decision: it is in doubt. Must contact coordinator.
If COMMITTED or ABORTED is logged: it knows the outcome, can recover state.

The in-doubt state is the hell of 2PC. Operators run scripts to manually resolve doubted transactions. XA has tools for this. It is a known pain.

3PC and its limits

Three-phase commit adds a pre-commit phase between PREPARE and COMMIT:

Coordinator sends PREPARE. Participants vote.
If all YES, coordinator sends PRE-COMMIT. Participants ack.
Once all acked PRE-COMMIT, coordinator sends COMMIT.

The extra round lets participants know that everyone is ready, so if the coordinator dies after PRE-COMMIT, participants can decide on their own (everyone in PRE-COMMIT state will commit).

3PC handles coordinator failure better. But it does NOT handle network partitions. Under partition, you can still get split decisions.

Also adds a round of latency. In practice, 3PC is rarely used. Either you accept 2PC's blocking nature or you use consensus-based protocols (Paxos Commit, Raft-based transactions).

XA transactions

XA is the standard for distributed transactions across resource managers (databases, message queues, etc). It is 2PC with a defined API:

xa_start: begin a transaction branch.
xa_end: end the branch.
xa_prepare: prepare phase.
xa_commit / xa_rollback: phase 2.
xa_recover: list in-doubt transactions.

Postgres supports XA. Many JEE app servers do. Used in classic enterprise systems.

Issues:

All resources must support XA. Many modern databases (Mongo before 4, most NoSQL, message queues like Kafka) don't.
Performance is much worse than local transactions.
The in-doubt problem is real and operationally painful.
Doesn't compose with eventually consistent systems.

XA is fading. Modern systems use sagas or event-driven patterns.

Garcia-Molina and Salem proposed sagas in 1987 as an alternative to long-running transactions. A saga is a sequence of local transactions T1, T2, ..., Tn, each with a compensating transaction C1, C2, ..., Cn-1.

If all Ti succeed, the saga is complete. If Tk fails, execute Ck-1, Ck-2, ..., C1 to compensate.

Two flavors:

Choreography: each service emits events, others react. Decentralized.
Orchestration: a coordinator drives the saga, sending commands to each service.

Example: book trip saga.

T1: reserve flight. C1: cancel flight.
T2: reserve hotel. C2: cancel hotel.
T3: charge card. C3: refund card.

If T3 fails, run C2 then C1.

Saga with compensation on payment failure

Saga properties

No global atomicity. Intermediate states are visible to others.
No isolation. Another transaction can see partial state.
Compensation must be commutative and idempotent.
Some operations cannot be compensated (sending email, calling external API). Use Pivot transaction pattern: do uncompensable steps last, after all reversible steps succeed.

When compensation is hard

Sending notifications. Can't unsend an email. Use "intent to send" + delay.
External API calls. Often no undo. Build idempotency keys, retry instead.
Real-world actions (shipping a package). Hard or impossible to reverse.

The pivot point in a saga is the last reversible step. Place irreversible steps after it.

Outbox pattern

The pragmatic solution for "I need to update my database AND publish an event atomically."

Naive: write to DB, then publish to Kafka. If publish fails, you've updated DB but not notified anyone. Inconsistent.

Outbox:

In the same database transaction: write the business change AND insert a row in an "outbox" table.
A separate process polls the outbox table and publishes to Kafka.
After publishing, mark the outbox row as published.

The local transaction is atomic. The publish is at-least-once. Consumers must be idempotent.

This is the modern alternative to XA for "DB + queue" atomicity. Used by Stripe, Shopify, basically every event-driven SaaS.

CDC (change data capture) tools like Debezium can read the outbox automatically via the transaction log, no polling needed.

Spanner: distributed transactions done right

Spanner uses Paxos-coordinated 2PC. Each shard is a Paxos group. Cross-shard transactions use 2PC where the coordinator is itself a Paxos group, so coordinator failure is handled by Paxos failover instead of blocking.

This gives:

Atomic distributed transactions.
No coordinator SPOF (Paxos handles failover).
TrueTime for external consistency.

Cost: latency. Each transaction pays cross-region RTT for Paxos round and 2PC round. Spanner trades latency for correctness.

CockroachDB does the same idea with Raft groups per range plus 2PC across ranges.

This is "2PC done right": consensus replaces single-coordinator design. The blocking problem is solved by making the coordinator itself fault-tolerant.

When to use what

Scenario	Solution
Single-database transaction	Local ACID
Multiple databases in same enterprise	XA (with operational pain) or sagas
Microservices, cross-service consistency	Sagas
DB write + event publish	Outbox pattern
Globally distributed strong consistency	Spanner / CockroachDB style
Eventually consistent OK	Event-driven + compensation

For 99% of microservice architectures: sagas + outbox pattern. Accept eventual consistency, design for compensation, make everything idempotent.

Learn more

Article
Designing Data-Intensive Applications, Chapter 9Martin Kleppmann
Paper
Sagas paperGarcia-Molina and Salem
Docs
Pattern: SagaChris Richardson
Docs
Transactional OutboxChris Richardson
Paper
Spanner read-write transactionsGoogle

Deep dive15 min read← Back to crisp

Two-phase commit and limits

2PC mechanics, why it blocks, 3PC and its issues, sagas, the outbox pattern, and how production systems actually handle cross-service atomicity.

The atomic commit problem

Multiple participants must agree to either all commit or all abort a distributed transaction. Sounds simple. It is hard:

Participants can fail independently.
Messages can be lost.
Coordinator can fail.
Decisions must be durable.

The naive "send commit to all" fails the moment any participant can't commit. The naive "ask everyone first, then send commit" fails when the coordinator dies between phases.

2PC tries to solve this. It almost does.

2PC in detail

Two roles: coordinator (orchestrator) and participants (the resource managers).

Phase 1: PREPARE

Coordinator writes "begin commit" to its log.
Sends PREPARE message to each participant.
Each participant: checks if it can commit (constraints, locks, etc). If yes, writes PREPARED to its log durably. Replies YES. If no, writes ABORT, replies NO.

After PREPARED is logged, the participant is "in doubt." It cannot abort unilaterally. It must wait for the coordinator's decision.

Phase 2: COMMIT or ABORT

Coordinator examines votes. If all YES, writes COMMIT to log. Else writes ABORT.
Sends decision to all participants.
Each participant applies the decision, writes COMMITTED or ABORTED to log, releases locks, sends ack.
Coordinator writes "end of transaction" once all acks received.

The crucial invariant: after a participant logs PREPARED and the coordinator logs COMMIT, the transaction WILL commit, even if everyone crashes. Recovery code replays the logs.

2PC with durable logging at each step

Why 2PC blocks

The killer scenario: coordinator dies right after sending PREPARE and getting all YES votes, but before sending COMMIT.

Participants are in doubt. They have logged PREPARED. They hold locks. They cannot:

Commit: maybe the coordinator decided abort.
Abort: maybe the coordinator decided commit.
Ask each other: they don't know the coordinator's decision either.

They must wait for the coordinator to recover and tell them. Locks remain held. Throughput stalls.

This is fundamental to 2PC. Provably, no protocol with only 2 message rounds can avoid this blocking in the presence of coordinator failure.

Recovery

When the coordinator restarts, it reads its log:

If COMMIT was logged, send COMMIT to all participants again.
If COMMIT not logged but PREPARE was, this is where it gets weird. Some participants might have voted YES, some NO, but coordinator never decided. Coordinator must contact participants to find out the votes and then decide.

When a participant restarts:

If PREPARED is logged but no decision: it is in doubt. Must contact coordinator.
If COMMITTED or ABORTED is logged: it knows the outcome, can recover state.

The in-doubt state is the hell of 2PC. Operators run scripts to manually resolve doubted transactions. XA has tools for this. It is a known pain.

3PC and its limits

Three-phase commit adds a pre-commit phase between PREPARE and COMMIT:

Coordinator sends PREPARE. Participants vote.
If all YES, coordinator sends PRE-COMMIT. Participants ack.
Once all acked PRE-COMMIT, coordinator sends COMMIT.

The extra round lets participants know that everyone is ready, so if the coordinator dies after PRE-COMMIT, participants can decide on their own (everyone in PRE-COMMIT state will commit).

3PC handles coordinator failure better. But it does NOT handle network partitions. Under partition, you can still get split decisions.

Also adds a round of latency. In practice, 3PC is rarely used. Either you accept 2PC's blocking nature or you use consensus-based protocols (Paxos Commit, Raft-based transactions).

XA transactions

XA is the standard for distributed transactions across resource managers (databases, message queues, etc). It is 2PC with a defined API:

xa_start: begin a transaction branch.
xa_end: end the branch.
xa_prepare: prepare phase.
xa_commit / xa_rollback: phase 2.
xa_recover: list in-doubt transactions.

Postgres supports XA. Many JEE app servers do. Used in classic enterprise systems.

Issues:

All resources must support XA. Many modern databases (Mongo before 4, most NoSQL, message queues like Kafka) don't.
Performance is much worse than local transactions.
The in-doubt problem is real and operationally painful.
Doesn't compose with eventually consistent systems.

XA is fading. Modern systems use sagas or event-driven patterns.

Sagas

If all Ti succeed, the saga is complete. If Tk fails, execute Ck-1, Ck-2, ..., C1 to compensate.

Two flavors:

Choreography: each service emits events, others react. Decentralized.
Orchestration: a coordinator drives the saga, sending commands to each service.

Example: book trip saga.

T1: reserve flight. C1: cancel flight.
T2: reserve hotel. C2: cancel hotel.
T3: charge card. C3: refund card.

If T3 fails, run C2 then C1.

Saga with compensation on payment failure

Saga properties

No global atomicity. Intermediate states are visible to others.
No isolation. Another transaction can see partial state.
Compensation must be commutative and idempotent.
Some operations cannot be compensated (sending email, calling external API). Use Pivot transaction pattern: do uncompensable steps last, after all reversible steps succeed.

When compensation is hard

Sending notifications. Can't unsend an email. Use "intent to send" + delay.
External API calls. Often no undo. Build idempotency keys, retry instead.
Real-world actions (shipping a package). Hard or impossible to reverse.

The pivot point in a saga is the last reversible step. Place irreversible steps after it.

Outbox pattern

The pragmatic solution for "I need to update my database AND publish an event atomically."

Naive: write to DB, then publish to Kafka. If publish fails, you've updated DB but not notified anyone. Inconsistent.

Outbox:

In the same database transaction: write the business change AND insert a row in an "outbox" table.
A separate process polls the outbox table and publishes to Kafka.
After publishing, mark the outbox row as published.

The local transaction is atomic. The publish is at-least-once. Consumers must be idempotent.

This is the modern alternative to XA for "DB + queue" atomicity. Used by Stripe, Shopify, basically every event-driven SaaS.

CDC (change data capture) tools like Debezium can read the outbox automatically via the transaction log, no polling needed.

Spanner: distributed transactions done right

This gives:

Atomic distributed transactions.
No coordinator SPOF (Paxos handles failover).
TrueTime for external consistency.

Cost: latency. Each transaction pays cross-region RTT for Paxos round and 2PC round. Spanner trades latency for correctness.

CockroachDB does the same idea with Raft groups per range plus 2PC across ranges.

This is "2PC done right": consensus replaces single-coordinator design. The blocking problem is solved by making the coordinator itself fault-tolerant.

When to use what

Scenario	Solution
Single-database transaction	Local ACID
Multiple databases in same enterprise	XA (with operational pain) or sagas
Microservices, cross-service consistency	Sagas
DB write + event publish	Outbox pattern
Globally distributed strong consistency	Spanner / CockroachDB style
Eventually consistent OK	Event-driven + compensation

For 99% of microservice architectures: sagas + outbox pattern. Accept eventual consistency, design for compensation, make everything idempotent.

Learn more

Article
Designing Data-Intensive Applications, Chapter 9Martin Kleppmann
Paper
Sagas paperGarcia-Molina and Salem
Docs
Pattern: SagaChris Richardson
Docs
Transactional OutboxChris Richardson
Paper
Spanner read-write transactionsGoogle

Two-phase commit and limits

The atomic commit problem

2PC in detail

Why 2PC blocks

Recovery

3PC and its limits

XA transactions

Sagas

Saga properties

When compensation is hard

Outbox pattern

Spanner: distributed transactions done right

When to use what

What to read next

Learn more

Two-phase commit and limits

The atomic commit problem

2PC in detail

Why 2PC blocks

Recovery

3PC and its limits

XA transactions

Sagas

Saga properties

When compensation is hard

Outbox pattern

Spanner: distributed transactions done right

When to use what

What to read next

Learn more