Transactional outbox pattern
How to atomically link a DB write to an event publish, why dual-writes always break, polling vs CDC, and the inbox companion pattern.
The dual-write problem is the most common silent data corruption bug in distributed systems. You have a database. You have a message bus. You need both to know about an event. You cannot do an atomic transaction across them. Naive code drops events when the process dies between writes. The outbox pattern solves this with one extra table.
The problem in detail
Consider order placement:
def place_order(order):
db.insert("orders", order)
payment.charge(order.customer, order.total)
kafka.publish("order.placed", order)
email.send(order.customer, "Order confirmation")Four side effects, four chances for inconsistency:
- DB succeeds, payment fails. Order in DB never paid for. Need rollback.
- DB succeeds, payment succeeds, Kafka fails. Order paid but downstream services blind. Customer charged, no fulfillment.
- All three succeed, email fails. Customer paid, fulfillment happening, but they don't know. Support tickets.
- Process crashes between any of these. Anything could be in any state.
Even if you wrap them in a try/catch, you can't recover from a crash. The OS doesn't roll back DB writes when a Python process is killed.
The 2PC alternative (don't)
Two-phase commit can theoretically coordinate the DB and the message bus. In practice:
- Most message buses don't support 2PC (Kafka, SQS, Pub/Sub all do not).
- 2PC blocks if a participant dies during commit. Liveness disaster.
- 2PC is slow.
2PC is not the answer. The outbox is.
The outbox pattern, step by step
Step 1: outbox table
CREATE TABLE outbox (
id BIGSERIAL PRIMARY KEY,
aggregate_type TEXT NOT NULL,
aggregate_id TEXT NOT NULL,
event_type TEXT NOT NULL,
payload JSONB NOT NULL,
created_at TIMESTAMPTZ DEFAULT now(),
published_at TIMESTAMPTZ
);
CREATE INDEX idx_outbox_unpublished ON outbox (created_at) WHERE published_at IS NULL;A simple table with the event content. Partial index on unpublished rows for fast polling.
Step 2: write business data and event atomically
def place_order(order):
with db.transaction():
db.insert("orders", order)
db.insert("outbox",
aggregate_type="order",
aggregate_id=order.id,
event_type="order.placed",
payload=serialize(order)
)Both succeed or both fail. There is no in-between.
Step 3: worker publishes
def outbox_worker():
while True:
with db.transaction():
rows = db.execute("""
SELECT * FROM outbox
WHERE published_at IS NULL
ORDER BY created_at
LIMIT 100
FOR UPDATE SKIP LOCKED
""")
for row in rows:
try:
kafka.publish(row.event_type, row.payload)
db.execute("UPDATE outbox SET published_at = now() WHERE id = $1", row.id)
except KafkaError:
# leave unpublished, retry next iteration
break
time.sleep(0.1)FOR UPDATE SKIP LOCKED lets multiple workers run in parallel without stepping on each other. Each worker grabs a batch, processes, commits.
Step 4: at-least-once handling
The worker might publish then die before updating published_at. Next run republishes. So consumers must be idempotent. The event ID (the outbox row ID, or a UUID in the payload) is the dedup key.
This is the same at-least-once + idempotent consumer pattern as everywhere else in distributed systems. The outbox doesn't change it; it just enables it.
Polling vs CDC
Polling
Worker queries the outbox at a fixed interval. Polling latency = your interval.
Pros:
- Simple. Just a worker process.
- No extra infrastructure.
- Easy to reason about.
Cons:
- Latency is poll interval (typically 100ms-1s).
- DB load proportional to polling frequency.
CDC (Debezium)
Debezium reads the Postgres WAL, detects inserts to the outbox table, publishes directly to Kafka.
Pros:
- Sub-second latency.
- No polling load on the DB.
- Scales to high throughput.
Cons:
- Debezium is a real piece of infrastructure to operate.
- Kafka Connect adds complexity.
- Debugging WAL-based pipelines is harder.
Decision: start with polling. Add CDC when polling latency or load is unacceptable. For most systems polling is fine forever.
Cleanup
The outbox table grows forever if you don't clean it up. After publishing, you can:
- Delete published rows. Simplest. After
published_atis set, delete. - Keep for audit. Retain for N days for debugging. Cron job deletes older rows.
- Move to cold storage. S3 dump for long-term audit.
The deletion can be in the same transaction as the publish mark:
DELETE FROM outbox WHERE id = $1
-- or
UPDATE outbox SET published_at = now() WHERE id = $1Delete is faster but loses history. Update preserves history but grows the table. I prefer update + nightly archive.
The inbox companion pattern
The outbox handles the producer side. The inbox handles the consumer side.
When a consumer receives a message, before doing the work it inserts the message ID into an inbox table. Same transaction as the business effect.
def consume(msg):
with db.transaction():
try:
db.insert("inbox", message_id=msg.id)
except UniqueViolation:
return # already processed
do_business_logic(msg)If the consumer crashes after the side effect but before acking the message broker, the broker redelivers. The inbox insert fails on duplicate. Skip.
This is idempotency at the consumer side. Outbox + inbox = end-to-end exactly-once semantics from the application's perspective, even though the underlying bus is at-least-once.
Failure modes
Worker not running
The most common production issue. Worker process dies, nobody notices, outbox table grows, downstream goes blind.
Mitigation:
- Run the worker as a critical service with health checks and autorestart.
- Alert on
count(*) where published_at is null> threshold. - Alert on
max(now() - created_at) where published_at is null> threshold (oldest unpublished message age).
Multiple workers double-publishing
If you forget FOR UPDATE SKIP LOCKED and run two workers, both grab the same row and publish twice. Even with the lock, if you set the published_at flag without serializability, you might race.
Always use FOR UPDATE SKIP LOCKED. Test by running multiple workers.
Outbox grows huge during downstream outage
Kafka goes down for 4 hours. Outbox accumulates 4 hours of writes. When Kafka returns, the worker has a backlog.
Mitigation:
- Worker batches large numbers of events efficiently.
- Don't block app writes when outbox is large; app just keeps inserting.
- After recovery, monitor that the backlog clears within an SLA.
Payload schema change
You change the order schema. Old outbox rows have the old schema. Worker publishes them. Consumers see the old schema.
Mitigation:
- Include schema version in the payload.
- Drain the outbox before deploying schema changes.
- Use schema registry-compatible formats.
Why outbox > direct Kafka write
Some teams ask: why not just publish to Kafka first, then write to DB? The DB write fails sometimes but rarely.
Problems:
- "Rarely" is not "never." In your career, you will hit it.
- The Kafka publish might also fail, leaving you in the same dual-write problem reversed.
- Order is wrong: you've told the world "order created" before the order is actually saved.
Outbox commits the truth (DB) first, then propagates. It's the right direction of causality.
Outbox vs event sourcing
Often confused. They are different patterns.
Outbox. Optional events alongside traditional state-based DB writes. The DB is the source of truth; events are notifications.
Event sourcing. The event log is the source of truth. State is rebuilt by replaying events. See the event sourcing section.
You can add outbox to an existing system in a week. Event sourcing is a fundamental architectural choice.
Polling worker code, production-ready
import time
from contextlib import contextmanager
BATCH = 100
POLL_INTERVAL = 0.1
MAX_RETRIES = 3
def outbox_worker():
while True:
try:
published = process_batch()
if published == 0:
time.sleep(POLL_INTERVAL)
except Exception as e:
log.error("worker error", error=e)
time.sleep(1)
def process_batch():
with db.transaction():
rows = db.execute(
"SELECT id, event_type, payload, attempts "
"FROM outbox "
"WHERE published_at IS NULL AND attempts < $1 "
"ORDER BY created_at LIMIT $2 "
"FOR UPDATE SKIP LOCKED",
MAX_RETRIES, BATCH
)
if not rows:
return 0
for row in rows:
try:
kafka.publish(row.event_type, row.payload, key=row.id)
db.execute(
"UPDATE outbox SET published_at = now() WHERE id = $1",
row.id
)
except KafkaError:
db.execute(
"UPDATE outbox SET attempts = attempts + 1 WHERE id = $1",
row.id
)
return len(rows)Real production code adds: tracing context, metrics emission, structured logging, graceful shutdown.
Operational checklist
- Outbox table with partial index on unpublished rows.
- App writes business data + outbox in single transaction.
- Worker process runs as critical service.
-
FOR UPDATE SKIP LOCKEDin worker query. - Alert on unpublished count and age.
- Retention policy on outbox table (delete or archive).
- Consumer dedup (inbox or idempotency keys).
- Schema versioning in payloads.
- Tested: kill worker mid-batch, confirm recovery.
- Tested: kill Kafka, confirm outbox absorbs the load.
What I would tell a junior engineer
The outbox is the cheapest insurance you will buy. One table, one worker. It eliminates an entire category of "events lost in production" bugs that are impossible to reproduce locally. Every time I have not used it, I have regretted it. Every time I have used it, I have forgotten about it because it just works.
Learn more
- ArticleMicroservices.io: Transactional OutboxChris Richardson
- DocsDebezium: outbox event routerDebezium docs
- DocsMicrosoft: Transactional OutboxMicrosoft Learn
- ArticleDesigning Data-Intensive Applications, ch 11Martin Kleppmann
- ArticleConfluent: outbox pattern with KafkaConfluent blog