Deep dive15 min read← Back to crisp

Transactional outbox pattern

How to atomically link a DB write to an event publish, why dual-writes always break, polling vs CDC, and the inbox companion pattern.

The dual-write problem is the most common silent data corruption bug in distributed systems. You have a database. You have a message bus. You need both to know about an event. You cannot do an atomic transaction across them. Naive code drops events when the process dies between writes. The outbox pattern solves this with one extra table.

The problem in detail

Consider order placement:

def place_order(order):
    db.insert("orders", order)
    payment.charge(order.customer, order.total)
    kafka.publish("order.placed", order)
    email.send(order.customer, "Order confirmation")

Four side effects, four chances for inconsistency:

DB succeeds, payment fails. Order in DB never paid for. Need rollback.
DB succeeds, payment succeeds, Kafka fails. Order paid but downstream services blind. Customer charged, no fulfillment.
All three succeed, email fails. Customer paid, fulfillment happening, but they don't know. Support tickets.
Process crashes between any of these. Anything could be in any state.

Even if you wrap them in a try/catch, you can't recover from a crash. The OS doesn't roll back DB writes when a Python process is killed.

The 2PC alternative (don't)

Two-phase commit can theoretically coordinate the DB and the message bus. In practice:

Most message buses don't support 2PC (Kafka, SQS, Pub/Sub all do not).
2PC blocks if a participant dies during commit. Liveness disaster.
2PC is slow.

2PC is not the answer. The outbox is.

The outbox pattern, step by step

Step 1: outbox table

CREATE TABLE outbox (
    id BIGSERIAL PRIMARY KEY,
    aggregate_type TEXT NOT NULL,
    aggregate_id TEXT NOT NULL,
    event_type TEXT NOT NULL,
    payload JSONB NOT NULL,
    created_at TIMESTAMPTZ DEFAULT now(),
    published_at TIMESTAMPTZ
);
 
CREATE INDEX idx_outbox_unpublished ON outbox (created_at) WHERE published_at IS NULL;

A simple table with the event content. Partial index on unpublished rows for fast polling.

Step 2: write business data and event atomically

def place_order(order):
    with db.transaction():
        db.insert("orders", order)
        db.insert("outbox",
            aggregate_type="order",
            aggregate_id=order.id,
            event_type="order.placed",
            payload=serialize(order)
        )

Both succeed or both fail. There is no in-between.

Step 3: worker publishes

def outbox_worker():
    while True:
        with db.transaction():
            rows = db.execute("""
                SELECT * FROM outbox
                WHERE published_at IS NULL
                ORDER BY created_at
                LIMIT 100
                FOR UPDATE SKIP LOCKED
            """)
            for row in rows:
                try:
                    kafka.publish(row.event_type, row.payload)
                    db.execute("UPDATE outbox SET published_at = now() WHERE id = $1", row.id)
                except KafkaError:
                    # leave unpublished, retry next iteration
                    break
        time.sleep(0.1)

FOR UPDATE SKIP LOCKED lets multiple workers run in parallel without stepping on each other. Each worker grabs a batch, processes, commits.

Step 4: at-least-once handling

The worker might publish then die before updating published_at. Next run republishes. So consumers must be idempotent. The event ID (the outbox row ID, or a UUID in the payload) is the dedup key.

This is the same at-least-once + idempotent consumer pattern as everywhere else in distributed systems. The outbox doesn't change it; it just enables it.

Atomic write in one transaction, async publish in another

Polling vs CDC

Polling

Worker queries the outbox at a fixed interval. Polling latency = your interval.

Pros:

Simple. Just a worker process.
No extra infrastructure.
Easy to reason about.

Cons:

Latency is poll interval (typically 100ms-1s).
DB load proportional to polling frequency.

CDC (Debezium)

Debezium reads the Postgres WAL, detects inserts to the outbox table, publishes directly to Kafka.

Pros:

Sub-second latency.
No polling load on the DB.
Scales to high throughput.

Cons:

Debezium is a real piece of infrastructure to operate.
Kafka Connect adds complexity.
Debugging WAL-based pipelines is harder.

Decision: start with polling. Add CDC when polling latency or load is unacceptable. For most systems polling is fine forever.

Cleanup

The outbox table grows forever if you don't clean it up. After publishing, you can:

Delete published rows. Simplest. After published_at is set, delete.
Keep for audit. Retain for N days for debugging. Cron job deletes older rows.
Move to cold storage. S3 dump for long-term audit.

The deletion can be in the same transaction as the publish mark:

DELETE FROM outbox WHERE id = $1
-- or
UPDATE outbox SET published_at = now() WHERE id = $1

Delete is faster but loses history. Update preserves history but grows the table. I prefer update + nightly archive.

The inbox companion pattern

The outbox handles the producer side. The inbox handles the consumer side.

When a consumer receives a message, before doing the work it inserts the message ID into an inbox table. Same transaction as the business effect.

def consume(msg):
    with db.transaction():
        try:
            db.insert("inbox", message_id=msg.id)
        except UniqueViolation:
            return  # already processed
        do_business_logic(msg)

If the consumer crashes after the side effect but before acking the message broker, the broker redelivers. The inbox insert fails on duplicate. Skip.

This is idempotency at the consumer side. Outbox + inbox = end-to-end exactly-once semantics from the application's perspective, even though the underlying bus is at-least-once.

Failure modes

Worker not running

The most common production issue. Worker process dies, nobody notices, outbox table grows, downstream goes blind.

Mitigation:

Run the worker as a critical service with health checks and autorestart.
Alert on count(*) where published_at is null > threshold.
Alert on max(now() - created_at) where published_at is null > threshold (oldest unpublished message age).

Multiple workers double-publishing

If you forget FOR UPDATE SKIP LOCKED and run two workers, both grab the same row and publish twice. Even with the lock, if you set the published_at flag without serializability, you might race.

Always use FOR UPDATE SKIP LOCKED. Test by running multiple workers.

Outbox grows huge during downstream outage

Kafka goes down for 4 hours. Outbox accumulates 4 hours of writes. When Kafka returns, the worker has a backlog.

Mitigation:

Worker batches large numbers of events efficiently.
Don't block app writes when outbox is large; app just keeps inserting.
After recovery, monitor that the backlog clears within an SLA.

Payload schema change

You change the order schema. Old outbox rows have the old schema. Worker publishes them. Consumers see the old schema.

Mitigation:

Include schema version in the payload.
Drain the outbox before deploying schema changes.
Use schema registry-compatible formats.

Why outbox > direct Kafka write

Some teams ask: why not just publish to Kafka first, then write to DB? The DB write fails sometimes but rarely.

Problems:

"Rarely" is not "never." In your career, you will hit it.
The Kafka publish might also fail, leaving you in the same dual-write problem reversed.
Order is wrong: you've told the world "order created" before the order is actually saved.

Outbox commits the truth (DB) first, then propagates. It's the right direction of causality.

Outbox vs event sourcing

Often confused. They are different patterns.

Outbox. Optional events alongside traditional state-based DB writes. The DB is the source of truth; events are notifications.

Event sourcing. The event log is the source of truth. State is rebuilt by replaying events. See the event sourcing section.

You can add outbox to an existing system in a week. Event sourcing is a fundamental architectural choice.

Polling worker code, production-ready

import time
from contextlib import contextmanager
 
BATCH = 100
POLL_INTERVAL = 0.1
MAX_RETRIES = 3
 
def outbox_worker():
    while True:
        try:
            published = process_batch()
            if published == 0:
                time.sleep(POLL_INTERVAL)
        except Exception as e:
            log.error("worker error", error=e)
            time.sleep(1)
 
def process_batch():
    with db.transaction():
        rows = db.execute(
            "SELECT id, event_type, payload, attempts "
            "FROM outbox "
            "WHERE published_at IS NULL AND attempts < $1 "
            "ORDER BY created_at LIMIT $2 "
            "FOR UPDATE SKIP LOCKED",
            MAX_RETRIES, BATCH
        )
        if not rows:
            return 0
 
        for row in rows:
            try:
                kafka.publish(row.event_type, row.payload, key=row.id)
                db.execute(
                    "UPDATE outbox SET published_at = now() WHERE id = $1",
                    row.id
                )
            except KafkaError:
                db.execute(
                    "UPDATE outbox SET attempts = attempts + 1 WHERE id = $1",
                    row.id
                )
        return len(rows)

Real production code adds: tracing context, metrics emission, structured logging, graceful shutdown.

Operational checklist

What I would tell a junior engineer

The outbox is the cheapest insurance you will buy. One table, one worker. It eliminates an entire category of "events lost in production" bugs that are impossible to reproduce locally. Every time I have not used it, I have regretted it. Every time I have used it, I have forgotten about it because it just works.

Learn more

Article
Microservices.io: Transactional OutboxChris Richardson
Docs
Debezium: outbox event routerDebezium docs
Docs
Microsoft: Transactional OutboxMicrosoft Learn
Article
Designing Data-Intensive Applications, ch 11Martin Kleppmann
Article
Confluent: outbox pattern with KafkaConfluent blog