Event sourcing

Aggregates, events, projections, snapshots, schema evolution, and an honest assessment of whether you should adopt it.

Event sourcing is the most powerful and most over-applied pattern in distributed systems. It can give you an indestructible audit log, time-travel debugging, and multiple read models from one source of truth. It can also drown a team that adopts it for the wrong reasons. This is the long version: what it actually is, what it costs, and when to use it.

The fundamental shift

Traditional systems store current state. The order has status "shipped." You don't know when it transitioned from "paid" to "shipped" unless you added that explicitly. You can't reconstruct the past unless you logged it.

Event sourcing flips this: state is derived from a sequence of immutable events. The events are the source of truth. Current state is a function of events.

state = reduce(apply, events, initial_state)

Same idea as Redux on the frontend, generalized to the whole backend.
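As a minimal sketch of that fold, assuming two illustrative event types (Deposited and Withdrew) and an integer balance as the state, the whole idea fits in a few lines of Python:

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class Deposited:
    amount: int


@dataclass(frozen=True)
class Withdrew:
    amount: int


def apply(balance: int, event) -> int:
    """Pure transition function: (state, event) -> new state."""
    if isinstance(event, Deposited):
        return balance + event.amount
    if isinstance(event, Withdrew):
        return balance - event.amount
    return balance  # unknown events leave the state unchanged


events = [Deposited(100), Withdrew(30), Deposited(5)]

# functools.reduce takes the function first, then the iterable, then the initial value.
balance = reduce(apply, events, 0)
print(balance)  # 75
```

Because apply is a pure function, replaying the same events always yields the same state, which is what makes replay and time travel possible.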

Figure: commands write events, projections build read models, queries read projections.

The vocabulary

Aggregate. A consistency boundary. A single entity (or cluster) whose invariants are enforced together. For a bank: an Account. For e-commerce: an Order.

Command. A request to change state. "PlaceOrder", "DepositMoney." Commands have a single aggregate target.

Event. A fact about something that happened. "OrderPlaced", "MoneyDeposited." Events are immutable. Past tense. They cannot be deleted or modified.

Event store. The database for events. Append-only log indexed by aggregate ID.

Projection / read model. A derived view built by consuming events. Different projections optimize for different queries.

CQRS. Command Query Responsibility Segregation. Separate write model (aggregates + events) from read models (projections). Almost always used with event sourcing.
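To make the command/event distinction concrete, here is a minimal sketch using frozen dataclasses. The field names (order_id, sku, quantity), the StoredEvent envelope, and the decide function are illustrative assumptions, not the API of any particular framework.

```python
from dataclasses import FrozenInstanceError, dataclass, field
from datetime import datetime, timezone
from uuid import UUID, uuid4


@dataclass(frozen=True)
class PlaceOrder:
    """Command: imperative, targets one aggregate, may be rejected."""
    order_id: UUID
    sku: str
    quantity: int


@dataclass(frozen=True)
class OrderPlaced:
    """Event: past tense, a fact that can no longer be rejected."""
    order_id: UUID
    sku: str
    quantity: int


@dataclass(frozen=True)
class StoredEvent:
    """Hypothetical envelope the event store keeps around each event."""
    aggregate_id: UUID
    version: int
    payload: object
    recorded_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


def decide(command: PlaceOrder) -> list:
    """Validate the command and return the events it produces."""
    if command.quantity <= 0:
        raise ValueError("quantity must be positive")
    return [OrderPlaced(command.order_id, command.sku, command.quantity)]


order_id = uuid4()
events = decide(PlaceOrder(order_id, "SKU-1", 2))
stored = StoredEvent(aggregate_id=order_id, version=1, payload=events[0])

try:
    events[0].quantity = 3  # frozen dataclasses reject mutation
except FrozenInstanceError:
    print("events are immutable")
```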

A worked example: bank account

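from dataclasses import dataclass

# Events are immutable facts, so frozen dataclasses fit well.
@dataclass(frozen=True)
class Opened: pass

@dataclass(frozen=True)
class Deposited:
    amount: int

@dataclass(frozen=True)
class Withdrew:
    amount: int

@dataclass(frozen=True)
class Closed: pass

class ClosedAccount(Exception): pass
class InsufficientFunds(Exception): pass

class Account:
    def __init__(self, events):
        self.balance = 0
        self.is_closed = False
        for event in events:  # rebuild state by replaying history
            self.apply(event)

    def apply(self, event):
        match event:
            case Opened(): self.balance = 0
            case Deposited(amount=amount): self.balance += amount
            case Withdrew(amount=amount): self.balance -= amount
            case Closed(): self.is_closed = True

    def deposit(self, amount):
        if self.is_closed:
            raise ClosedAccount()
        if amount <= 0:
            raise ValueError("amount must be positive")
        return [Deposited(amount=amount)]

    def withdraw(self, amount):
        if self.is_closed:
            raise ClosedAccount()
        if amount <= 0:
            raise ValueError("amount must be positive")
        if amount > self.balance:
            raise InsufficientFunds()
        return [Withdrew(amount=amount)]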

Notice:

The aggregate is rebuilt from events on each load.
Commands (deposit, withdraw) return new events; they don't mutate state directly.
Invariants (non-negative balance, not closed) are checked against current state, but the result is new events.

The command handler ties the aggregate to the event store:

def handle_deposit(account_id, amount):
    events = event_store.load(account_id)
    account = Account(events)
    new_events = account.deposit(amount)
    event_store.append(account_id, new_events, expected_version=len(events))

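expected_version implements optimistic concurrency. If another process appended between load and append, the stored version no longer matches, the append fails, and you retry: reload the events, re-run the command against the fresh state, and append again.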
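A minimal sketch of that check-and-retry loop, assuming an in-memory store. InMemoryEventStore, ConcurrencyError, and with_retry are hypothetical names for illustration, and the conflicting writer is simulated inside the decide function:

```python
class ConcurrencyError(Exception):
    pass


class InMemoryEventStore:
    """Toy store: one list of events per aggregate; version == list length."""

    def __init__(self):
        self._streams = {}

    def load(self, aggregate_id):
        return list(self._streams.get(aggregate_id, []))

    def append(self, aggregate_id, new_events, expected_version):
        stream = self._streams.setdefault(aggregate_id, [])
        if len(stream) != expected_version:
            raise ConcurrencyError(
                f"expected version {expected_version}, found {len(stream)}"
            )
        stream.extend(new_events)


def with_retry(store, aggregate_id, decide, max_attempts=3):
    """Load, decide, append; on a version conflict, reload and try again."""
    for _ in range(max_attempts):
        events = store.load(aggregate_id)
        new_events = decide(events)
        try:
            store.append(aggregate_id, new_events, expected_version=len(events))
            return new_events
        except ConcurrencyError:
            continue
    raise ConcurrencyError(f"gave up after {max_attempts} attempts")


store = InMemoryEventStore()
store.append("acct-1", [("Deposited", 100)], expected_version=0)

calls = {"count": 0}


def deposit_50(events):
    calls["count"] += 1
    if calls["count"] == 1:
        # Simulate another process writing between our load and our append.
        store.append("acct-1", [("Deposited", 1)], expected_version=len(events))
    return [("Deposited", 50)]


with_retry(store, "acct-1", deposit_50)
print(store.load("acct-1"))
```

The first attempt fails because the stream grew from one event to two underneath it; the second attempt sees the fresh history and succeeds.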

Projections

The event store is bad for queries. Imagine "list all accounts with balance > $10,000." You'd need to replay every account's events. Useless.

Projections build query-optimized read models. A projector subscribes to the event stream and updates a database.

def account_balance_projector(account_id, event):
    # The subscription delivers each event together with the ID of the stream
    # it was appended to, mirroring event_store.append(account_id, ...).
    match event:
        case Opened():
            db.insert("account_balances", id=account_id, balance=0)
        case Deposited(amount=amount):
            db.execute("UPDATE account_balances SET balance = balance + $1 WHERE id = $2", amount, account_id)
        case Withdrew(amount=amount):
            db.execute("UPDATE account_balances SET balance = balance - $1 WHERE id = $2", amount, account_id)
        case _:
            pass  # events this read model ignores, e.g. Closed

account_balances is the read model. Queries hit it, not the event store. Multiple projections can build different read models for different queries.

A "wide" projection joins multiple aggregates. A "narrow" projection is one denormalized table per query.

Eventual consistency

Projections lag the event store. After a command succeeds, the read model may not yet reflect it.

UI implication: if the user clicks "deposit $100" and is redirected to "view balance," the new balance might not show. Options:

Wait. Have the UI poll until the projection catches up.
Optimistic UI. Show the expected new value, refresh later.
Read your own writes. After a command, the user is given a "version" they should see. Queries wait until that version is in the projection.

Stripe's API sidesteps this by returning the affected object in the write response. After POST /charges, the response body is the charge object as of that write, so the client does not need a follow-up query.

Plan for eventual consistency from day one. It is not a bolt-on.
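The read-your-own-writes option can be sketched as a query that waits until the projection has reached the version returned by the command. This is a minimal sketch under assumptions: BalanceProjection and read_balance are hypothetical names, and a threading.Timer stands in for the asynchronous projector.

```python
import threading
import time


class BalanceProjection:
    """Read model that tracks how many events it has applied."""

    def __init__(self):
        self._lock = threading.Lock()
        self.version = 0
        self.balance = 0

    def handle_deposit(self, amount):
        with self._lock:
            self.balance += amount
            self.version += 1

    def snapshot(self):
        with self._lock:
            return self.version, self.balance


def read_balance(projection, min_version, timeout=1.0, poll_interval=0.01):
    """Return the balance once the projection has caught up to min_version."""
    deadline = time.monotonic() + timeout
    while True:
        version, balance = projection.snapshot()
        if version >= min_version:
            return balance
        if time.monotonic() >= deadline:
            raise TimeoutError(f"projection stuck at version {version}")
        time.sleep(poll_interval)


projection = BalanceProjection()
written_version = 1  # what the command side would hand back to the client

# Simulate projector lag: the deposit reaches the read model 50 ms later.
timer = threading.Timer(0.05, projection.handle_deposit, args=(100,))
timer.start()

balance = read_balance(projection, min_version=written_version)
timer.join()
print(balance)  # 100
```

The timeout matters: if the projector is down, the query should fail loudly rather than hang or silently return stale data.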

Snapshots

Replaying 10 million events to compute current state is too slow. Snapshots are checkpoints:

Every N events (e.g., 100 or 1000), store the current aggregate state.
To load an aggregate: find the latest snapshot, load only events after it, apply them.

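def load_account(account_id):
    snapshot = event_store.latest_snapshot(account_id)  # None if no snapshot exists yet
    if snapshot is None:
        return Account(event_store.load(account_id))  # fall back to a full replay
    account = Account.from_snapshot(snapshot)  # classmethod (not shown) that restores saved state
    for event in event_store.events_after(account_id, snapshot.version):
        account.apply(event)
    return account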

Snapshots are an optimization, not a source of truth. The events remain authoritative. A snapshot can always be deleted and rebuilt.
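A self-contained sketch of the same idea, with events as plain (kind, amount) tuples. Snapshot, SNAPSHOT_EVERY, load_balance, and maybe_snapshot are illustrative names, and the snapshot interval of 3 is chosen only to keep the example short.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    version: int  # number of events already folded into this snapshot
    balance: int


SNAPSHOT_EVERY = 3


def apply(balance, event):
    kind, amount = event
    return balance + amount if kind == "Deposited" else balance - amount


def load_balance(events, snapshot=None):
    """Return (balance, version), replaying only events after the snapshot."""
    start = snapshot.version if snapshot else 0
    balance = snapshot.balance if snapshot else 0
    for event in events[start:]:
        balance = apply(balance, event)
    return balance, len(events)


def maybe_snapshot(events, snapshot=None):
    """Take a new snapshot once enough events have accumulated since the last one."""
    balance, version = load_balance(events, snapshot)
    last_version = snapshot.version if snapshot else 0
    if version - last_version >= SNAPSHOT_EVERY:
        return Snapshot(version, balance)
    return snapshot


events = [("Deposited", 100), ("Withdrew", 30), ("Deposited", 5), ("Deposited", 20)]

snapshot = maybe_snapshot(events[:3])   # taken after the third event
print(snapshot)                         # Snapshot(version=3, balance=75)
print(load_balance(events, snapshot))   # (95, 4), replaying one event
print(load_balance(events))             # (95, 4), replaying all four
```

Loading with and without the snapshot gives the same answer, which is the property that lets you delete and rebuild snapshots freely.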

Schema evolution: the hardest part

You rename a field. In a traditional system, you write a DB migration. Done.

In event sourcing, you have a million old events with the old field name. They are immutable. They will be replayed when a new projector starts. The projector must understand both versions forever.

Strategies:

Versioned events. Each event type has a version. OrderPlaced_v2 is a new type. Old projectors handle v1, new ones handle both.

Upcasting. A transformation layer reads old events and converts them to the new format before passing to aggregates/projections. Code grows over time.

Snapshotting + migration. Rebuild the snapshot, then mark old events as historical. Doesn't reduce the schema problem; just hides it.

Migrating the event store (heretical). Some teams rewrite old events. Greg Young considers this a sin. In practice, you sometimes do it anyway, for example for a regulatory cleanup. Document why.

Greg Young wrote a whole book on event versioning. It is real work. Plan for it.
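To illustrate upcasting, here is a minimal sketch that chains one small function per version step. The schema changes (renaming customer to customer_id, adding a currency field defaulting to "USD") and the UPCASTERS registry are hypothetical examples, not taken from a real system.

```python
def upcast_v1_to_v2(payload):
    # Hypothetical change: v2 renamed "customer" to "customer_id".
    upgraded = dict(payload)  # copy, so the stored event is never mutated
    upgraded["customer_id"] = upgraded.pop("customer")
    return upgraded


def upcast_v2_to_v3(payload):
    # Hypothetical change: v3 added "currency"; old events default to USD.
    return {**payload, "currency": payload.get("currency", "USD")}


# One small step per version; steps are chained until no upcaster applies.
UPCASTERS = {
    ("OrderPlaced", 1): upcast_v1_to_v2,
    ("OrderPlaced", 2): upcast_v2_to_v3,
}


def upcast(event_type, version, payload):
    """Bring a stored event up to the latest schema before handing it on."""
    while (event_type, version) in UPCASTERS:
        payload = UPCASTERS[(event_type, version)](payload)
        version += 1
    return version, payload


stored = [
    ("OrderPlaced", 1, {"customer": "c-1", "total": 40}),
    ("OrderPlaced", 3, {"customer_id": "c-2", "total": 15, "currency": "EUR"}),
]

for event_type, version, payload in stored:
    print(upcast(event_type, version, payload))
```

Aggregates and projections then only ever see the latest shape. The cost is the one named above: the chain of upcasters grows with every schema change and has to be kept forever.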

Storage

Where do you store events?

Postgres. An events table with aggregate_id, version, type, payload. Works up to medium scale.

CREATE TABLE events (
    id BIGSERIAL PRIMARY KEY,
    aggregate_type TEXT NOT NULL,
    aggregate_id UUID NOT NULL,
    version INT NOT NULL,
    event_type TEXT NOT NULL,
    payload JSONB NOT NULL,
    metadata JSONB,
    created_at TIMESTAMPTZ DEFAULT now(),
    UNIQUE (aggregate_id, version)
);

The unique constraint enforces optimistic concurrency. The (aggregate_id, version) combination must be unique; if two processes try to write version 5, one fails.
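The effect of that constraint can be tried locally. The sketch below uses SQLite (Python's built-in sqlite3 module) as a stand-in for Postgres, with a trimmed-down version of the table; the append helper is a hypothetical name.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE events (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        aggregate_id TEXT NOT NULL,
        version INTEGER NOT NULL,
        event_type TEXT NOT NULL,
        payload TEXT NOT NULL,
        UNIQUE (aggregate_id, version)
    )
""")


def append(conn, aggregate_id, expected_version, event_type, payload):
    """Write one event at expected_version + 1; return False on a conflict."""
    try:
        with conn:  # commits on success, rolls back if the insert raises
            conn.execute(
                "INSERT INTO events (aggregate_id, version, event_type, payload) "
                "VALUES (?, ?, ?, ?)",
                (aggregate_id, expected_version + 1, event_type, json.dumps(payload)),
            )
        return True
    except sqlite3.IntegrityError:
        return False  # someone else already wrote this version


first = append(conn, "acct-1", 0, "Opened", {})
second = append(conn, "acct-1", 1, "Deposited", {"amount": 100})
third = append(conn, "acct-1", 1, "Deposited", {"amount": 50})  # stale writer

count = conn.execute(
    "SELECT COUNT(*) FROM events WHERE aggregate_id = ?", ("acct-1",)
).fetchone()[0]
print(first, second, third, count)  # True True False 2
```

The losing writer gets a constraint violation instead of silently overwriting history; the caller is expected to reload and retry.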

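EventStoreDB (since renamed KurrentDB). A purpose-built event store with streams, subscriptions, and optimistic concurrency built in. A strong choice if you are committed to event sourcing.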

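Kafka. Some teams use Kafka as the event store. Pros: natural fit for streaming. Cons: querying is hard (Kafka is not a database), there is no per-key lookup, so loading one aggregate means scanning its topic partition from the start, and there is no built-in per-aggregate version check on write.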

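DynamoDB. Use the aggregate ID as partition key and the version as sort key; conditional writes give you the version check, and reads can be strongly consistent. AWS shops sometimes use this.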

For most teams: Postgres until proven inadequate.

CQRS in detail

Command side:

Receive command.
Load aggregate from events.
Validate, generate new events.
Append to event store.
Done.

Query side:

Hit the read model.
Return data.

The command side and query side use different schemas, different databases, different scaling profiles. The command side is write-optimized. The query side is read-optimized.

You can have many read models for one command model. Each is denormalized for its specific queries.
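The two sides can be sketched end to end in a few lines. This is a toy under stated assumptions: dictionaries stand in for the event store and for two read models, events are (kind, amount) tuples, and the projection runs inline where a real system would run it asynchronously. All names are illustrative.

```python
from collections import defaultdict

event_log = defaultdict(list)  # write side: account_id -> list of events
balances = defaultdict(int)    # read model 1: current balance per account
activity = defaultdict(int)    # read model 2: transaction count per account


def current_balance(events):
    return sum(amount if kind == "Deposited" else -amount for kind, amount in events)


def project(account_id, event):
    """Update every read model from one event."""
    kind, amount = event
    balances[account_id] += amount if kind == "Deposited" else -amount
    activity[account_id] += 1


def handle(account_id, command, amount):
    """Command side: load, validate, produce an event, append."""
    events = event_log[account_id]
    if command == "Withdraw" and amount > current_balance(events):
        raise ValueError("insufficient funds")
    event = ("Deposited" if command == "Deposit" else "Withdrew", amount)
    events.append(event)
    project(account_id, event)  # inline here; asynchronous in a real system


def get_balance(account_id):
    """Query side: read the precomputed value, never the event log."""
    return balances.get(account_id, 0)


def get_activity(account_id):
    return activity.get(account_id, 0)


handle("acct-1", "Deposit", 100)
handle("acct-1", "Withdraw", 30)
print(get_balance("acct-1"), get_activity("acct-1"))  # 70 2
```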

When event sourcing wins

Audit-heavy domains. Finance, healthcare, legal. The event log is the regulatory record.
Time-travel queries. "What did our inventory look like on March 15?" Replay events to that point.
Multiple read models. Same domain consumed by different users (analytics, support, mobile, web). Each gets its own projection.
Bug recovery. A bug processed events incorrectly. Fix the bug, replay the events, rebuild the projection.
A/B testing data pipelines. Run the new pipeline on historical events and compare its output with the old one.
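The time-travel case is the easiest to show in code: replay the log and stop at the date you care about. The sketch below assumes events are stored in order with the date they occurred; the event shapes, dates, and the inventory_as_of name are made up for illustration.

```python
from datetime import date

# (occurred_on, kind, sku, quantity), stored in the order they happened
events = [
    (date(2024, 3, 1), "Received", "widget", 50),
    (date(2024, 3, 10), "Shipped", "widget", 20),
    (date(2024, 3, 20), "Shipped", "widget", 25),
    (date(2024, 4, 2), "Received", "widget", 10),
]


def inventory_as_of(events, as_of):
    """Replay events up to and including as_of; ignore everything later."""
    stock = {}
    for occurred_on, kind, sku, quantity in events:
        if occurred_on > as_of:
            break  # the log is ordered, so the rest is in the future
        delta = quantity if kind == "Received" else -quantity
        stock[sku] = stock.get(sku, 0) + delta
    return stock


print(inventory_as_of(events, date(2024, 3, 15)))  # {'widget': 30}
print(inventory_as_of(events, date(2024, 12, 31)))  # {'widget': 15}
```

The same replay loop is what bug recovery relies on: fix the projection logic, then run it over the unchanged log again.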

When event sourcing loses

Simple CRUD. Blog, todo list, basic admin panel. Pure overhead.
Schema thrash. Early-stage product where the domain model changes weekly. Schema evolution will kill you.
Team doesn't know it. The learning curve is steep. Without someone who has done it before, you'll make all the rookie mistakes.
You're avoiding traditional DBs. "Postgres is too constraining" is rarely a real reason. Usually it's resume-driven.
The audit requirement is fake. "We might need history someday" is not a requirement. Adding a log table is cheaper.

Operating an event-sourced system

What's different about running this in prod:

Backups are huge. Events grow forever (unless you snapshot + archive, which has its own complications).
Rebuilds are slow. Rebuilding a projection from scratch can take hours. Plan for it.
Onboarding is harder. New engineers need to learn the model.
Tooling is sparse. Debuggers, query tools, and ORMs assume current-state databases. You write more custom tooling.
Observability is different. Standard APM tools work, but understanding "why is the projection wrong" requires inspecting the event stream, not just the DB.

CQRS without event sourcing

A common middle ground. Use CQRS (separate read and write models) without event sourcing.

Write side: traditional schema, simple updates.
Read side: denormalized views, updated by an outbox or CDC pipeline.

You get most of the read-model benefits without the event store complexity. This is what most "we adopted CQRS" teams actually do.
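A minimal sketch of the outbox variant, using SQLite via Python's sqlite3 module. The table names (orders, outbox, orders_by_status) and the set_status and relay functions are assumptions for illustration, and the read model lives in the same database only to keep the example short.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id TEXT PRIMARY KEY, status TEXT NOT NULL);
    CREATE TABLE outbox (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        payload TEXT NOT NULL,
        processed INTEGER NOT NULL DEFAULT 0
    );
    CREATE TABLE orders_by_status (status TEXT PRIMARY KEY, order_count INTEGER NOT NULL);
""")


def set_status(conn, order_id, new_status):
    """Write side: update current state and record the change in one transaction."""
    with conn:
        row = conn.execute("SELECT status FROM orders WHERE id = ?", (order_id,)).fetchone()
        old_status = row[0] if row else None
        conn.execute("INSERT OR REPLACE INTO orders (id, status) VALUES (?, ?)",
                     (order_id, new_status))
        conn.execute("INSERT INTO outbox (payload) VALUES (?)",
                     (json.dumps({"old": old_status, "new": new_status}),))


def relay(conn):
    """Read side: drain unprocessed outbox rows into the denormalized view."""
    with conn:
        rows = conn.execute(
            "SELECT id, payload FROM outbox WHERE processed = 0 ORDER BY id").fetchall()
        for outbox_id, payload in rows:
            change = json.loads(payload)
            if change["old"] is not None:
                conn.execute("UPDATE orders_by_status SET order_count = order_count - 1 "
                             "WHERE status = ?", (change["old"],))
            conn.execute("INSERT OR IGNORE INTO orders_by_status (status, order_count) "
                         "VALUES (?, 0)", (change["new"],))
            conn.execute("UPDATE orders_by_status SET order_count = order_count + 1 "
                         "WHERE status = ?", (change["new"],))
            conn.execute("UPDATE outbox SET processed = 1 WHERE id = ?", (outbox_id,))


def status_counts(conn):
    return dict(conn.execute("SELECT status, order_count FROM orders_by_status"))


set_status(conn, "o-1", "paid")
set_status(conn, "o-2", "paid")
set_status(conn, "o-1", "shipped")
relay(conn)
print(status_counts(conn))
```

The write side stays a plain current-state table. The outbox row is written in the same transaction, so the read model cannot miss a change, and marking rows as processed makes a second relay run a no-op.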

Migration paths

You will rarely event-source a greenfield system. More often you have a CRUD system that needs event sourcing for one bounded context.

Strategy:

Identify one aggregate that needs event sourcing.
Build the event store, command handlers, projections for that aggregate alone.
Shape the projections to look exactly like the existing tables, so other services don't notice.
Cut over reads and writes for that aggregate to the event-sourced version.
Validate. Decide whether to event-source the next aggregate.

Never event-source the whole system at once.
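One way to approach the validation step, sketched under the assumption that you can run the event-sourced version in parallel and dump both the legacy table and the new projection as dictionaries keyed by ID. The diff_tables helper and the sample rows are hypothetical.

```python
def diff_tables(legacy_rows, projected_rows):
    """Return {id: (legacy, projected)} for every row that differs or is missing."""
    mismatches = {}
    for key in sorted(legacy_rows.keys() | projected_rows.keys()):
        legacy = legacy_rows.get(key)
        projected = projected_rows.get(key)
        if legacy != projected:
            mismatches[key] = (legacy, projected)
    return mismatches


legacy = {
    "acct-1": {"balance": 100},
    "acct-2": {"balance": 40},
}
projected = {
    "acct-1": {"balance": 100},
    "acct-2": {"balance": 45},   # drifted
    "acct-3": {"balance": 0},    # exists only in the new projection
}

for key, (old, new) in diff_tables(legacy, projected).items():
    print(key, old, new)
```

An empty diff over a meaningful period of production traffic is a reasonable signal that the cut-over is safe; a non-empty one tells you exactly which rows to investigate.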

What I would tell a junior engineer

Event sourcing is real engineering. It's not a buzzword. Used in the right domain, it gives you superpowers: time travel, audit, multi-view, debugging in prod. Used in the wrong domain, it gives you 3x the code for the same functionality plus a lifetime of schema evolution work.

Read Greg Young's talk. Read his versioning book. Build a toy banking system with event sourcing in a weekend. Then decide if your real domain actually needs it. If it doesn't, use Postgres with a good audit log and call it a day.

Learn more

Article: Martin Fowler, "Event Sourcing" (martinfowler.com)
Talk: Greg Young, "A decade of DDD, CQRS, Event Sourcing"
Article: Greg Young, "Versioning in an Event Sourced System"
Docs: Microsoft, "Event Sourcing pattern" (Microsoft Learn)
Article: Martin Kleppmann, "Designing Data-Intensive Applications", ch 11
