In revision.
Crisp5 min readGo deeper →

MVCC in Postgres

MVCC means every UPDATE writes a new row version instead of overwriting, so readers never block writers and writers never block readers.

MVCC stands for Multi-Version Concurrency Control. The one-line version: Postgres never updates a row in place. Every UPDATE writes a new tuple and marks the old one dead. Every transaction sees a snapshot of the database as of its start, ignoring rows that do not belong to its snapshot.

How it works in 30 seconds

Each row has hidden columns xmin (the transaction that created it) and xmax (the transaction that deleted it, or 0). A reader at transaction T only sees rows where:

  • xmin is a committed transaction with id less than T's snapshot, and
  • xmax is either 0, aborted, or greater than T's snapshot

That is it. No read locks. No blocking. The cost is dead tuples that pile up until VACUUM reclaims them.

T1's snapshot ignores T2's update entirely

Why this is good

  • Readers never block writers. A long SELECT does not stop concurrent UPDATEs.
  • Writers never block readers. The opposite is also true.
  • Consistent snapshots are cheap. Backups, analytics queries, long reports.

Why this is annoying

  • Bloat. Every UPDATE creates a dead tuple. A heavily updated table without VACUUM becomes 10x its real size.
  • Index amplification. Every new tuple needs new index entries (unless HOT update applies). Indexes bloat too.
  • Transaction ID wraparound. xid is 32 bits. After 2 billion transactions, Postgres must aggressively VACUUM FREEZE or refuse new transactions. This has taken down production systems (Sentry, MailChimp post-mortems).

The interview line

"Postgres MVCC means an UPDATE is really an INSERT of a new row version plus marking the old version dead. Each transaction has a snapshot defined by its xmin and the list of in-progress xids, and only sees row versions visible to that snapshot. Readers and writers never block each other, but you pay in dead tuples that VACUUM has to clean up."

What VACUUM does

VACUUM walks tables, finds tuples whose xmax is committed and older than the oldest active snapshot, and marks their space reusable. VACUUM FULL rewrites the table to actually shrink it (and takes an ACCESS EXCLUSIVE lock). Autovacuum runs in the background based on dead-tuple thresholds.

Learn more