ACID properties

How Postgres actually implements ACID using the WAL, MVCC, and constraint checking, and where each guarantee leaks.

ACID is a contract, but it is also a set of implementation choices. Postgres makes specific ones. Understanding those choices tells you what your DB will and will not do under load.

Mental model: the WAL is the database

The dirty secret of Postgres is that the heap files (the actual table data on disk) are not the source of truth. The WAL (write-ahead log) is. Every change writes to the WAL first, fsyncs it, then the in-memory page is marked dirty. The background checkpointer later flushes dirty pages to the heap.

This is why durability works: even if the OS crashes between marking a page dirty and writing it to disk, on restart Postgres replays the WAL from the last checkpoint and reconstructs the lost pages.

Commit path: WAL fsync is the durability barrier

Atomicity: how rollback actually works

Postgres does not undo writes. There is no undo log like Oracle. Instead, every row version is tagged with the transaction ID (xmin, xmax) that created and deleted it. On ROLLBACK, Postgres just marks the transaction's xid as aborted in the pg_xact directory. Future readers check xmin against pg_xact, see "aborted," and skip the row.

This is why ROLLBACK is nearly instant in Postgres regardless of transaction size. It is also why long-running aborted transactions still leave dead tuples that VACUUM must reclaim.

Consistency: the weakest letter

ACID consistency is the property the database itself does the least for. The DB enforces:

Type constraints (cannot put a string in an int column)
NOT NULL
UNIQUE and PRIMARY KEY
FOREIGN KEY
CHECK constraints
Triggers you wrote

Everything else (your account balance never goes negative, your invoice total equals the sum of line items) is your job. Constraints are checked at end-of-statement by default, but DEFERRABLE constraints check at COMMIT. Use deferred constraints when two rows must reference each other (a chicken-and-egg insert problem).

Isolation: MVCC plus locks

Postgres provides four isolation levels but really implements three: Read Uncommitted is treated as Read Committed. Each level is a different rule for which row versions a snapshot can see.

Read Committed (default): each statement gets a fresh snapshot. You can see different data within one transaction across statements. Allows non-repeatable reads.
Repeatable Read: the snapshot is taken at the first statement and frozen for the transaction. No non-repeatable reads, but write skew is still possible.
Serializable: Repeatable Read plus Serializable Snapshot Isolation (SSI). Detects dangerous read-write patterns and aborts one transaction. Real serializability without the cost of strict 2-phase locking.

Durability: fsync and the WAL

When you call COMMIT, Postgres:

Writes a commit record to the WAL buffer
Flushes the WAL buffer to the OS page cache
Calls fsync on the WAL file up to the commit LSN
Returns success to the client

Step 3 is the expensive one. A single fsync on a spinning disk takes 5-10ms. On NVMe SSDs, around 50-200 microseconds. This is why throughput on small transactions is capped at maybe 5000 commits per second per connection without batching.

You can trade durability for throughput:

synchronous_commit = off: commit returns before fsync. Crash loses last few hundred ms of commits. Useful for non-critical writes.
fsync = off: never call fsync. Crash can corrupt the database. Never set this in production.
commit_delay: small delay to batch commits across backends. Group commit pattern.

Where ACID leaks

Logical replication is not synchronous by default. The primary commits, then ships WAL. Replica lag means the replica is briefly inconsistent with the primary. Reading from a replica in your transaction breaks isolation.
SERIALIZABLE on a hot row degrades to constant retries. SSI aborts on conflict. If your workload has a hot row (counter, queue head), expect retry storms. Use advisory locks or SELECT ... FOR UPDATE instead.
fsync lies on some filesystems. ext4 with data=writeback, NFS without sync option, consumer SSDs with poor power-loss protection. Test with pg_test_fsync.
Async commit + crash = silent data loss. The DB tells the client "committed" but the WAL was not flushed. Default is synchronous_commit = on for this reason.

Common pitfalls in production

Using Read Committed and expecting Serializable: write skew bugs. Two transactions read the same row, each decide it is safe to update, both commit, invariant broken.
Long-running transactions blocking VACUUM: any open transaction prevents VACUUM from cleaning dead tuples newer than its xmin. A 4-hour analytics query can bloat your tables by gigabytes.
Catching exceptions and continuing: in a transaction, once any statement errors, the whole transaction is poisoned. You must ROLLBACK or use SAVEPOINTs. Continuing to issue statements just gets current transaction is aborted errors.

The senior take

ACID is not a yes-or-no property. It is a set of dials. Postgres gives you strong defaults: Read Committed isolation, synchronous commit, fsync on. You can turn each dial down for throughput, but every notch costs correctness in a specific way. Know which knob you are turning and why.

Learn more

Docs
PostgreSQL: Reliability and the Write-Ahead LogPostgreSQL
Article
Designing Data-Intensive ApplicationsMartin Kleppmann
Paper
A Critique of ANSI SQL Isolation Levels (Berenson et al)Microsoft Research

Deep dive15 min read← Back to crisp