ACID properties
How Postgres actually implements ACID using the WAL, MVCC, and constraint checking, and where each guarantee leaks.
ACID is a contract, but it is also a set of implementation choices. Postgres makes specific ones. Understanding those choices tells you what your DB will and will not do under load.
Mental model: the WAL is the database
The dirty secret of Postgres is that the heap files (the actual table data on disk) are not the source of truth. The WAL (write-ahead log) is. Every change writes to the WAL first, fsyncs it, then the in-memory page is marked dirty. The background checkpointer later flushes dirty pages to the heap.
This is why durability works: even if the OS crashes between marking a page dirty and writing it to disk, on restart Postgres replays the WAL from the last checkpoint and reconstructs the lost pages.
Atomicity: how rollback actually works
Postgres does not undo writes. There is no undo log like Oracle. Instead, every row version is tagged with the transaction ID (xmin, xmax) that created and deleted it. On ROLLBACK, Postgres just marks the transaction's xid as aborted in the pg_xact directory. Future readers check xmin against pg_xact, see "aborted," and skip the row.
This is why ROLLBACK is nearly instant in Postgres regardless of transaction size. It is also why long-running aborted transactions still leave dead tuples that VACUUM must reclaim.
Consistency: the weakest letter
ACID consistency is the property the database itself does the least for. The DB enforces:
- Type constraints (cannot put a string in an
intcolumn) NOT NULLUNIQUEandPRIMARY KEYFOREIGN KEYCHECKconstraints- Triggers you wrote
Everything else (your account balance never goes negative, your invoice total equals the sum of line items) is your job. Constraints are checked at end-of-statement by default, but DEFERRABLE constraints check at COMMIT. Use deferred constraints when two rows must reference each other (a chicken-and-egg insert problem).
Isolation: MVCC plus locks
Postgres provides four isolation levels but really implements three: Read Uncommitted is treated as Read Committed. Each level is a different rule for which row versions a snapshot can see.
- Read Committed (default): each statement gets a fresh snapshot. You can see different data within one transaction across statements. Allows non-repeatable reads.
- Repeatable Read: the snapshot is taken at the first statement and frozen for the transaction. No non-repeatable reads, but write skew is still possible.
- Serializable: Repeatable Read plus Serializable Snapshot Isolation (SSI). Detects dangerous read-write patterns and aborts one transaction. Real serializability without the cost of strict 2-phase locking.
Durability: fsync and the WAL
When you call COMMIT, Postgres:
- Writes a commit record to the WAL buffer
- Flushes the WAL buffer to the OS page cache
- Calls
fsyncon the WAL file up to the commit LSN - Returns success to the client
Step 3 is the expensive one. A single fsync on a spinning disk takes 5-10ms. On NVMe SSDs, around 50-200 microseconds. This is why throughput on small transactions is capped at maybe 5000 commits per second per connection without batching.
You can trade durability for throughput:
synchronous_commit = off: commit returns beforefsync. Crash loses last few hundred ms of commits. Useful for non-critical writes.fsync = off: never callfsync. Crash can corrupt the database. Never set this in production.commit_delay: small delay to batch commits across backends. Group commit pattern.
Where ACID leaks
- Logical replication is not synchronous by default. The primary commits, then ships WAL. Replica lag means the replica is briefly inconsistent with the primary. Reading from a replica in your transaction breaks isolation.
SERIALIZABLEon a hot row degrades to constant retries. SSI aborts on conflict. If your workload has a hot row (counter, queue head), expect retry storms. Use advisory locks orSELECT ... FOR UPDATEinstead.fsynclies on some filesystems. ext4 withdata=writeback, NFS withoutsyncoption, consumer SSDs with poor power-loss protection. Test withpg_test_fsync.- Async commit + crash = silent data loss. The DB tells the client "committed" but the WAL was not flushed. Default is
synchronous_commit = onfor this reason.
Common pitfalls in production
- Using Read Committed and expecting Serializable: write skew bugs. Two transactions read the same row, each decide it is safe to update, both commit, invariant broken.
- Long-running transactions blocking VACUUM: any open transaction prevents VACUUM from cleaning dead tuples newer than its
xmin. A 4-hour analytics query can bloat your tables by gigabytes. - Catching exceptions and continuing: in a transaction, once any statement errors, the whole transaction is poisoned. You must
ROLLBACKor useSAVEPOINTs. Continuing to issue statements just getscurrent transaction is abortederrors.
The senior take
ACID is not a yes-or-no property. It is a set of dials. Postgres gives you strong defaults: Read Committed isolation, synchronous commit, fsync on. You can turn each dial down for throughput, but every notch costs correctness in a specific way. Know which knob you are turning and why.
Learn more
- Docs
- ArticleDesigning Data-Intensive ApplicationsMartin Kleppmann
- PaperA Critique of ANSI SQL Isolation Levels (Berenson et al)Microsoft Research