Deep dive15 min read← Back to crisp

Query planner and EXPLAIN

How the Postgres cost-based optimizer enumerates plans, estimates selectivity, and where it goes wrong. Plus how to read EXPLAIN like a senior engineer.

The Postgres planner is a sophisticated cost-based optimizer that turns declarative SQL into procedural execution plans. Understanding how it makes decisions, and how it can be wrong, is the difference between writing queries and tuning them.

The optimization pipeline

SQL → Parse Tree → Query Tree (post-rewrite) → Plan Tree → Execution

The planner stage is where the magic happens:

Path generation: for each scan, generate possible access paths (seq scan, index scan, bitmap scan)
Join ordering: enumerate join orders (dynamic programming up to ~10 tables, genetic algorithm beyond)
Join method selection: for each pair, consider nested loop, hash join, merge join
Cost estimation: assign cost to each candidate
Plan selection: pick the lowest-cost complete plan

For 10 tables, there are about 10 million possible join orders. The planner uses dynamic programming to keep it tractable.

The cost model

Costs are in arbitrary units, calibrated so sequential page reads cost 1.0. Configurable parameters:

seq_page_cost = 1.0: cost to read a page sequentially
random_page_cost = 4.0: cost to read a page randomly (default; lower for SSD: 1.1)
cpu_tuple_cost = 0.01: cost to process one row
cpu_index_tuple_cost = 0.005: cost per index entry
cpu_operator_cost = 0.0025: cost per operator (function call)

For SSD, set random_page_cost = 1.1. Default of 4.0 is for spinning rust and biases the planner against index scans.

Statistics

The planner needs to estimate "how many rows will this WHERE clause return?" It uses statistics collected by ANALYZE:

Most Common Values (MCVs): top N values and their frequencies
Histogram: equi-depth buckets of remaining values
n_distinct: estimated number of distinct values
null_frac: fraction of nulls
Correlation: how sorted the column is vs physical order (affects index scan cost)

Per-column. Increase resolution with ALTER TABLE foo ALTER COLUMN bar SET STATISTICS 1000; then ANALYZE.

Selectivity estimation flow

Extended statistics

Default stats assume independence between columns. If columns are correlated (city and zip_code), estimates are wrong.

CREATE STATISTICS s_address ON city, zip_code FROM addresses;
ANALYZE addresses;

This collects:

ndistinct: distinct count across the column group
dependencies: functional dependencies (zip determines city)
mcv: multi-column MCVs

Massive improvement on correlated filters.

EXPLAIN output, decoded

Hash Join  (cost=14.50..28.50 rows=100 width=64) (actual time=0.234..1.456 rows=98 loops=1)
  Hash Cond: (o.user_id = u.id)
  Buffers: shared hit=42 read=8
  ->  Seq Scan on orders o  (cost=0.00..10.00 rows=1000 width=32) (actual rows=1000 loops=1)
        Buffers: shared hit=10
  ->  Hash  (cost=2.00..2.00 rows=10 width=32) (actual rows=10 loops=1)
        Buckets: 1024  Batches: 1  Memory Usage: 9kB
        ->  Seq Scan on users u  (cost=0.00..2.00 rows=10 width=32) (actual rows=10 loops=1)

Reading this:

Bottom-up execution: scan users, build hash table, scan orders, probe hash, output joined rows
Hash join cost 14.50..28.50: planner estimate
actual time 0.234..1.456: real time (startup..total per loop)
rows=100 estimated vs 98 actual: good estimate
Buffers shared hit=42 read=8: 42 pages from cache, 8 from disk

Signals of trouble

Row estimate off by 10x or more: stats are wrong. Run ANALYZE. Consider extended stats.
Rows Removed by Filter: index returned tuples that did not match the full predicate. Add the predicate to the index or use a partial index.
Heap Fetches: with index only scan, count of times it had to fetch the heap because the page was not all-visible. Run VACUUM.
Nested Loop with high outer count: O(n*m). If outer is 10,000 and inner is index scan, can still be fast (10,000 lookups). If inner is seq scan, disaster.
Sort Method: external merge Disk: 50000kB: sort spilled. Increase work_mem.
Batches: 8 in a Hash node: hash table did not fit in memory. Increase work_mem.

When the planner is wrong

Three common scenarios:

Correlated columns

WHERE city = 'Bangalore' AND state = 'Karnataka' estimates as freq(city) * freq(state). But every Bangalore row has state Karnataka. Estimate is 10-100x too low. Fix: extended statistics.

Function on indexed column

WHERE date_trunc('day', created_at) = '2026-06-21' cannot use a B-tree on created_at. Either rewrite as WHERE created_at >= '2026-06-21' AND created_at < '2026-06-22' or create an expression index.

Skewed data

99 percent of rows have status = 'closed', 1 percent are open. Planner sees average selectivity. Use MCVs (default 100, increase if needed).

Plan cache and parameter sniffing

Prepared statements are planned once and reused. Postgres tries to detect when re-planning is needed (generic vs custom plan, decided by plan_cache_mode). If your queries have wildly different parameter selectivities, force custom plans:

SET plan_cache_mode = force_custom_plan;

Hints, or the lack thereof

For actual hints, install the pg_hint_plan extension. Use sparingly.

Reading EXPLAIN like a senior

Look at total execution time at the bottom
Find the slowest node (often a Sort, Hash, or large Seq Scan)
Check estimated vs actual rows at that node
Check Buffers: how much disk I/O
If estimate is wrong, fix stats. If estimate is right, fix the plan (index, rewrite).

Common pitfalls

EXPLAIN without ANALYZE: just estimates, can be deeply wrong
ANALYZE on a transaction that modifies data: it runs the query for real
Not running ANALYZE after bulk loads: stats are stale, planner uses pre-load row counts
Trusting plan from dev DB: prod has different data distribution, plan will differ

The senior take

The planner is a cost model fed by statistics. Garbage stats give garbage plans. Run ANALYZE after bulk changes, set up extended statistics for correlated columns, set random_page_cost to match your storage, and read EXPLAIN from the bottom up. Most "Postgres is slow" tickets are missing indexes or stale stats, not planner bugs.

Learn more

Docs
PostgreSQL: The Planner OptimizerPostgreSQL
Docs
PostgreSQL: Statistics Used by the PlannerPostgreSQL
Talk
Bruce Momjian: Explaining the Postgres Query OptimizerBruce Momjian