Join strategies (nested, hash, merge)
Postgres picks one of three physical join algorithms per join: nested loop for small outer plus indexed inner, hash for medium unsorted sets, merge for large pre-sorted inputs.
Joining is asking "for each row in A, give me matching rows in B." There are three ways to answer.
Nested Loop
For each row in outer, look up matching rows in inner. O(n * m) naively. With an index on inner's join key, it becomes O(n * log m).
Best when:
- Outer has few rows (under ~1000)
- Inner has an index on the join key
- Selective predicates already filtered outer
Plan looks like:
Nested Loop
-> Seq Scan on small_table
-> Index Scan using idx_join_key on big_table
Hash Join
Build a hash table from the smaller side, probe with the larger side. O(n + m). Needs work_mem to fit the hash table; otherwise spills to disk in batches.
Best when:
- Both sides are unsorted
- Sizes are medium to large
- Equality join (
=only, not range)
Hash Join
Hash Cond: (a.id = b.a_id)
-> Seq Scan on big_a
-> Hash
-> Seq Scan on small_b
Merge Join
Sort both sides on the join key, then walk in parallel. O(n + m) plus sort cost. If inputs are already sorted (index scan in key order), no sort needed.
Best when:
- Both sides are large
- Both can be produced in sorted order cheaply (index on join key)
- Outer joins where streaming order matters
How the planner picks
| Outer size | Inner | Best join |
|---|---|---|
| Small | Index on join key | Nested Loop |
| Medium | Anything | Hash Join |
| Large | Already sorted | Merge Join |
| Any | Range condition | Nested Loop (range joins only) |
The planner estimates cost of each and picks the cheapest. Get estimates wrong and it picks poorly.
The interview line
"Postgres has three join algorithms. Nested loop is O(n log m) with an index on inner, best for small outer. Hash join is O(n+m), builds a hash on the smaller side, best for medium-to-large unsorted equi-joins. Merge join is O(n+m) on pre-sorted inputs, best when both sides are already in key order. The planner picks based on cost estimates from ANALYZE statistics."
Learn more
- DocsPostgreSQL: Join MethodsPostgreSQL
- ArticleUse the Index, Luke: JoinsMarkus Winand