Deep dive15 min read← Back to crisp

GIN and GIST indexes

Internals of GIN inverted lists, GIST bounding boxes, BRIN as the third specialty index, operator classes, and how to pick the right one for each access pattern.

GIN and GIST are the second-tier indexes that make Postgres a serious database for anything beyond key-value lookups. PostGIS, pg_trgm, full-text search, range queries, JSONB, and array operators all depend on them.

GIN internals

GIN stands for Generalized Inverted Index. The structure is:

An entry tree: a B-tree of unique items (words, jsonb paths, array elements)
For each entry, a posting list: the sorted list of heap TIDs that contain that item
If the posting list is large, it becomes a posting tree (another B-tree of TIDs)

GIN structure: entry tree plus per-entry posting lists or trees

To look up "rows containing 'postgres' AND 'index'":

Look up 'postgres' in entry tree, get posting list of TIDs.
Look up 'index' in entry tree, get posting list of TIDs.
Intersect the two sorted TID lists.
Visit those heap pages, recheck the actual condition.

For full-text search, the "items" are lexemes from to_tsvector. For JSONB with jsonb_path_ops, the items are hashes of paths-plus-values. For arrays, items are the array elements.

GIN fastupdate

GIN writes are expensive because inserting one row with 50 tags means updating 50 posting lists, each potentially in a different page. Postgres mitigates this with the "pending list":

New entries go to a pending list (sequential append, cheap)
Periodically, autovacuum or scans drain the pending list into the main GIN structure
Queries scan both the main structure and the pending list

This is fastupdate = on (default). Trade-off: queries are slightly slower because they check the pending list. Bulk inserts are much faster.

For batch loads, drop the GIN index, load, recreate. Or set gin_pending_list_limit high.

GIN operator classes for JSONB

jsonb_ops (default): indexes every key and value. Supports @>, ?, ?|, ?&. Larger index.
jsonb_path_ops: indexes hashes of path-plus-value pairs. Only supports @>. Much smaller and faster.

Pick jsonb_path_ops unless you need key-existence operators.

CREATE INDEX idx_metadata ON events USING GIN (metadata jsonb_path_ops);
SELECT * FROM events WHERE metadata @> '{"type": "click"}';

GIST internals

GIST is a generalized framework, not a specific data structure. You provide operations: union, penalty, picksplit, consistent. Postgres builds a balanced tree where each node represents the "union" of its children.

For 2D geometry, the union is a bounding box. For ranges, it is the range covering all children. For trigrams, a signature bitmap.

Searching:

Start at root.
For each child, check if its bounding region intersects the query.
Descend into intersecting children.
At leaves, check actual values.

GIST is lossy: the bounding box may contain rows that do not match. The index returns candidates, the executor rechecks.

Exclusion constraints

GIST's killer feature: enforce "no two rows have overlapping values." Used for scheduling, room booking, time-range non-overlap.

CREATE TABLE meetings (
  id serial PRIMARY KEY,
  room_id int,
  during tstzrange,
  EXCLUDE USING GIST (room_id WITH =, during WITH &&)
);

The constraint reads: no two rows where room_id equals AND during overlaps. Enforced at insert time via the GIST index.

BRIN: the third specialty index

BRIN (Block Range Index) is the cheap option for very large, naturally ordered tables. It stores min/max per block range (default 128 pages). Lookups skip ranges that cannot contain the key.

CREATE INDEX idx_events_time ON events USING BRIN (created_at);

Use cases:

Append-only time-series: timestamps naturally clustered
Log tables, metrics, IoT data
100GB table with no equality lookups, just range scans

Tiny index (megabytes for terabyte tables). Useless for random-access columns.

Choosing the right index

Pattern	Index
`WHERE id = 5`	B-tree
`WHERE name LIKE 'yash%'`	B-tree with `text_pattern_ops`
`WHERE name LIKE '%yash%'`	GIN with `pg_trgm`
`WHERE doc @> '{"key": "val"}'`	GIN with `jsonb_path_ops`
`WHERE tags @> ARRAY['x']`	GIN
`WHERE to_tsvector(body) @@ to_tsquery('postgres')`	GIN on tsvector
`WHERE ST_Contains(region, point)`	GIST (PostGIS)
`WHERE range && '[a, b]'`	GIST
`WHERE created_at > now() - interval '1 day'` on append-only big table	BRIN
Exclusion constraint (no overlap)	GIST

Build performance

GIN builds are slow because they must process every item per row. For full-text on millions of articles, expect minutes to hours.

Tips:

Set maintenance_work_mem high (8GB) during build
Use CREATE INDEX CONCURRENTLY to avoid write lock
Drop and rebuild after bulk loads is often faster

GIST builds are roughly B-tree speed. PostGIS GIST on millions of geometries: minutes.

Read performance

Common pitfalls

Indexing JSONB without jsonb_path_ops. Default operator class is 2-3x larger than needed.
Forgetting CONCURRENTLY. GIN builds can lock for hours.
Not using gin_pending_list_limit. On heavy-write tables, the pending list grows and queries get slower. Tune to your workload.
Using B-tree for LIKE '%x%'. It will not help. Use trigram GIN.
Exclusion constraints with high contention. Every insert checks the GIST tree. On hot rooms, becomes a serialization bottleneck.

The senior take

B-tree handles 90 percent of your indexes. GIN handles full-text, JSONB, and arrays. GIST handles geometry, ranges, and overlap constraints. BRIN handles giant append-only tables. Know the four and you can index anything Postgres can store. Most index design mistakes come from forcing B-tree onto a problem it cannot solve.

Learn more

Docs
PostgreSQL: GIN InternalsPostgreSQL
Docs
PostgreSQL: GIST InternalsPostgreSQL
Docs
PostGIS IndexingPostGIS