GIN and GIST indexes
Internals of GIN inverted lists, GIST bounding boxes, BRIN as the third specialty index, operator classes, and how to pick the right one for each access pattern.
GIN and GIST are the second-tier indexes that make Postgres a serious database for anything beyond key-value lookups. PostGIS, pg_trgm, full-text search, range queries, JSONB, and array operators all depend on them.
GIN internals
GIN stands for Generalized Inverted Index. The structure is:
- An entry tree: a B-tree of unique items (words, jsonb paths, array elements)
- For each entry, a posting list: the sorted list of heap TIDs that contain that item
- If the posting list is large, it becomes a posting tree (another B-tree of TIDs)
To look up "rows containing 'postgres' AND 'index'":
- Look up 'postgres' in entry tree, get posting list of TIDs.
- Look up 'index' in entry tree, get posting list of TIDs.
- Intersect the two sorted TID lists.
- Visit those heap pages, recheck the actual condition.
For full-text search, the "items" are lexemes from to_tsvector. For JSONB with jsonb_path_ops, the items are hashes of paths-plus-values. For arrays, items are the array elements.
GIN fastupdate
GIN writes are expensive because inserting one row with 50 tags means updating 50 posting lists, each potentially in a different page. Postgres mitigates this with the "pending list":
- New entries go to a pending list (sequential append, cheap)
- Periodically, autovacuum or scans drain the pending list into the main GIN structure
- Queries scan both the main structure and the pending list
This is fastupdate = on (default). Trade-off: queries are slightly slower because they check the pending list. Bulk inserts are much faster.
For batch loads, drop the GIN index, load, recreate. Or set gin_pending_list_limit high.
GIN operator classes for JSONB
jsonb_ops(default): indexes every key and value. Supports@>,?,?|,?&. Larger index.jsonb_path_ops: indexes hashes of path-plus-value pairs. Only supports@>. Much smaller and faster.
Pick jsonb_path_ops unless you need key-existence operators.
CREATE INDEX idx_metadata ON events USING GIN (metadata jsonb_path_ops);
SELECT * FROM events WHERE metadata @> '{"type": "click"}';GIST internals
GIST is a generalized framework, not a specific data structure. You provide operations: union, penalty, picksplit, consistent. Postgres builds a balanced tree where each node represents the "union" of its children.
For 2D geometry, the union is a bounding box. For ranges, it is the range covering all children. For trigrams, a signature bitmap.
Searching:
- Start at root.
- For each child, check if its bounding region intersects the query.
- Descend into intersecting children.
- At leaves, check actual values.
GIST is lossy: the bounding box may contain rows that do not match. The index returns candidates, the executor rechecks.
Exclusion constraints
GIST's killer feature: enforce "no two rows have overlapping values." Used for scheduling, room booking, time-range non-overlap.
CREATE TABLE meetings (
id serial PRIMARY KEY,
room_id int,
during tstzrange,
EXCLUDE USING GIST (room_id WITH =, during WITH &&)
);The constraint reads: no two rows where room_id equals AND during overlaps. Enforced at insert time via the GIST index.
BRIN: the third specialty index
BRIN (Block Range Index) is the cheap option for very large, naturally ordered tables. It stores min/max per block range (default 128 pages). Lookups skip ranges that cannot contain the key.
CREATE INDEX idx_events_time ON events USING BRIN (created_at);Use cases:
- Append-only time-series: timestamps naturally clustered
- Log tables, metrics, IoT data
- 100GB table with no equality lookups, just range scans
Tiny index (megabytes for terabyte tables). Useless for random-access columns.
Choosing the right index
| Pattern | Index |
|---|---|
WHERE id = 5 | B-tree |
WHERE name LIKE 'yash%' | B-tree with text_pattern_ops |
WHERE name LIKE '%yash%' | GIN with pg_trgm |
WHERE doc @> '{"key": "val"}' | GIN with jsonb_path_ops |
WHERE tags @> ARRAY['x'] | GIN |
WHERE to_tsvector(body) @@ to_tsquery('postgres') | GIN on tsvector |
WHERE ST_Contains(region, point) | GIST (PostGIS) |
WHERE range && '[a, b]' | GIST |
WHERE created_at > now() - interval '1 day' on append-only big table | BRIN |
| Exclusion constraint (no overlap) | GIST |
Build performance
GIN builds are slow because they must process every item per row. For full-text on millions of articles, expect minutes to hours.
Tips:
- Set
maintenance_work_memhigh (8GB) during build - Use
CREATE INDEX CONCURRENTLYto avoid write lock - Drop and rebuild after bulk loads is often faster
GIST builds are roughly B-tree speed. PostGIS GIST on millions of geometries: minutes.
Read performance
Common pitfalls
- Indexing JSONB without
jsonb_path_ops. Default operator class is 2-3x larger than needed. - Forgetting
CONCURRENTLY. GIN builds can lock for hours. - Not using
gin_pending_list_limit. On heavy-write tables, the pending list grows and queries get slower. Tune to your workload. - Using B-tree for
LIKE '%x%'. It will not help. Use trigram GIN. - Exclusion constraints with high contention. Every insert checks the GIST tree. On hot rooms, becomes a serialization bottleneck.
The senior take
B-tree handles 90 percent of your indexes. GIN handles full-text, JSONB, and arrays. GIST handles geometry, ranges, and overlap constraints. BRIN handles giant append-only tables. Know the four and you can index anything Postgres can store. Most index design mistakes come from forcing B-tree onto a problem it cannot solve.
Learn more
- DocsPostgreSQL: GIN InternalsPostgreSQL
- DocsPostgreSQL: GIST InternalsPostgreSQL
- DocsPostGIS IndexingPostGIS