Sharding and partitioning
Split data across nodes by key. Hash for uniformity, range for scans. The shard key choice is permanent and load-bearing.
Sharding splits one logical dataset across multiple physical nodes by a shard key. You shard when one node cannot hold the data, can't serve the write load, or can't be scaled vertically cheaper than horizontally.
Two ways to split
- Hash partitioning: hash(key) mod N. Uniform distribution. Range scans require fan-out.
- Range partitioning: shard 1 holds keys A-F, shard 2 holds G-M. Range scans are local. Hot ranges are a risk.
Hash is the default. Range is for time-series and ordered scans.
Consistent hashing
Naive hash mod N: when N changes (add/remove a node), every key gets remapped. Catastrophic.
Consistent hashing: map nodes and keys to points on a ring. Each key is owned by the next node clockwise. Adding a node only moves keys from one neighbor. ~1/N of keys remapped, not all.
Used by: DynamoDB, Cassandra, Memcached, Redis Cluster (variant).
Picking the shard key
This is the single most consequential design decision in the system. Once you ship, you cannot easily change it. Stripe took 4 years to migrate one shard key.
Rules:
- Cardinality: must have many distinct values. user_id good, country_code bad.
- Distribution: must be uniform. created_at is biased toward recent.
- Query alignment: queries must filter by shard key. Otherwise every query scatter-gathers.
- Tenant boundary: for multi-tenant SaaS, shard by tenant_id so one customer's data is on one shard.
Twitter learned this with celebrities. They sharded tweets by user_id. Then Justin Bieber happened. One user, one shard, billions of read requests. They had to special-case.
Resharding is painful
When a shard fills up or gets hot, you must split it. Options:
- Pre-split: hash space is 4096 virtual shards mapped to N physical. Add a physical node, remap virtual shards. Cassandra and DynamoDB do this.
- Online resharding: copy data while serving writes, switch over atomically. Vitess, MongoDB.
- Cold migration: write a script, take downtime. The fallback.
Plan for resharding before you ship. Use vshards from day 1.
Learn more
- ArticleDesigning Data-Intensive Applications, Chapter 6Martin Kleppmann
- DocsVitess sharding docsVitess
- Article