Fan-out architectures
Push vs pull vs hybrid, celebrity problem, inbox storage, ranking, pagination, and what Twitter, Instagram, and Facebook actually do.
The fan-out problem
One producer generates an item that must reach N consumers. The system question: when do you do the work?
- At write time: pay once per post, multiply by N consumers. Reads are O(inbox-size) lookups.
- At read time: cheap writes, expensive reads (must query every producer the consumer follows).
Both extremes break at scale. The art is the hybrid.
This pattern appears everywhere:
- Twitter/X timelines.
- Instagram feed.
- Facebook News Feed.
- LinkedIn feed.
- Slack messages in channels.
- Push notifications.
- Email distribution lists.
- IoT device commands.
The mechanics differ by scale and freshness requirements, but the design tradeoffs are the same.
Fan-out on write (push model)
When a producer creates content, immediately deliver to all consumers' personal inboxes.
User Alice (1000 followers) posts a tweet.
For each follower F:
insert (tweet_id, alice_id, ts) into F.inbox
Pros:
- Read is cheap: look at your inbox. O(inbox-size) for a paginated read.
- Latency is bounded by inbox storage tech.
- Easy to add personalization at write time.
Cons:
- Write amplification: one post = N inbox writes.
- Storage cost: inboxes grow large.
- The celebrity problem: Justin Bieber posts, you write to 100M inboxes. Catastrophic.
Pre-2015 Twitter was almost pure fan-out on write. Inboxes lived in Redis. Worked great for normal users, broke for celebrities.
Fan-out on read (pull model)
When a consumer requests their feed, query every followee in real time, merge.
For each followee F in user.following:
fetch latest posts from F
merge, rank, paginate, return
Pros:
- Write is cheap: one insert per post.
- No storage for inboxes.
- Easy to back out (no inboxes to clean up).
Cons:
- Read is expensive: O(follow-list) queries per feed load.
- High tail latency: slowest followee dominates.
- Hard to personalize without expensive joins.
Pure fan-out on read works at small scale (Mastodon-sized, not Twitter-sized).
The celebrity problem
Why fan-out on write breaks for celebrities:
- One celebrity post = millions of inbox writes.
- Spike load for hours.
- Storage cost for stale data (most followers don't read every post).
Why fan-out on read breaks for active users:
- Power user follows 5000 accounts.
- Every feed load queries 5000 sources.
- Slow.
The fix: hybrid. Make the strategy depend on follower count.
Hybrid fan-out
The Twitter and Instagram approach:
For each user, classify based on follower count:
- Small accounts (<10k followers): fan-out on write. Cheap writes, predictable.
- Large accounts (>10k followers): fan-out on read. Don't write to inboxes.
At read time:
- Read inbox for posts from small accounts.
- Pull recent posts from each large account the user follows.
- Merge and rank.
This bounds write amplification (no 100M writes per post) and keeps reads fast for typical users (inbox is fast, celebrity pull is a small number of queries).
Threshold tuning depends on follower distribution. Twitter uses ~10k. Instagram likely similar.
Inbox storage
Inboxes need fast random access by user_id, time-ordered, paginated.
Options:
- Redis sorted sets: O(log N) inserts, fast range reads. Memory-bound, capped per user (e.g., last 800 entries).
- Cassandra: durable, scalable. Wide rows keyed by user_id, clustered by timestamp.
- Manhattan (Twitter's KV store): purpose-built for timelines.
- HBase: Facebook used it for Messenger inbox.
Capping is essential. Nobody reads past page 50. Trim aggressively.
Typical setup: hot inboxes in Redis (active users), cold inboxes computed on demand (inactive users). Saves 90%+ of memory cost.
Materialization tradeoffs
Materializing inboxes is denormalization for read speed. Tradeoffs:
- Write amplification scales with avg follower count.
- Inbox storage scales with users * inbox-cap.
- Reads are O(1) lookups.
For Twitter scale (500M users, avg ~100 followers, inbox cap 800):
- Inboxes are billions of rows, but capped.
- Write QPS = post QPS * avg followers = significant but manageable with sharding.
For Slack: every message to a channel writes to every member's "channel inbox" (or just the channel itself, since Slack is channel-centric not user-centric). Different model.
Ranking
Modern feeds are not chronological. They're ranked by ML models that score (post, user) pairs.
Ranking pipeline:
- Candidate generation: get ~1000 potentially relevant posts.
- Filtering: remove already-seen, blocked content, etc.
- Light ranker: cheap model scores all 1000, keep top 100.
- Heavy ranker: expensive model (deep learning) scores top 100.
- Diversity: re-rank to avoid showing 5 posts from same person.
- Return top 20 for the page.
Candidate generation is where fan-out architecture matters. The 1000 candidates come from:
- Your inbox (fan-out on write contribution).
- Pulls from celebrities you follow.
- Recommended posts from accounts you don't follow.
- Ads.
The architecture decisions enable ranking. Without fast candidate generation, ranking is moot.
Pagination
Feed pagination is harder than it looks. Naive offset-based pagination (skip N, take 20) breaks:
- New posts arrive, push existing posts down.
- User sees duplicate or skipped content.
Cursor-based pagination uses a stable identifier:
- First request: return 20 posts plus a cursor (e.g., timestamp + post_id).
- Next request: "give me 20 posts before this cursor."
- New posts can be loaded with a "give me posts after this cursor" request (pull-to-refresh).
Twitter uses tweet_id (which is roughly time-ordered Snowflake IDs). Instagram uses similar.
Real-time updates
Beyond initial load, users expect new posts to appear. Options:
- Poll: client asks "anything new since cursor X" every 30s. Simple, expensive.
- Long-polling: client opens connection, server holds until new content.
- WebSocket: persistent bidirectional connection. Push when new content.
- Server-Sent Events: one-way push.
Twitter web uses WebSockets for streaming. Slack uses WebSockets heavily. Mobile apps use platform-specific push (APNs, FCM) for background notifications.
The fan-out happens at the "deliver to active connection" layer too. When Alice posts, her followers' connected WebSockets receive a "new tweet" notification.
Special cases
Group messaging
Slack channel with 10,000 members. Sender writes once to channel. Each member's client reads from the channel. This is fan-out on read by default.
If you also need per-user unread counts, you need per-user state (a "read pointer" per user per channel). That's where it gets expensive.
Notification systems
Discord, Slack, Twitter all have notification systems separate from the feed. Push to APNs/FCM for mobile, server-sent events for web.
Fan-out happens at the notification preferences level: which notifications go to which channels (push, in-app, email), with batching and aggregation.
IoT command fan-out
Send a command to 10,000 devices. Use a publish-subscribe topic with broker (Kafka, MQTT). Devices subscribe; broker fans out efficiently.
Pitfalls
-
Hot inboxes: power user follows everyone, inbox writes overload one shard. Shard by user_id, ensure even distribution.
-
Stale inboxes: inactive users have huge inboxes nobody reads. Trim aggressively. Lazy-load on next login.
-
Backfill on new follow: when Bob follows Alice, do we backfill Bob's inbox with Alice's old posts? Twitter says no (just future posts). Some systems do partial backfill.
-
Unfollow cleanup: when Bob unfollows Alice, do we remove Alice's posts from Bob's inbox? Twitter does not (inbox is append-only). Filter at read time.
-
Privacy: when Alice deletes a post, removing from all inboxes is expensive. Soft-delete (mark and filter at read time).
-
Pagination with rewrites: ranking changes mid-scroll. Use stable cursors and tolerate slight inconsistency.
Patterns at scale
Twitter (now X):
- Manhattan KV store for timelines.
- Hybrid fan-out with celebrity threshold.
- Active user inboxes in memory.
- Inactive user inboxes computed on demand.
- Stream-based ranking.
Instagram:
- Cassandra for inbox storage.
- ML-based ranking heavy.
- Stories use different model (24-hour TTL, ranked by recency + engagement).
Facebook News Feed:
- TAO (write-through cache) layered over MySQL.
- Heavy denormalization.
- ML ranking at multiple stages.
LinkedIn FollowFeed:
- Mix of materialized and dynamic.
- Posts stored once, fanned out lazily.
- Heavy use of caching.
Designing your own
Decision tree:
- How many users? <1M: fan-out on write everywhere, you'll be fine.
- 1M-100M: hybrid with celebrity threshold. Start at 10k follower threshold.
-
100M: hybrid + heavy caching + active/inactive distinction + ML candidate selection.
- Real-time updates: use WebSockets for connected clients, push notifications for offline.
- Ranking: chronological first, ML later. Don't pre-optimize.
What to read next
- Twitter Timeline talks for the canonical hybrid architecture.
- Facebook TAO paper for the graph-store-over-MySQL pattern.
- Instagram engineering blog for ML feed ranking.
- DDIA chapter 1 introduces the fan-out tradeoff using Twitter as the example.
Learn more
- Talk
- PaperFacebook TAO paperUSENIX
- ArticleInstagram feed rankingMeta Engineering
- ArticleLinkedIn news feedLinkedIn Engineering
- ArticleDesigning Data-Intensive Applications, Chapter 1Martin Kleppmann