Deep dive · 15 min read

Fan-out architectures

Push vs pull vs hybrid, celebrity problem, inbox storage, ranking, pagination, and what Twitter, Instagram, and Facebook actually do.

The fan-out problem

One producer generates an item that must reach N consumers. The system question: when do you do the work?

At write time: each post costs N inbox writes, one per consumer. Reads are cheap lookups of a precomputed inbox.
At read time: cheap writes, expensive reads (must query every producer the consumer follows).

Both extremes break at scale. The art is the hybrid.

This pattern appears everywhere:

Twitter/X timelines.
Instagram feed.
Facebook News Feed.
LinkedIn feed.
Slack messages in channels.
Push notifications.
Email distribution lists.
IoT device commands.

The mechanics differ by scale and freshness requirements, but the design tradeoffs are the same.

Fan-out on write (push model)

When a producer creates content, immediately deliver to all consumers' personal inboxes.

User Alice (1000 followers) posts a tweet.
For each follower F:
  insert (tweet_id, alice_id, ts) into F.inbox
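The loop above can be sketched in Python as follows. This is a minimal in-memory illustration, not a production design: the `followers` and `inboxes` dictionaries and the `follow`, `publish`, and `read_feed` helpers are hypothetical names standing in for a real follower graph and inbox store.

```python
from collections import defaultdict

# Hypothetical in-memory stores; a real system would use a database or cache.
followers = defaultdict(set)   # author_id -> set of follower ids
inboxes = defaultdict(list)    # user_id -> list of (ts, post_id, author_id)


def follow(follower_id, author_id):
    followers[author_id].add(follower_id)


def publish(author_id, post_id, ts):
    """Fan out on write: one inbox append per follower."""
    entry = (ts, post_id, author_id)
    for follower_id in followers[author_id]:
        inboxes[follower_id].append(entry)
    return len(followers[author_id])  # number of inbox writes performed


def read_feed(user_id, limit=20):
    """Reads touch only the user's own inbox, newest first."""
    return sorted(inboxes[user_id], reverse=True)[:limit]
```

The return value of `publish` makes the write amplification visible: it equals the author's follower count, which is exactly the cost that becomes a problem for very large accounts.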

Pros:

Read is cheap: look at your inbox. A paginated read is one key lookup plus a range scan of O(page-size) entries.
Latency is bounded by inbox storage tech.
Easy to add personalization at write time.

Cons:

Write amplification: one post = N inbox writes.
Storage cost: inboxes grow large.
The celebrity problem: Justin Bieber posts, you write to 100M inboxes. Catastrophic.

According to Twitter's public talks, pre-2015 Twitter was almost pure fan-out on write, with inboxes in Redis. It worked well for normal users and broke for celebrities.

Fan-out on read (pull model)

When a consumer requests their feed, query every followee in real time, merge.

For each followee F in user.following:
  fetch latest posts from F
merge, rank, paginate, return
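A rough Python version of the pull model, assuming each author's posts are already stored newest-first. The `posts_by_author` and `following` dictionaries are hypothetical sample data; in a real system each per-author fetch would be a separate query, which is where the read cost comes from.

```python
import heapq
from itertools import islice

# Hypothetical stores: each author's posts kept newest-first as (ts, post_id).
posts_by_author = {
    "alice": [(30, "a2"), (10, "a1")],
    "bob": [(20, "b1")],
}
following = {"carol": ["alice", "bob"]}


def read_feed(user_id, limit=20):
    """Fan out on read: one fetch per followee, then a k-way merge."""
    per_author = (posts_by_author.get(a, []) for a in following.get(user_id, []))
    # heapq.merge needs each input sorted in the merge direction;
    # reverse=True merges newest-first inputs into a newest-first stream.
    merged = heapq.merge(*per_author, reverse=True)
    return list(islice(merged, limit))
```

Because the merge is lazy, only the first `limit` items are materialized, but every followee still has to be queried before the first item can be returned.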

Pros:

Write is cheap: one insert per post.
No storage for inboxes.
Easy to undo: deletes and unfollows take effect immediately because there are no inboxes to clean up.

Cons:

Read is expensive: O(follow-list) queries per feed load.
High tail latency: slowest followee dominates.
Hard to personalize without expensive joins.

Pure fan-out on read works at small scale (Mastodon-sized, not Twitter-sized).

The celebrity problem

Why fan-out on write breaks for celebrities:

One celebrity post = millions of inbox writes.
Spike load for hours.
Storage cost for stale data (most followers don't read every post).

Why fan-out on read breaks for active users:

Power user follows 5000 accounts.
Every feed load queries 5000 sources.
Slow.

The fix: hybrid. Make the strategy depend on follower count.

Hybrid fan-out

The Twitter and Instagram approach:

For each user, classify based on follower count:

Small accounts (under 10k followers): fan-out on write. Cheap writes, predictable.
Large accounts (10k followers or more): fan-out on read. Don't write to inboxes.

At read time:

Read inbox for posts from small accounts.
Pull recent posts from each large account the user follows.
Merge and rank.

This bounds write amplification (no 100M writes per post) and keeps reads fast for typical users (inbox is fast, celebrity pull is a small number of queries).
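The hybrid read and write paths can be sketched together as below. This is an illustrative in-memory model, assuming a fixed 10k cutoff; all names (`CELEBRITY_THRESHOLD`, `is_celebrity`, `publish`, `read_feed`) are hypothetical.

```python
import heapq
from collections import defaultdict

CELEBRITY_THRESHOLD = 10_000  # assumed cutoff; tune to the follower distribution

followers = defaultdict(set)          # author_id -> follower ids
following = defaultdict(set)          # user_id -> author ids
posts_by_author = defaultdict(list)   # author_id -> [(ts, post_id)]
inboxes = defaultdict(list)           # user_id -> [(ts, post_id)]


def is_celebrity(author_id):
    return len(followers[author_id]) >= CELEBRITY_THRESHOLD


def follow(user_id, author_id):
    followers[author_id].add(user_id)
    following[user_id].add(author_id)


def publish(author_id, post_id, ts):
    posts_by_author[author_id].append((ts, post_id))  # always stored once
    if not is_celebrity(author_id):                   # push only for small accounts
        for follower_id in followers[author_id]:
            inboxes[follower_id].append((ts, post_id))


def read_feed(user_id, limit=20):
    # Pull path: recent posts from each large account the user follows.
    pulled = [post for author in following[user_id] if is_celebrity(author)
              for post in posts_by_author[author]]
    # Merge with the pushed inbox; the set union drops posts that were pushed
    # before an author crossed the threshold and are now also pulled.
    return heapq.nlargest(limit, set(inboxes[user_id]) | set(pulled))
```

One design choice worth noting: classifying at publish time means an account that crosses the threshold has older posts in inboxes and newer posts only in its own store, so the read path has to tolerate overlap.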

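Threshold tuning depends on the follower distribution. Figures around 10k followers are often quoted for Twitter, and Instagram is likely similar, but treat the number as a starting point to tune rather than a fixed constant.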

Figure: Hybrid fan-out with celebrity threshold.

Inbox storage

Inboxes need fast random access by user_id, time-ordered, paginated.

Options:

Redis sorted sets: O(log N) inserts, fast range reads. Memory-bound, capped per user (e.g., last 800 entries).
Cassandra: durable, scalable. Wide rows keyed by user_id, clustered by timestamp.
Manhattan (Twitter's in-house distributed KV store): used for timelines among other workloads.
HBase: Facebook used it for Messenger inbox.

Capping is essential. Nobody reads past page 50. Trim aggressively.
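As a sketch of what capping means in practice, here is a small in-memory inbox that keeps only the newest entries. The `CappedInbox` class is hypothetical and uses a sorted Python list, so inserts are O(n) rather than the O(log N) of a Redis sorted set; with Redis, the usual equivalent is a `ZADD` followed by `ZREMRANGEBYRANK inbox 0 -801` to keep the top 800.

```python
import bisect

INBOX_CAP = 800  # matches the example cap above


class CappedInbox:
    """Time-ordered inbox that keeps only the newest `cap` entries."""

    def __init__(self, cap=INBOX_CAP):
        self.cap = cap
        self._entries = []  # sorted ascending by (ts, post_id)

    def add(self, ts, post_id):
        bisect.insort(self._entries, (ts, post_id))
        overflow = len(self._entries) - self.cap
        if overflow > 0:
            del self._entries[:overflow]  # trim the oldest entries

    def newest(self, limit=20):
        return self._entries[::-1][:limit]

    def __len__(self):
        return len(self._entries)
```

Trimming on every write keeps the memory bound strict; trimming in periodic batches is a reasonable alternative when write throughput matters more than an exact cap.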

Typical setup: hot inboxes in Redis (active users), cold inboxes computed on demand (inactive users). When most accounts are inactive, this can save 90% or more of the memory cost.

Materialization tradeoffs

Materializing inboxes is denormalization for read speed. Tradeoffs:

Write amplification scales with avg follower count.
Inbox storage scales with users * inbox-cap.
Reads are a single inbox lookup per page, independent of how many accounts the user follows.

For Twitter scale (500M users, avg ~100 followers, inbox cap 800):

Inboxes hold at most 500M * 800 = 400 billion entries: a large number, but a hard ceiling thanks to the cap.
Write QPS = post QPS * avg followers, which is significant but manageable with sharding.
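The back-of-envelope numbers can be worked through in a few lines. The user count, average follower count, and inbox cap come from the text; `POST_QPS` and `BYTES_PER_ENTRY` are made-up values chosen only to make the arithmetic concrete.

```python
USERS = 500_000_000
AVG_FOLLOWERS = 100
INBOX_CAP = 800
POST_QPS = 5_000        # hypothetical; not a figure from the text
BYTES_PER_ENTRY = 16    # hypothetical: two 8-byte ids per inbox entry

max_inbox_entries = USERS * INBOX_CAP
inbox_write_qps = POST_QPS * AVG_FOLLOWERS
max_inbox_bytes = max_inbox_entries * BYTES_PER_ENTRY

print(f"{max_inbox_entries:,} entries")       # 400,000,000,000 entries
print(f"{inbox_write_qps:,} inbox writes/s")  # 500,000 inbox writes/s
print(f"{max_inbox_bytes / 1e12:.1f} TB")     # 6.4 TB
```

Under these assumptions the cap, not the post volume, determines storage, while the average follower count determines write throughput.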

Slack follows a different model. It is channel-centric rather than user-centric, so a message is written once to the channel itself instead of to a per-member "channel inbox".

Ranking

Most modern feeds are not purely chronological. They're ranked by ML models that score (post, user) pairs.

Ranking pipeline:

Candidate generation: get ~1000 potentially relevant posts.
Filtering: remove already-seen, blocked content, etc.
Light ranker: cheap model scores all 1000, keep top 100.
Heavy ranker: expensive model (deep learning) scores top 100.
Diversity: re-rank to avoid showing 5 posts from same person.
Return top 20 for the page.
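The funnel shape of this pipeline can be sketched as a single function. This is a simplified illustration: `light_score` and `heavy_score` are caller-supplied stand-ins for the real models, and the candidate shape (dicts with `id` and `author` keys) and the `max_per_author` diversity rule are assumptions.

```python
def rank_feed(candidates, seen, light_score, heavy_score,
              light_keep=100, page_size=20, max_per_author=2):
    """candidates: iterable of dicts with 'id' and 'author' keys (assumed shape)."""
    # Filtering: drop posts the user has already seen.
    fresh = [c for c in candidates if c["id"] not in seen]
    # Light ranker: cheap score over everything, keep the best `light_keep`.
    shortlist = sorted(fresh, key=light_score, reverse=True)[:light_keep]
    # Heavy ranker: expensive score over the shortlist only.
    ranked = sorted(shortlist, key=heavy_score, reverse=True)
    # Diversity: cap how many posts one author can contribute to a page.
    page, per_author = [], {}
    for post in ranked:
        count = per_author.get(post["author"], 0)
        if count < max_per_author:
            page.append(post)
            per_author[post["author"]] = count + 1
        if len(page) == page_size:
            break
    return page
```

The point of the two-stage scoring is cost control: the expensive model runs on at most `light_keep` items no matter how many candidates arrive.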

Candidate generation is where fan-out architecture matters. The 1000 candidates come from:

Your inbox (fan-out on write contribution).
Pulls from celebrities you follow.
Recommended posts from accounts you don't follow.
Ads.

The architecture decisions enable ranking. Without fast candidate generation, ranking is moot.

Pagination

Feed pagination is harder than it looks. Naive offset-based pagination (skip N, take 20) breaks:

New posts arrive, push existing posts down.
User sees duplicate or skipped content.

Cursor-based pagination uses a stable identifier:

First request: return 20 posts plus a cursor (e.g., timestamp + post_id).
Next request: "give me 20 posts before this cursor."
New posts can be loaded with a "give me posts after this cursor" request (pull-to-refresh).
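A minimal sketch of cursor-based pagination, assuming the cursor is a `(ts, post_id)` tuple and the feed is sorted newest first. `FEED`, `page_before`, and `posts_after` are hypothetical names, and the linear scan stands in for an indexed range query in a real store.

```python
# Hypothetical feed: (ts, post_id) tuples sorted newest first.
FEED = [(50, "e"), (40, "d"), (30, "c"), (20, "b"), (10, "a")]


def page_before(feed, cursor=None, limit=20):
    """Return up to `limit` posts strictly older than `cursor`, plus the next cursor."""
    older = [p for p in feed if cursor is None or p < cursor]
    page = older[:limit]
    next_cursor = page[-1] if page else None
    return page, next_cursor


def posts_after(feed, cursor):
    """Pull-to-refresh: everything strictly newer than `cursor`."""
    return [p for p in feed if p > cursor]
```

Because the cursor names a position in the ordering rather than an offset, a new post arriving at the top between two requests does not shift the next page.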

Twitter uses tweet_id as the cursor; tweet IDs are Snowflake IDs, which are roughly time-ordered. Instagram uses a similar scheme.
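To see why such IDs are roughly time-ordered, here is a sketch of the commonly documented Snowflake layout: a millisecond timestamp in the high bits, then a 10-bit machine id and a 12-bit sequence. The helper names `make_id` and `timestamp_ms` are hypothetical, and this is an illustration of the bit layout rather than Twitter's actual generator.

```python
TWITTER_EPOCH_MS = 1288834974657  # Snowflake's custom epoch (November 2010)


def make_id(ts_ms, machine_id, sequence):
    """Pack a 41-bit timestamp, 10-bit machine id and 12-bit sequence."""
    return ((ts_ms - TWITTER_EPOCH_MS) << 22) | (machine_id << 12) | sequence


def timestamp_ms(snowflake_id):
    """Recover the creation time from the high bits of the id."""
    return (snowflake_id >> 22) + TWITTER_EPOCH_MS
```

Since the timestamp occupies the most significant bits, comparing two IDs compares their creation times first. The ordering is only approximate because IDs minted in the same millisecond on different machines sort by machine id, and machine clocks can drift.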

Real-time updates

Beyond initial load, users expect new posts to appear. Options:

Poll: client asks "anything new since cursor X" every 30s. Simple, expensive.
Long-polling: client opens connection, server holds until new content.
WebSocket: persistent bidirectional connection. Push when new content.
Server-Sent Events: one-way push.

Twitter web uses WebSockets for streaming. Slack uses WebSockets heavily. Mobile apps use platform-specific push (APNs, FCM) for background notifications.

The fan-out happens at the "deliver to active connection" layer too. When Alice posts, her followers' connected WebSockets receive a "new tweet" notification.
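The connection-level fan-out can be sketched with an in-memory registry. Everything here is hypothetical: each open connection is modeled as a `deque` standing in for a WebSocket send buffer, and `connect` and `notify_followers` are illustrative helpers.

```python
from collections import defaultdict, deque

followers = defaultdict(set)     # author_id -> follower ids
connections = defaultdict(set)   # user_id -> ids of that user's open connections
buffers = {}                     # connection id -> deque of pending events


def connect(user_id):
    """Register a new open connection and return its send buffer."""
    conn_id = len(buffers)
    buffers[conn_id] = deque()
    connections[user_id].add(conn_id)
    return buffers[conn_id]


def notify_followers(author_id, post_id):
    """Push a 'new post' event to every follower that is currently connected."""
    delivered = 0
    for user_id in followers[author_id]:
        for conn_id in connections.get(user_id, ()):
            buffers[conn_id].append({"type": "new_post", "author": author_id, "post": post_id})
            delivered += 1
    return delivered
```

Only connected followers cost anything here, which is why this layer is usually much cheaper than inbox fan-out: offline followers are skipped and pick up the post on their next feed load.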

Special cases

Group messaging

Slack channel with 10,000 members. Sender writes once to channel. Each member's client reads from the channel. This is fan-out on read by default.

If you also need per-user unread counts, you need per-user state (a "read pointer" per user per channel). That's where it gets expensive.
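A minimal sketch of the read-pointer idea, assuming messages in a channel are append-only and the pointer is simply the count of messages the user has read. The names `channel_messages`, `read_pointer`, `post`, `unread_count`, and `mark_read` are hypothetical.

```python
from collections import defaultdict

channel_messages = defaultdict(list)   # channel_id -> [message], append-only
read_pointer = defaultdict(int)        # (user_id, channel_id) -> messages read


def post(channel_id, text):
    channel_messages[channel_id].append(text)  # one write, regardless of member count


def unread_count(user_id, channel_id):
    return len(channel_messages[channel_id]) - read_pointer[(user_id, channel_id)]


def mark_read(user_id, channel_id):
    read_pointer[(user_id, channel_id)] = len(channel_messages[channel_id])
```

Posting stays a single write, but the pointer table grows with users times channels, and every unread badge is a per-user computation. That per-user state is the expensive part.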

Notification systems

Discord, Slack, Twitter all have notification systems separate from the feed. Push to APNs/FCM for mobile, server-sent events for web.

Fan-out happens at the notification preferences level: which notifications go to which channels (push, in-app, email), with batching and aggregation.

IoT command fan-out

Send a command to 10,000 devices. Use a publish-subscribe topic on a message broker (for example an MQTT broker or Kafka). Devices subscribe; the broker fans out efficiently.

Pitfalls

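Hot inboxes: a user who follows a very large number of accounts receives a disproportionate share of inbox writes, which can overload the shard holding that inbox. Shard by user_id and check that load is evenly distributed.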
Stale inboxes: inactive users have huge inboxes nobody reads. Trim aggressively. Lazy-load on next login.
Backfill on new follow: when Bob follows Alice, do we backfill Bob's inbox with Alice's old posts? Twitter says no (just future posts). Some systems do partial backfill.
Unfollow cleanup: when Bob unfollows Alice, do we remove Alice's posts from Bob's inbox? Twitter does not (inbox is append-only). Filter at read time.
Privacy: when Alice deletes a post, removing from all inboxes is expensive. Soft-delete (mark and filter at read time).
Pagination with rewrites: ranking changes mid-scroll. Use stable cursors and tolerate slight inconsistency.
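The "filter at read time" approach used for unfollows and deletes can be sketched as one pass over the inbox. The function name `visible_feed` and the entry shape `(ts, post_id, author_id)` are assumptions made for illustration.

```python
def visible_feed(inbox, deleted_post_ids, following, limit=20):
    """inbox: list of (ts, post_id, author_id), newest first (assumed shape)."""
    page = []
    for ts, post_id, author_id in inbox:
        if post_id in deleted_post_ids:   # soft-deleted by its author
            continue
        if author_id not in following:    # user has since unfollowed
            continue
        page.append((ts, post_id, author_id))
        if len(page) == limit:
            break
    return page
```

The inbox itself stays append-only, so a delete or unfollow costs one small write to the deleted set or follow list instead of a rewrite of every affected inbox.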

Patterns at scale

Twitter (now X):

Manhattan KV store for timelines.
Hybrid fan-out with celebrity threshold.
Active user inboxes in memory.
Inactive user inboxes computed on demand.
Stream-based ranking.

Instagram:

Cassandra for inbox storage.
Heavy ML-based ranking.
Stories use a different model (24-hour TTL, ranked by recency + engagement).

Facebook News Feed:

TAO (write-through cache) layered over MySQL.
Heavy denormalization.
ML ranking at multiple stages.

LinkedIn FollowFeed:

Mix of materialized and dynamic.
Posts stored once, fanned out lazily.
Heavy use of caching.

Designing your own

Decision tree:

How many users?
<1M: fan-out on write everywhere, you'll be fine.
1M-100M: hybrid with celebrity threshold. Start at a 10k follower threshold.
>100M: hybrid + heavy caching + active/inactive distinction + ML candidate selection.
Real-time updates: use WebSockets for connected clients, push notifications for offline.
Ranking: chronological first, ML later. Don't optimize prematurely.

Learn more

Talk: Twitter Timeline at scale (InfoQ)
Paper: Facebook TAO paper (USENIX)
Article: Instagram feed ranking (Meta Engineering)
Article: LinkedIn news feed (LinkedIn Engineering)
Book: Designing Data-Intensive Applications, Chapter 1 (Martin Kleppmann)

