The Problem That Looks Simple But Isn't
You open Twitter (now X). A list of tweets appears. They are all from people you follow. The newest are at the top. The page loads in a couple of hundred milliseconds. You scroll. More tweets appear. Behind that smooth experience is one of the most studied system design problems in the industry.
The reason it is so studied is that the surface looks trivial. "Just query a database for the most recent tweets from people I follow, sort by time, and return them." That sentence describes the right product but the wrong system. At Twitter's scale, that approach falls apart immediately.
This article walks through how to actually build it. We will start from the naive approach, see exactly why it breaks, and arrive at the architecture that real production systems use.
Step 1: Gather Requirements
Functional Requirements
Post: a user can publish a tweet.
Follow: a user can follow and unfollow other users.
Home timeline: a user sees the most recent tweets from everyone they follow, newest first, and can scroll back for more.
User timeline: a user can view any single user's own tweets.
Non-Functional Requirements
Latency: the home timeline must load in under 300ms p99. Anything slower feels broken.
Read-heavy: reads vastly outnumber writes. Most users read far more tweets than they post.
High availability: 99.99% uptime. Twitter being down is news.
Scalable: 500M+ daily active users. Hundreds of millions of new tweets per day. Tens of billions of timeline reads per day.
Eventually consistent: a 1-2 second delay between posting and followers seeing it is fine.
Step 2: Capacity Estimation
Numbers shape architecture. Let us work them out with round figures: roughly 500 million daily active users; a couple hundred million new tweets per day, which is about 2,300 tweets per second on average and roughly 12,000 per second at peak; on the order of 25 billion home timeline loads per day, which is about 290,000 reads per second; and an average of about 200 followers per user.
Two takeaways from the numbers:
First, reads dominate writes by more than a hundred to one. The architecture must optimize for reads. Every read should be cheap.
Second, the fan-out from a single tweet is huge. If an average user has 200 followers, every tweet potentially needs to update 200 timelines. For celebrities (millions of followers), one tweet can touch tens of millions of timelines. This single fact shapes the entire design.
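As a sanity check, here is a back-of-envelope calculation in Python using the round figures above. The exact inputs are assumptions chosen to match the numbers used later in this article; the ratios are what matter.

```python
# Back-of-envelope capacity estimation (round numbers; inputs are assumptions).
SECONDS_PER_DAY = 86_400

daily_tweets = 200_000_000        # "hundreds of millions of new tweets per day"
avg_followers = 200               # average followers per user
timeline_reads_per_sec = 290_000  # home timeline loads per second
peak_tweet_writes_per_sec = 12_000

tweet_writes_per_sec = daily_tweets / SECONDS_PER_DAY               # ~2,300/s on average
read_write_ratio = timeline_reads_per_sec / tweet_writes_per_sec    # ~125:1

# Naive pull: every timeline read gathers tweets from ~200 followed accounts.
rows_touched_per_sec = timeline_reads_per_sec * avg_followers       # ~58 million rows/s

# Full push: every tweet fans out to ~200 cached follower timelines.
fanout_writes_per_sec = peak_tweet_writes_per_sec * avg_followers   # ~2.4 million cache writes/s

print(f"{tweet_writes_per_sec:,.0f} tweet writes/s, {read_write_ratio:.0f}:1 read/write ratio")
print(f"naive pull touches ~{rows_touched_per_sec:,.0f} rows/s")
print(f"full push does ~{fanout_writes_per_sec:,.0f} cache writes/s at peak")
```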
Step 3: The Naive Approach (And Why It Fails)
Here is the obvious design:
1. Store every tweet in a database with a timestamp.
2. Store the follow relationships in another table.
3. When a user opens their timeline, query: "give me the most recent N tweets from any user X where current_user follows X."
4. Sort by time. Return.
The SQL looks roughly like this:

```sql
SELECT t.*
FROM tweets t
JOIN follows f ON f.followee_id = t.user_id
WHERE f.follower_id = 42
ORDER BY t.created_at DESC
LIMIT 50;
```
This works for 100 users. It dies at 100 million.
Why? Each timeline read does a join across two huge tables, sorts the result, and returns 50 rows. With 290,000 reads per second and a 200-follow average, that is roughly 58 million tweet rows being touched per second, just to render timelines. No database can sustain that.
The fundamental problem: doing the work at read time scales with reads. Reads outnumber writes 125:1. So we are paying the expensive cost on the dominant operation. We need to flip it.
Step 4: The Big Trade-off — Push vs Pull
There are two opposing strategies for delivering a timeline.
Pull (Compute on Read)
This is the naive approach above. Don't pre-compute anything. When a read comes in, gather the data and assemble the timeline live.
Pros: writes are cheap. A tweet is one row insert. Storage is small.
Cons: reads are expensive. Pays the full cost on every timeline view.
Push (Fan-Out on Write)
The opposite. When a user posts a tweet, immediately push the tweet ID into a pre-computed timeline cache for every one of their followers. Each user's timeline is just a sorted list of tweet IDs sitting in fast storage.
Reading the timeline becomes: "load this user's pre-computed list from cache, hydrate the tweet IDs into full tweet objects, return." One cache hit. Microsecond reads.
Pros: reads are dirt cheap.
Cons: writes are expensive when fan-out is high. A celebrity with 100M followers triggers 100M cache writes for one tweet. That is genuinely a lot of work.
Comparison
| | Pull (compute on read) | Push (fan-out on write) |
|---|---|---|
| Write cost | Cheap (1 row) | Expensive (N follower writes) |
| Read cost | Expensive (gather + sort) | Cheap (1 cache lookup) |
| Storage | Small | Large (N copies of tweet IDs) |
| Freshness | Real-time always | Lag while fan-out propagates |
| Scales with | Read traffic (bad) | Follower count (manageable except celebs) |
For a normal user with 1,000 followers, push is clearly better. One tweet costs 1,000 cache writes once, but every one of their followers reads cheaply for as long as the tweet is in their timeline.
For a celebrity with 100 million followers, push is a disaster. One tweet would mean 100 million cache writes. And celebrities tweet many times a day. The system would melt.
Step 5: The Hybrid Approach (What Twitter Actually Does)
Use push for normal users. Use pull for celebrities. Combine on read.
The trick: classify users into two groups based on follower count. The threshold is somewhere around 1 million followers.
Normal user posts: fan out the tweet to every follower's pre-computed timeline cache. Followers see it instantly when they refresh.
Celebrity user posts: do NOT fan out. Just write the tweet to their own user_timeline (which is queried at read time). Skip the cost of millions of cache writes.
Reading a timeline: load the user's pre-computed timeline (containing tweets from non-celebs they follow). At read time, also query each celebrity they follow for their recent tweets. Merge both lists, sort by time, return.
[Diagram: a normal user's post fans out to follower timelines; a celebrity's post skips fan-out; the read path merges cached tweets from non-celebs with the celebrities' recent tweets.]
The hybrid works because the cost is paid by whichever path is cheaper for that user. Most users have few followers, so fan-out is fine. The few users with massive followings (which would explode fan-out) are handled by pulling at read time. The merge cost on read is bounded because a typical user follows only a handful of celebrities, not hundreds.
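A minimal sketch of the read-time merge, assuming the cached timeline and each celebrity pull arrive as (timestamp, tweet_id) pairs already sorted newest-first (for example, straight out of a Redis ZREVRANGE):

```python
import heapq

def merge_home_timeline(cached, celeb_timelines, limit=50):
    """Merge the pre-computed timeline with live pulls from followed celebrities.

    `cached` and each list in `celeb_timelines` are (timestamp, tweet_id) pairs
    sorted newest-first; the merge keeps that order and returns the top `limit` IDs.
    """
    merged = heapq.merge(cached, *celeb_timelines, key=lambda t: t[0], reverse=True)
    return [tweet_id for _, tweet_id in list(merged)[:limit]]

# Example: two cached tweets from normal users, one recent celebrity tweet.
cached = [(1700000300, "t3"), (1700000100, "t1")]
celebs = [[(1700000200, "t2")]]
print(merge_home_timeline(cached, celebs))   # ['t3', 't2', 't1']
```

The merge stays cheap because a typical user follows only a handful of celebrity accounts, so there are few lists to interleave.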
Step 6: The Full Architecture
Here is how the pieces fit together.
[Architecture diagram: clients POST tweets to the Tweet Service and GET home/user timelines from the Timeline Service. The Tweet Service writes to the Tweet DB and emits fan-out jobs to Kafka, consumed by thousands of fan-out workers that update the Timeline Cache (Redis sorted sets per user, sharded by user_id). The Social Graph Service stores the follow edges.]
What Each Piece Does
Tweet Service handles the write path. When a user posts, it: validates the content, writes the tweet to Tweet DB (sharded by user_id), and emits a fan-out job to Kafka.
Fan-out Workers consume the Kafka job. They look up the author's followers from the Social Graph Service. If the author is a celebrity, they skip fan-out. Otherwise, they push the tweet ID into each follower's Redis timeline.
Timeline Service handles reads. Given a user_id, it loads the pre-computed timeline from Redis and merges in any recent tweets from celebrities the user follows. Then it hydrates tweet IDs into full tweet objects (text, media URLs, author info) by querying the Tweet DB.
Social Graph Service answers "who does X follow?" and "who follows X?" Sharded by user_id with caching for active users.
Tweet DB stores the actual tweets. Could be MySQL, Cassandra, or a custom solution. Sharded by user_id so queries for "user X's recent tweets" are local to one shard.
Timeline Cache is Redis. Each user has a sorted set keyed by user_id, containing tweet IDs scored by timestamp. Capped at the last 800 tweets or so, because old timelines are rarely scrolled.
Step 7: Storage Design
Tweet Storage
Each tweet has fields like:
tweet_id, author_id, text, created_at, media_urls, reply_to_id, retweet_of_id, like_count, retweet_count
The schema is simple. The hard part is sharding. We shard by author_id so all of a user's tweets live on the same shard. This makes "show me user X's tweets" a single-shard query (efficient).
The trade-off: a query like "find all tweets that contain hashtag #foo" must touch every shard. That is fine for an analytics or search workload, which uses a separate index (Elasticsearch). The main tweet table is optimized for the dominant access patterns.
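A sketch of the routing idea; the hash function and the shard count are placeholder assumptions:

```python
import hashlib

NUM_TWEET_SHARDS = 64   # placeholder; real deployments size this from the capacity numbers

def shard_for_user(user_id: int) -> int:
    """All of a user's tweets live on one shard, so 'user X's tweets' is a single-shard query."""
    digest = hashlib.md5(str(user_id).encode()).hexdigest()   # stable across processes, unlike hash()
    return int(digest, 16) % NUM_TWEET_SHARDS

print(shard_for_user(42))   # every tweet by user 42 routes to this one shard
```

A query keyed on anything other than author_id (like a hashtag) has no single home shard, which is why that workload goes to a separate search index instead.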
Timeline Cache (Redis)
Each user has a Redis key like timeline:42. The value is a sorted set: tweet IDs scored by timestamp. Reads are a single ZREVRANGE call: "give me the top 50 tweet IDs sorted by score descending." Microseconds.
To prevent the cache from growing unbounded, cap the size. If a user has more than 800 tweets in their timeline, trim the oldest. Most users never scroll back further. Those who do can query the database directly (slower, acceptable tail).
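A sketch of those two cache operations using redis-py; the `timeline:` key naming and the 800-entry cap follow the text, while the connection details are assumptions:

```python
import redis

r = redis.Redis()          # assumes a local Redis; production uses a sharded cluster
TIMELINE_CAP = 800         # keep roughly the last 800 tweet IDs per user

def push_to_timeline(user_id: int, tweet_id: int, created_at: float) -> None:
    """Fan-out step for one follower: insert the tweet ID scored by timestamp, then trim."""
    key = f"timeline:{user_id}"
    pipe = r.pipeline()
    pipe.zadd(key, {str(tweet_id): created_at})
    pipe.zremrangebyrank(key, 0, -(TIMELINE_CAP + 1))   # drop everything older than the newest 800
    pipe.execute()

def read_timeline(user_id: int, limit: int = 50):
    """Read path: top `limit` tweet IDs, newest first, in a single ZREVRANGE call."""
    return r.zrevrange(f"timeline:{user_id}", 0, limit - 1)
```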
Social Graph (Follows)
Stored as edges: (follower_id, followee_id, created_at). Two indexes are needed:
By follower: "who does user X follow?" Used at read time when celebs need to be merged in.
By followee: "who follows user X?" Used at write time for fan-out.
Both are sharded by their respective ID. For very high-follower users, the follower list lives across many shards (each holding a subset).
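A toy sketch of the double-indexed edge store, with in-memory dicts standing in for the sharded tables; the point is that every follow writes the edge into both indexes:

```python
from collections import defaultdict

# Two indexes over the same edges, mirroring the two sharded tables described above.
following = defaultdict(set)   # follower_id -> followee_ids ("who does X follow?")
followers = defaultdict(set)   # followee_id -> follower_ids ("who follows X?")

def follow(follower_id: int, followee_id: int) -> None:
    # A real system writes both index rows transactionally or via an outbox/stream.
    following[follower_id].add(followee_id)
    followers[followee_id].add(follower_id)

def unfollow(follower_id: int, followee_id: int) -> None:
    following[follower_id].discard(followee_id)
    followers[followee_id].discard(follower_id)

follow(42, 7)
print(following[42], followers[7])   # {7} {42}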
Tweet Content (Hydration)
Timeline Cache stores only tweet IDs. To render a timeline, you must "hydrate" those IDs into full tweet objects (text, media, author display name, like counts). This is a batch lookup against an aggressively cached layer (often Memcached) with the database as fallback.
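A sketch of batch hydration, assuming a memcached-style client with get_multi/set_multi (method names vary by library) and a hypothetical fetch_tweets_from_db helper. Note that IDs missing from both layers are simply dropped, which is the lazy deletion discussed below:

```python
def hydrate(tweet_ids, cache, fetch_tweets_from_db):
    """Turn a list of tweet IDs into full tweet objects, preserving timeline order."""
    keys = [f"tweet:{tid}" for tid in tweet_ids]
    found = cache.get_multi(keys)                        # one batched cache round-trip

    missing_ids = [tid for tid in tweet_ids if f"tweet:{tid}" not in found]
    if missing_ids:
        from_db = fetch_tweets_from_db(missing_ids)      # hypothetical: returns {tweet_id: tweet}
        cache.set_multi({f"tweet:{tid}": tweet for tid, tweet in from_db.items()})
        found.update({f"tweet:{tid}": tweet for tid, tweet in from_db.items()})

    # IDs absent from both cache and DB were deleted; skip them (lazy deletion).
    return [found[f"tweet:{tid}"] for tid in tweet_ids if f"tweet:{tid}" in found]
```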
Step 8: Fan-Out at Scale
Fan-out workers do most of the heavy lifting. When user @alice (with 50,000 followers) tweets:
1. The Tweet Service emits a Kafka message: {tweet_id: 12345, author_id: alice_id}.
2. A fan-out worker picks it up.
3. The worker calls the Social Graph: "give me alice's followers."
4. The worker iterates over those 50,000 followers and inserts tweet_id: 12345 into each follower's Redis timeline (with timestamp as score).
5. If at any point the follower count exceeds the celebrity threshold, the worker stops and marks the author for pull-based handling on read.
At a peak of 12,000 tweets per second and an average of 200 followers per author, that is 2.4 million Redis writes per second from fan-out alone. This work is parallelized across hundreds or thousands of workers, all consuming from the same Kafka topic.
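A sketch of one such worker, assuming kafka-python and redis-py, a `tweet-fanout` topic name, and hypothetical is_celebrity and get_followers helpers backed by the Social Graph Service; real workers batch far more aggressively than this:

```python
import json
import redis
from kafka import KafkaConsumer

r = redis.Redis()
consumer = KafkaConsumer("tweet-fanout", group_id="fanout-workers",
                         value_deserializer=lambda v: json.loads(v))

for msg in consumer:
    job = msg.value                              # e.g. {"tweet_id": ..., "author_id": ..., "created_at": ...}
    if is_celebrity(job["author_id"]):           # hypothetical follower-count check
        continue                                 # celebrities are pulled at read time instead

    pipe = r.pipeline(transaction=False)
    for follower_id in get_followers(job["author_id"]):   # hypothetical Social Graph call
        pipe.zadd(f"timeline:{follower_id}", {str(job["tweet_id"]): job["created_at"]})
        pipe.zremrangebyrank(f"timeline:{follower_id}", 0, -801)   # keep the newest ~800
    pipe.execute()
```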
Why Kafka and Not Direct Calls?
Decoupling. The Tweet Service finishes its work in milliseconds (just one DB write and one Kafka publish). Fan-out happens asynchronously. If the fan-out workers fall behind, tweets still post. They just take a bit longer to appear in followers' timelines.
Kafka also gives us replay. If a bug corrupts some timelines, we can re-process the tweet stream from a checkpoint to rebuild.
Step 9: Edge Cases and Tricky Problems
Deleted Tweets
When a tweet is deleted, the tweet_id remains in many timeline caches. Two options:
Lazy deletion: at hydration time, skip tweet IDs that no longer exist. Cheap, but the timeline might briefly show fewer than 50 tweets.
Active cleanup: emit a delete event that fan-out workers process by removing the tweet ID from follower timelines. More work but cleaner.
Most production systems do lazy deletion. Cleanup is too expensive for tweets with millions of fan-outs.
Unfollow
When user A unfollows user B, B's tweets remain in A's timeline cache. They will age out naturally. Same lazy approach.
New Followers See No History
If user A follows user B for the first time, A's timeline cache contains nothing from B. To fix this, the follow service can backfill: copy B's recent tweets into A's timeline. This is a small, bounded operation per follow event.
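A sketch of that backfill step, reusing the same Redis layout and a hypothetical get_recent_tweet_ids helper against the Tweet DB:

```python
import redis

r = redis.Redis()

def backfill_on_follow(follower_id: int, followee_id: int, count: int = 50) -> None:
    """When A follows B, copy B's recent tweet IDs into A's cached timeline."""
    recent = get_recent_tweet_ids(followee_id, count)   # hypothetical: [(tweet_id, created_at), ...]
    if not recent:
        return
    r.zadd(f"timeline:{follower_id}",
           {str(tweet_id): created_at for tweet_id, created_at in recent})
```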
Timeline of a Brand-New User
A user who just signed up has no follows and no timeline. The system shows recommended popular tweets or onboarding suggestions. This is a separate "discover" service, not the home timeline path.
The Reply and Retweet Problem
Replies should appear in the timeline only for users who follow both the replier AND the parent author (otherwise the conversation is one-sided). This is a filter on top of the basic fan-out logic. It complicates fan-out but is essential for sensible UX.
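A sketch of that filter as a per-follower check inside the fan-out loop, with `follows` standing in for a Social Graph lookup:

```python
def should_deliver(tweet, follower_id, follows):
    """Deliver replies only to followers who also follow the parent tweet's author.

    `tweet` is a dict with an optional "reply_to_author_id"; `follows(a, b)` is a
    hypothetical Social Graph check for "does a follow b?".
    """
    parent_author = tweet.get("reply_to_author_id")
    if parent_author is None:
        return True                  # not a reply: normal fan-out
    return follows(follower_id, parent_author)
```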
Real-Time Tweet Push
When a user has the timeline open and a new tweet arrives, the system pushes a notification via WebSocket. The client either updates the timeline immediately or shows a "5 new tweets" banner.
Step 10: From Reverse-Chronological to Ranked
Modern Twitter does not show tweets in pure time order. It uses an ML model that scores tweets based on predicted engagement (will the user like, click, dwell, reply?). The top-scored tweets bubble up.
This adds a step on the read path:
1. Fetch candidate tweets (pre-computed timeline + celeb merge). Maybe 200 tweets total.
2. For each candidate, compute features: tweet age, author relationship, engagement signals, content type.
3. A ranking model scores each candidate.
4. Sort by score, return top 50.
The ranking model lives behind a feature store and a model-serving infrastructure. It runs on every timeline load, with a strict latency budget (50ms or so).
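A sketch of the ranked read path, with extract_features and model.score as stand-ins for the feature store and the model-serving layer:

```python
def ranked_timeline(user_id, candidates, extract_features, model, limit=50):
    """Score ~200 candidate tweets and return the top `limit` by predicted engagement.

    `candidates` come from the usual hybrid fetch (cached timeline + celebrity merge);
    `extract_features` and `model` stand in for the feature store and serving layer.
    """
    scored = [
        (model.score(extract_features(user_id, tweet)), tweet)
        for tweet in candidates
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [tweet for _, tweet in scored[:limit]]
```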
This article focused on the reverse-chronological case because it is the foundation. Ranking sits on top of it. See the News Feed article for a deeper dive into ranking pipelines.
Step 11: Recap of Key Decisions
The design hinges on a few non-obvious choices:
Hybrid fan-out instead of pure push or pure pull. Push for normal users, pull for celebrities. Best of both worlds.
Sharding by user_id in both Tweet DB and Social Graph. Aligns with the dominant access patterns.
Asynchronous fan-out via Kafka. Decouples write latency from fan-out work.
Sorted sets in Redis for timelines. O(log N) inserts, O(log N + M) range reads. Bounded size per user.
Lazy deletion over active cleanup. Trades a tiny visual quirk for major write savings.
Hydration as a separate step. Timeline storage is just IDs; full tweet objects come from a heavily cached lookup.
The One Thing to Remember
The Twitter timeline problem is a story about the tension between read cost and write cost. Pure push wastes resources on celebrities (where one write means tens of millions of cache updates). Pure pull wastes resources on every reader (where every timeline view does an expensive gather-and-sort). The hybrid approach uses each strategy where it shines: cheap fan-out for normal users, on-read merge for celebrities. The same lesson applies to most large-scale feed systems: detect the rare extreme cases and handle them differently from the common case. Almost every "this looks simple, surely we just write a query" system at scale ends up looking like Twitter's timeline.