What's Different from Twitter Timeline

The Twitter timeline article covered a system where tweets appear in reverse-chronological order: newest first. That is the simpler problem. A modern news feed (Facebook, Instagram, LinkedIn, TikTok For You, Twitter's "For You" tab) does something fundamentally harder: it ranks every candidate post against every user using machine learning, and shows them what an ML model predicts they will engage with most.

This sounds incremental. It is not. Ranking turns "fetch and sort by time" into a multi-stage pipeline that scores thousands of candidates per request, looks up thousands of features, runs a neural network, and returns the top 50. All in under 500ms. Across billions of users.

This article walks through how to build it. It assumes some familiarity with the basic timeline problem (push vs pull, fan-out, hybrid approaches). The content layer here is the same. The ranking layer on top is what makes a news feed.

Step 1: Requirements

Functional Requirements

Personalized Feed
Show each user a feed of posts ranked by what they are most likely to engage with.
Multiple Sources
Posts from friends, followed pages, groups, recommendations, ads. All blended into one feed.
Engagement Tracking
Every interaction (like, click, share, dwell time) feeds back into the ranking model.
Diversity
Avoid showing 10 posts from the same author. Mix content types. Surface new authors.
Real-Time Updates
A new post from a friend should appear within seconds, not hours.
Cold Start
New users with no engagement history still get a useful feed.

Non-Functional Requirements

Latency: total feed load under 500ms p99. The ranking model itself has ~150ms of that budget.
Scale: billions of users, billions of posts daily, tens of billions of feed reads per day.
Freshness: new posts available in feeds within seconds.
ML serving: the ranking model is called billions of times per day. Serving infrastructure must handle this throughput at sub-200ms latency.

Step 2: Capacity Estimation

| Metric | Calculation | Result |
| --- | --- | --- |
| Daily active users | given | ~2 billion |
| Feed loads per user/day | given (avg) | ~30 |
| Total feed loads/day | 2B × 30 | ~60 billion |
| Average feed QPS | 60B / 86,400 | ~700,000/sec |
| Candidates per feed load | post pool | ~5,000 |
| Model inferences/sec | 700K × 5,000 candidates | ~3.5 billion/sec |

3.5 billion model inferences per second. That single number drives everything about the architecture. You cannot run a giant model 3.5 billion times per second affordably. So the system uses a multi-stage funnel: cheap stages that quickly narrow billions of posts to a handful of candidates, then progressively more expensive stages on the smaller set.
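To make the funnel's necessity concrete, compare scoring every eligible post against scoring only the candidate set (the universe size below is an assumed order of magnitude):

```python
feed_qps = 700_000            # average feed requests/sec, from the table above
universe = 1_000_000_000      # assumed order of magnitude of eligible posts

naive = feed_qps * universe   # score everything for everyone: ~7e14 inferences/sec
funnel = feed_qps * 5_000     # score only the candidate set:  ~3.5e9 inferences/sec

print(f"naive: {naive:.1e}/sec, funnel: {funnel:.1e}/sec")
```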

Step 3: The Three-Stage Pipeline

Modern news feeds are built as a funnel. Each stage takes input from the previous, applies progressively more complex logic, and outputs a smaller, better set.

Ranking funnel:

Universe: all posts (billions)
    ↓ narrow to ~5,000
1. Recall: candidate generation (cheap, broad)
    ↓ narrow to ~200
2. Precision: ranker (expensive ML, scores all candidates)
    ↓ narrow to ~50
3. Diversity: re-ranker (diversify, dedupe, mix ads)
    ↓ return
User: final feed

Step 4: Stage 1 — Candidate Generation

Out of billions of possible posts, we need a few thousand worth scoring. Candidate sources include:

Network sources: posts from people you follow, recent posts in groups you joined, comments on threads you participate in.
Engagement sources: posts liked by your friends ("X liked this"), posts trending in your network.
Topical sources: posts about topics you have engaged with (cooking, hiking, AI).
Discovery sources: ML-recommended candidates from a separate retrieval model. "Users like you also liked..."
Sponsored: ads selected by the ad system, ranked separately and inserted.
Trending: posts going viral in real time.

Each source returns a few hundred candidates. The union (with deduplication) is roughly 5,000 candidates. This stage is fast (50ms budget) because each source is just a database lookup or cache read.
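As a sketch, the fan-out-and-dedupe logic might look like this in Python; the source functions are stubs with invented names:

```python
import asyncio

# Stub sources: each would be a cache read or index lookup in production.
# All function names and IDs here are illustrative.
async def network_posts(uid):  return ["p1", "p2", "p3"]
async def friend_liked(uid):   return ["p2", "p4"]
async def topic_matches(uid):  return ["p5", "p1"]
async def discovery_ann(uid):  return ["p6"]
async def trending_now(uid):   return ["p4", "p7"]

async def generate_candidates(uid: str) -> list[str]:
    """Call every source in parallel, then union with order-preserving dedupe."""
    sources = (network_posts, friend_liked, topic_matches,
               discovery_ann, trending_now)
    results = await asyncio.gather(*(s(uid) for s in sources))
    seen, candidates = set(), []
    for post_ids in results:
        for pid in post_ids:
            if pid not in seen:          # dedupe across sources
                seen.add(pid)
                candidates.append(pid)
    return candidates                    # ~5,000 IDs in production

print(asyncio.run(generate_candidates("u42")))  # ['p1', 'p2', 'p3', 'p4', ...]
```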

How Each Source Works

The "follow" source is similar to Twitter timeline: pre-computed, kept in cache. New posts fan out to follower candidate pools.

The "interest" source uses tags: each post has tags (sports, politics, food). Each user has affinity scores per tag, computed offline from their past engagement. Lookup: posts in tags this user has high affinity for.

The "discovery" source is more interesting. It is itself an ML model (a smaller two-tower retrieval model). User and post are both embedded into a vector space. Candidates are the nearest posts to the user vector. Vector databases (FAISS, Vertex Matching Engine) make this fast.

Step 5: Stage 2 — Ranking

Now you have 5,000 candidates. The ranking model scores each one and you keep the top ~200.

The score predicts engagement. Specifically, it estimates probability of each engagement type (click, like, comment, share, dwell time, follow). These probabilities are combined into a single score with weights.
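As a sketch, the combination step is a weighted sum; the weights below are invented, and real values are tuned constantly:

```python
# Hypothetical weights: heavier engagement types count for more.
WEIGHTS = {"click": 0.3, "like": 1.0, "comment": 4.0,
           "share": 6.0, "long_dwell": 2.0, "follow": 10.0}

def combined_score(probs: dict[str, float]) -> float:
    """Collapse per-engagement probabilities into one ranking score."""
    return sum(WEIGHTS[k] * probs.get(k, 0.0) for k in WEIGHTS)

# e.g. combined_score({"click": 0.4, "like": 0.1, "comment": 0.02}) -> 0.30
```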

The Model

Production rankers are deep neural networks. Typical architecture:

An embedding lookup for categorical features (user_id, post_author_id, language, country).
Numerical feature normalization for things like post age, engagement count, dwell time.
Several dense layers, often with attention over user history.
Multi-task heads predicting different engagement signals (one head for like, one for comment, etc.).
A final scoring layer combining the heads.

Different platforms use different architectures. The shared trait: hundreds of millions to billions of parameters, trained on enormous engagement logs.
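A drastically scaled-down sketch of that shape in PyTorch; every dimension, feature, and head choice here is illustrative, and real models add attention over user history:

```python
import torch
import torch.nn as nn

class MultiTaskRanker(nn.Module):
    def __init__(self, n_users=10_000, n_authors=10_000, n_numeric=16):
        super().__init__()
        self.user_emb = nn.Embedding(n_users, 32)      # categorical features
        self.author_emb = nn.Embedding(n_authors, 32)
        self.trunk = nn.Sequential(                    # shared dense layers
            nn.Linear(32 + 32 + n_numeric, 128), nn.ReLU(),
            nn.Linear(128, 64), nn.ReLU(),
        )
        # One head per engagement signal (multi-task)
        self.heads = nn.ModuleDict({
            t: nn.Linear(64, 1) for t in ("like", "comment", "share")
        })

    def forward(self, user_id, author_id, numeric):
        x = torch.cat([self.user_emb(user_id),
                       self.author_emb(author_id), numeric], dim=-1)
        h = self.trunk(x)
        # Sigmoid turns each head's logit into an engagement probability
        return {t: torch.sigmoid(head(h)).squeeze(-1)
                for t, head in self.heads.items()}

model = MultiTaskRanker()
probs = model(torch.tensor([1, 2]), torch.tensor([3, 4]), torch.rand(2, 16))
```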

Features

The model takes hundreds of features per (user, candidate) pair. Categories:

User features: demographics (with privacy constraints), language, device, recent activity patterns, long-term interests.
Item features: author, content type (image/video/text), age in seconds, engagement counts so far, topic tags.
Author features: their relationship to the user (friends, following, never seen), their historical engagement rate.
Context features: time of day, device type, network speed, current session length, what was just consumed.
Cross features: "user has interacted with this author N times in past 30 days," "user has liked similar content before."

The Latency Problem

Scoring 5,000 candidates with a billion-parameter model in under 150ms is hard. Typical approaches:

Distilled models: a smaller, faster model trained to mimic the big model, used for online ranking to stay within the latency budget. The big model trains the distilled one offline.
Two-tower architecture: separate user and item towers. The user embedding is computed once per request; item embeddings are pre-computed offline. Final scoring is a dot product, which is very cheap (sketched after this list).
Caching candidate scores: for items that haven't changed, cache the score per (user_segment, post). Recompute periodically.
Hardware acceleration: GPUs and specialized inference chips.
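The two-tower scoring trick from the list above, sketched with random vectors standing in for learned embeddings:

```python
import numpy as np

# Item embeddings are computed offline and loaded into memory or a cache.
item_embs = np.random.rand(5_000, 64).astype("float32")  # 5,000 candidates

def score_candidates(user_emb: np.ndarray) -> np.ndarray:
    """One matrix-vector product scores every candidate at once."""
    return item_embs @ user_emb                  # shape: (5000,)

user_emb = np.random.rand(64).astype("float32")  # computed once per request
scores = score_candidates(user_emb)
top200 = np.argsort(scores)[::-1][:200]          # keep the best 200
```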

Step 6: The Feature Store

Computing features at request time would be way too slow. Joining a user with all their historical engagement to compute "how often have they liked posts from this author?" cannot happen in 150ms.

The fix: a feature store. A specialized database that pre-computes features and serves them at low latency.

Two types of features:

Batch features: long-term aggregates computed offline (overnight). User's lifetime engagement count per topic. Author's average like rate. Updated daily.
Streaming features: recent aggregates updated in near-real-time. "Posts user liked in the last hour." Updated within seconds via a Kafka pipeline.

Storage: usually a fast key-value store (Redis, custom) for online serving, paired with a data warehouse for the batch computation. Updates flow from training pipelines into the feature store on a schedule.
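A minimal sketch of the online read path, assuming Redis with JSON-encoded feature blobs; the key layout is invented:

```python
import json
import redis  # pip install redis

r = redis.Redis(host="localhost", port=6379)

def fetch_features(user_id: str, author_ids: list[str]) -> dict:
    """One batched round trip per feed request. Key names are illustrative."""
    keys = [f"user:{user_id}:features"]
    keys += [f"author:{a}:features" for a in author_ids]
    raw = r.mget(keys)                    # single network round trip
    return {k: json.loads(v) if v else {} for k, v in zip(keys, raw)}
```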

The Feature Store is the most expensive piece of infrastructure in many ML serving systems. It is read on every feed request, billions of times per day.

Step 7: Stage 3 — Re-Ranking

The top 200 from the ranker often have problems:

10 of the top 50 might be from the same author.
The top score might come from clickbait that the model loves but users complain about later.
Ads need to be inserted at certain positions per business rules.
The user has already seen some posts (don't show them again).

Re-ranking applies post-hoc rules:

Diversity: at most 3 posts per author in the top 50. Spread by content type.
Deduplication: if the same news story appears as posts from 5 different sources, keep one.
Recency boost: very fresh posts get a small score boost regardless of model output.
Already-seen filter: posts the user already engaged with (or scrolled past) are demoted or removed.
Ad insertion: ads inserted at fixed positions or per the ad-ranking system's recommendations.
Quality filters: remove low-quality content, NSFW (if user opts out), spam.

Re-ranking is fast (~30ms budget) because it operates on only 200 items.
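Two of those rules (the author cap and the already-seen filter) as a greedy pass, sketched:

```python
from collections import defaultdict

def rerank(ranked, max_per_author=3, limit=50, seen_ids=frozenset()):
    """Greedy pass over ranker output (already sorted by score, descending)."""
    per_author = defaultdict(int)
    feed = []
    for post in ranked:                     # post: {"id": ..., "author": ...}
        if post["id"] in seen_ids:
            continue                        # already-seen filter
        if per_author[post["author"]] >= max_per_author:
            continue                        # author diversity cap
        per_author[post["author"]] += 1
        feed.append(post)
        if len(feed) == limit:
            break
    return feed
```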

Step 8: The Full Architecture

News feed architecture:

Request path: a mobile or web client requests a feed via the edge (CDN + load balancer), which forwards to the Feed API. The Feed API orchestrates the pipeline: Candidate Gen (~5,000 items) → Ranker (top ~200 scored) → Re-ranker (~50 final).

Stores: the pipeline reads from the Feature Store, Posts DB, User Profile / Affinity, and the ML Model Cache.

Loop: engagement events flow into the Engagement Stream, then to a Stream Processor (which updates streaming features) and the Training Pipeline (which retrains the ranker).

Components Explained

Feed API orchestrates the pipeline for a single request. It calls candidate generation, ranking, re-ranking in sequence and returns the result.

Candidate Gen calls multiple sources in parallel, deduplicates, returns ~5000 candidates.

Ranker hosts the ML model. Takes user features and 5000 candidates, computes a score per candidate, returns top 200.

Re-ranker applies business rules and diversity, returns top 50.

Feature Store serves user, item, author, and cross features at low latency.

Posts DB stores the actual post content. Hydration happens at the end (full text, images, author info) before returning to the client.

Engagement Stream captures every user interaction. Feeds back into streaming features and the offline training data.

Training Pipeline retrains the ranker daily or hourly using the latest engagement data. New model versions get deployed via shadow traffic, A/B tests, then full rollout.

Step 9: Latency Budget

Total feed load: 500ms p99. Breakdown:

Network round trip + serialization: ~150ms (mostly outside our control)
Candidate generation: ~50ms (parallel calls to multiple sources)
Feature lookup: ~100ms (one batch call to the feature store)
Ranking model inference: ~150ms (the bottleneck)
Re-ranking + business logic: ~30ms
Hydration: ~20ms (fetch post bodies)

If the ranking model takes 200ms instead of 150, the feed feels laggy. So model serving is heavily optimized: GPU inference, smaller distilled models, request batching, caching of intermediate computations.

Step 10: The Engagement Loop

Every interaction the user has with the feed (like, click, comment, share, dwell, scroll past) is logged to the Engagement Stream. From there it flows into multiple consumers:

Real-time feature updates: "user just liked 3 cooking posts" gets reflected in their streaming features within seconds. Their next feed load reflects this.
Author-side metrics: the author sees their post's engagement update.
Offline training data: the daily training run consumes weeks or months of engagement events to retrain the ranker.
Anomaly detection: if an account suddenly gets 100x normal engagement, flag it as bot behavior.

This is what makes the feed personalize over time. Every user action subtly shifts what the next request shows them.
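A minimal sketch of the real-time feature consumer, assuming a Kafka topic of JSON engagement events and Redis counters; topic and key names are invented:

```python
import json
import redis
from kafka import KafkaConsumer  # pip install kafka-python

r = redis.Redis()
consumer = KafkaConsumer("engagement-events",       # topic name: illustrative
                         value_deserializer=json.loads)

for record in consumer:  # e.g. {"user": "u42", "type": "like", "topic": "cooking"}
    e = record.value
    if e["type"] == "like":
        key = f"user:{e['user']}:topic_likes_1h"
        r.hincrby(key, e["topic"], 1)   # per-topic streaming counter
        r.expire(key, 3600)             # approximate one-hour window
```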

Step 11: The Cold Start Problem

A user just signed up. They have no engagement history. The model has nothing to personalize on. What do you show them?

Several approaches, often combined:

Demographic defaults: show content popular in their country, language, age group.
Onboarding selections: ask them to pick interests during signup. Use those as initial signals.
Generic trending: universally popular content, broad appeal.
Aggressive exploration: show diverse content to learn what they engage with quickly.
Social bootstrapping: if they connect a contact list, use friends' tastes to bootstrap.

After a few sessions of engagement, the user's profile is rich enough to use the regular ranking pipeline. The transition happens gradually.

Step 12: Freshness vs Personalization

A friend just posted a photo. You should see it within seconds. But your "best score" might come from a 6-hour-old post that has accumulated more engagement and looks better to the model.

The system handles this by introducing a freshness boost:

Very recent posts (under 1 hour old) get a multiplier on their score. The boost decays.
The candidate generation step ensures recent posts always make it into the candidate pool, even before the model has had time to score them well.
Streaming features update author-engagement-rate in near-real-time, so even if a post is new, signals about how others are engaging with it factor into its score.

Tuning this is hard. Too much freshness boost and the feed feels noisy. Too little and friends say "I never see your posts."
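One plausible shape for the boost is exponential decay toward no boost; the constants below are exactly the knobs that get tuned:

```python
import math

def freshness_multiplier(age_seconds: float,
                         max_boost: float = 1.5,
                         half_life_s: float = 1800.0) -> float:
    """Score multiplier that decays toward 1.0 (no boost) as a post ages.
    max_boost and half_life_s are invented values, not platform constants."""
    decay = math.exp(-age_seconds * math.log(2) / half_life_s)
    return 1.0 + (max_boost - 1.0) * decay

# Just posted: 1.5x.  30 minutes old: 1.25x.  A few hours old: ~1.0x.
```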

Step 13: Beyond Friends — Recommendations

Modern feeds are not just "things from my friends." They include recommended content from people you don't follow. TikTok's For You feed is almost entirely recommendations. Instagram's Reels feed too. This is recommendation, not social.

The recommendation pipeline is similar:

1. Candidate generation from a much wider universe (all public posts, not just network).
2. Ranker scores them, with stronger emphasis on long-term satisfaction (vs short-term engagement, which can favor clickbait).
3. Re-ranker handles diversity and exploration.

The trickier part: balancing "show similar content" (which makes the feed predictable and boring) with "show new things" (which risks irrelevant content). This is the explore/exploit problem from reinforcement learning.
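The simplest version of that balance is epsilon-greedy slotting: reserve a small, fixed fraction of the feed for exploration. A sketch:

```python
import random

def build_slate(exploit_ranked, explore_pool, size=50, epsilon=0.1):
    """Fill most slots with top-ranked posts, reserve a few for exploration.
    epsilon (the fraction of slots spent learning) is an invented knob."""
    n_explore = max(1, int(size * epsilon))
    slate = list(exploit_ranked[: size - n_explore])   # keep ranked order
    picks = random.sample(explore_pool, min(n_explore, len(explore_pool)))
    for post in picks:
        slate.insert(random.randrange(len(slate) + 1), post)  # random slot
    return slate
```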

Step 14: Edge Cases and Operational Concerns

Filter Bubbles and Echo Chambers

Pure engagement optimization can lead to bubbles. Users see only content they already agree with. Mature systems have explicit diversity injectors that occasionally show out-of-network or differing-perspective content. How much to inject, and what to inject, is a contested ethical question.

Sensitive Content

Mental health content, election misinformation, hate speech: separate classifiers run on every candidate post. Flagged content is suppressed or removed. A whole infrastructure exists for this.

A/B Testing the Ranker

New model versions are tested via gradual rollout. 1% of users get the new model, the rest the old. Engagement metrics are compared. If the new model wins on long-term satisfaction (not just short-term clicks), it gets promoted.

Caching Friend Posts

For the social part of the feed, the same fan-out logic from Twitter applies. Pre-computed timelines per user, kept in Redis. The feed pipeline uses these as one of its candidate sources.

Performance Regressions

Adding a new feature to the model adds latency. Adding a new candidate source adds candidates. Both can push latency past the budget. Mature teams have automated latency regression tests on every code change.

Step 15: Recap of Key Decisions

Three-stage funnel. Cheap candidate generation, expensive ranking, post-hoc re-ranking. Each stage works on smaller input.
Feature store is the bottleneck. Pre-computed features served at low latency. Most engineering time goes here.
Distilled models for serving, big models for training. The big model teaches the small model. Small model serves under latency.
Engagement loop closes back into training. Every interaction shapes the next model version.
Cold start solved with onboarding + trending. Until the user has signal, lean on broad popularity.
Freshness vs ranking balanced via boosts. Recent posts get a small lift to make it through the ranker.
Re-ranking handles all the "make it sane" rules. Diversity, dedup, ad insertion, already-seen filtering.

The One Thing to Remember

A modern news feed is a recommendation system, not a list of posts. The architecture exists to deliver an ML pipeline (candidate gen, ranker, re-ranker) to every user in milliseconds. The hard parts are not storage or fan-out (those are well-understood from systems like Twitter's). The hard parts are: getting the model good enough to predict long-term satisfaction (not just short-term clicks); the feature store fast enough to feed it at hundreds of thousands of batched reads per second; the engagement loop fast enough that personalization adapts within seconds; and the re-ranking smart enough to override the model when it goes off the rails. The product feels magical because hundreds of engineers spent years tuning these pieces to make a few-hundred-millisecond decision feel personal.