Why This Problem Matters
You have probably used a URL shortener without thinking about it. You click a link on Twitter that looks like t.co/AbCdEf. Behind the scenes, that tiny string redirects you to some long URL that might be 200 characters of nonsense. Bit.ly, TinyURL, and YouTube's youtu.be all do the same thing.
On the surface, this seems trivial. Take a long string, give back a short string, redirect when asked. How hard could it be?
The answer: surprisingly hard, once you get serious about scale. Bit.ly handles billions of redirects per month. Twitter's t.co handles even more. The difference between a toy URL shortener and one that actually works at scale comes down to dozens of design decisions about encoding, storage, caching, and operations.
This is also why the URL shortener is the all-time classic system design interview question. Almost every important distributed systems concept shows up: hashing, distributed ID generation, read-heavy caching, sharding, analytics pipelines, and rate limiting. Working through this design teaches you how to think about real systems.
Let us build one from scratch.
Step 1: Gather Requirements
Before drawing a single box, you must agree on what the system should do (functional requirements) and how well it should do it (non-functional requirements). Skipping this step is the most common reason designs fall apart later.
Functional Requirements
Shorten: given a long URL, return a unique short URL.
Redirect: given a short code, redirect the user to the original long URL.
Custom aliases: users can pick their own short code (e.g., bit.ly/my-talk).
Expiration: links can optionally expire at a user-specified time.
Non-Functional Requirements
Low latency: redirects must be near-instant (under 100ms p99). Users will not wait for a redirect.
High availability: 99.99% uptime or better. A short link that does not work is worse than no short link at all because it breaks the user's content.
Highly scalable: handle billions of reads per month, growing every year.
Durable: once a short URL is created, it must work forever (or until explicitly expired).
Read-heavy: reads vastly outnumber writes. A typical ratio is 10:1 or even 100:1.
Step 2: Estimate Capacity
Numbers shape architecture. A system serving 100 requests per second is very different from one serving 100,000. Let us pick concrete numbers and use them throughout.
Assumptions:
100 million new URLs created per month.
Read-to-write ratio: 100 to 1.
Data retention: 5 years before optional cleanup.
These are not magic numbers. They are starting points. The real value comes from updating them as you make choices and seeing where bottlenecks appear.
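To make the arithmetic concrete, here is the back-of-envelope math as a runnable sketch. The 500-byte average row size and the 3x peak factor are assumptions for illustration, not givens:

```python
# Back-of-envelope capacity math from the assumptions above.
WRITES_PER_MONTH = 100_000_000
READ_RATIO = 100
SECONDS_PER_MONTH = 30 * 24 * 3600  # ~2.6 million

writes_per_sec = WRITES_PER_MONTH / SECONDS_PER_MONTH        # ~39 writes/s
reads_per_sec = writes_per_sec * READ_RATIO                  # ~3,900 reads/s average
peak_reads_per_sec = reads_per_sec * 3                       # ~12,000 reads/s at a 3x peak

total_urls_5y = WRITES_PER_MONTH * 12 * 5                    # 6 billion rows
storage_bytes = total_urls_5y * 500                          # assuming ~500 bytes/row: ~3 TB

print(round(writes_per_sec), round(reads_per_sec), round(peak_reads_per_sec))
print(total_urls_5y, storage_bytes / 1e12, "TB")
```

Roughly 40 writes and 4,000 reads per second on average, about 12,000 reads per second at peak, and about 3 TB of storage over 5 years. These are the numbers the rest of the design works against.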
Step 3: High-Level Architecture
Before drilling into details, sketch the boxes and arrows. Here is the simplest version that handles all the core requirements:
Client → Load Balancer → API Servers (POST /shorten, GET /:code) → Redis cache (hot URLs) → Database (sharded). A distributed ID generator hands out unique IDs to the API servers, and click events flow through Kafka into ClickHouse for analytics.
Each layer has a clear responsibility. The next sections drill into the interesting decisions inside each one.
Step 4: How to Generate the Short Code
This is the heart of a URL shortener. You need a way to generate short, unique strings that map to long URLs. The technique you pick affects collision rates, scalability, predictability, and security.
Why Base62?
The short code uses characters from [0-9a-zA-Z]. That is 62 distinct characters per position. So:
6 characters: 62^6 ≈ 57 billion combinations.
7 characters: 62^7 ≈ 3.5 trillion combinations.
For our 6 billion URL projection, 7 characters is plenty with massive room to grow. We will commit to 7-character codes.
Approach 1: Hash the URL
Take the long URL, hash it with MD5 or SHA-256, then convert the first N bits to Base62.
import hashlib

def hash_to_short(long_url: str) -> str:
    md5_bytes = hashlib.md5(long_url.encode()).digest()
    big_int = int.from_bytes(md5_bytes[:6], 'big')
    # to_base62 converts an integer to a Base62 string (same encoding as Approach 2)
    return to_base62(big_int)[:7]
Problem: hashes have collisions. Different long URLs can produce the same short code. You have to detect collisions and re-hash with a salt, which adds complexity and unpredictable retries.
Bigger problem: if the same long URL is shortened twice by two different users, both get the same short code by default. That sounds nice but it leaks information (anyone can check if a URL has been shortened) and breaks per-user analytics.
Approach 2: Counter + Base62 (Recommended)
Use a globally-unique incrementing integer ID for every new URL. Encode the ID in Base62.
def counter_to_short(unique_id: int) -> str:
    chars = "0123456789abcdefghijklmnopqrstuvwxyz" \
            "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    result = ""
    while unique_id > 0:
        result = chars[unique_id % 62] + result
        unique_id //= 62
    return result.rjust(7, '0')
This approach has zero collisions by design. ID 1 maps to 0000001, ID 100 maps to 000001C, and so on. Every new URL gets a fresh ID, so codes are guaranteed unique.
Trade-off: codes are sequential and predictable. If 0000001 exists, attackers can guess 0000002 exists too. To prevent this, you can either:
Skip ahead by random amounts (use only every 10th or 100th ID).
Multiply IDs by a large prime modulo 62^7 to scatter them across the space.
Add a few random characters at fixed positions in the code.
For most cases, sequential is fine. If you need unguessable codes, randomize.
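The multiply-by-a-prime trick can be sketched in a few lines. The multiplier below is an arbitrary example value; any multiplier coprime to 62^7 works (since 62 = 2 × 31, that means odd and not a multiple of 31), which also makes the mapping invertible:

```python
# Scatter sequential IDs across the 62^7 code space via modular multiplication.
MOD = 62 ** 7              # 3,521,614,606,208 possible 7-character codes
PRIME = 1_580_030_173      # example multiplier: odd, not a multiple of 31 -> coprime to MOD

def scramble(unique_id: int) -> int:
    # Sequential inputs land far apart in the output space.
    return (unique_id * PRIME) % MOD

# The modular inverse (Python 3.8+) lets us recover the original ID if needed.
PRIME_INV = pow(PRIME, -1, MOD)

def unscramble(scrambled: int) -> int:
    return (scrambled * PRIME_INV) % MOD
```

Feed the scrambled value into counter_to_short and the codes stop being guessable-by-increment, while remaining collision-free.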
Approach 3: Pure Random
Generate 7 random Base62 characters and check if they exist. If yes, retry.
Simple but probabilistic. Works well when the space is mostly empty (early on). Becomes painful when the space starts to fill up because retries grow.
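A minimal sketch of the random approach, with a plain dict standing in for the database:

```python
import secrets
import string

ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase  # Base62

def random_code(length: int = 7) -> str:
    # secrets gives cryptographically strong randomness, so codes are unguessable.
    return "".join(secrets.choice(ALPHABET) for _ in range(length))

def shorten_random(long_url: str, store: dict, max_retries: int = 5) -> str:
    # `store` stands in for the database; the retry loop handles collisions.
    for _ in range(max_retries):
        code = random_code()
        if code not in store:
            store[code] = long_url
            return code
    raise RuntimeError("too many collisions; the code space is filling up")
```

The retry bound is the weakness: as the space fills, collisions (and retries) become more frequent.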
Verdict: for production, use the counter approach with optional scrambling. It is deterministic, fast, and collision-free.
Step 5: Generating IDs at Scale
The counter approach works perfectly with one server. But "give me the next ID" becomes a bottleneck when you have many servers writing simultaneously. You cannot just increment a single global counter without serializing everything.
Option A: Single Auto-Increment Database
The simplest. Have one MySQL or Postgres instance own the counter. Every API server requests the next ID from there.
Limit: the database becomes the chokepoint. Single point of failure. Tops out at maybe 10,000 IDs per second under good conditions.
Option B: Range Allocation (Token Server)
Each API server requests a chunk of IDs (say, 1000 at a time) from a central allocator. It uses them locally without further coordination. When the chunk is exhausted, it requests the next.
This reduces load on the central allocator by 1000x. The trade-off is that if a server crashes mid-range, those IDs are wasted.
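Here is a minimal in-process sketch of range allocation. The lock-protected counter stands in for the central allocator, which in production would be a single database row updated atomically:

```python
import threading

class RangeAllocator:
    """Stand-in for the central allocator (in production: one DB row, atomic increment)."""
    def __init__(self, chunk_size: int = 1000):
        self.chunk_size = chunk_size
        self._next = 1
        self._lock = threading.Lock()

    def next_range(self) -> range:
        with self._lock:  # in production: a single-row UPDATE ... RETURNING
            start = self._next
            self._next += self.chunk_size
        return range(start, start + self.chunk_size)

class IdClient:
    """Each API server holds one of these and refills when its chunk runs out."""
    def __init__(self, allocator: RangeAllocator):
        self.allocator = allocator
        self._ids = iter(())  # empty until the first refill

    def next_id(self) -> int:
        try:
            return next(self._ids)
        except StopIteration:
            self._ids = iter(self.allocator.next_range())
            return next(self._ids)
```

Only one allocator round-trip per thousand IDs; a crashed server simply abandons the rest of its chunk, leaving a harmless gap in the sequence.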
Option C: Snowflake IDs
Originally from Twitter. A 64-bit ID composed of:
41 bits for timestamp (in milliseconds since some epoch).
10 bits for machine ID.
12 bits for a per-machine sequence counter.
Each machine generates its own IDs without talking to anyone else. Guaranteed unique because machine IDs do not overlap. Roughly time-ordered, which is sometimes useful.
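A sketch of the bit packing, omitting production concerns like clock rollback and sequence exhaustion within a millisecond. The epoch constant is the value commonly attributed to Twitter's implementation:

```python
import threading
import time

EPOCH_MS = 1_288_834_974_657  # custom epoch (commonly cited for Twitter's Snowflake)

class Snowflake:
    """Minimal 41/10/12 layout: timestamp | machine ID | per-machine sequence."""
    def __init__(self, machine_id: int):
        assert 0 <= machine_id < 1024          # machine ID must fit in 10 bits
        self.machine_id = machine_id
        self.sequence = 0
        self.last_ms = -1
        self._lock = threading.Lock()

    def next_id(self) -> int:
        with self._lock:
            now_ms = int(time.time() * 1000)
            if now_ms == self.last_ms:
                self.sequence = (self.sequence + 1) & 0xFFF  # 12-bit sequence
            else:
                self.sequence = 0
                self.last_ms = now_ms
            # 41 bits timestamp << 22 | 10 bits machine << 12 | 12 bits sequence
            return ((now_ms - EPOCH_MS) << 22) | (self.machine_id << 12) | self.sequence
```

No coordination at generation time: uniqueness comes from non-overlapping machine IDs plus the per-machine sequence.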
The catch for our use case: 64-bit IDs are larger than we need. They produce 11-character Base62 codes, not 7. If short codes are critical, range allocation is better.
Step 6: Database Design
The schema is simple. The interesting part is choosing which database, and how to scale it.
The main table has six columns: short_code (the lookup key), long_url, user_id, created_at, expires_at, and click_count.
SQL or NoSQL?
Both can work. The decision comes down to access patterns.
SQL (MySQL, Postgres):
- Strong consistency. Transactions work cleanly.
- Great for relational queries (user's links, expiration cleanup).
- Mature, well-understood operationally.
- Sharding requires manual work.
NoSQL (DynamoDB, Cassandra):
- Built-in horizontal scaling.
- Predictable latency at any scale.
- Eventual consistency in some configs.
- Joins and complex queries are painful.
For a URL shortener, the access pattern is dead simple: lookup by short_code. NoSQL is a great fit. DynamoDB or Cassandra both excel at this. If you need relational queries (analytics, user dashboard), pair it with a separate SQL database for those.
Sharding Strategy
3 TB of data is too much for a single node. You need to shard.
Shard by short_code (hash-based): the short_code is hashed and the hash determines which shard. Even distribution. Lookups are still O(1) (compute hash, route to shard). This is the right answer for URL shorteners.
Shard by user_id: all of one user's links live on one shard. Good for "my links" queries. Bad for the dominant access pattern (anonymous redirects), since you would not know the user from the short code.
Shard by time: recent links on one shard, old links on another. Tempting but creates hot shards (most reads hit the latest data) and cold shards (old data nobody uses).
Stick with hash-based sharding by short_code.
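The routing function is a few lines, assuming a fixed shard count of 16 for illustration. Note that Python's built-in hash() is salted per process, so a stable hash like MD5 is needed:

```python
import hashlib

NUM_SHARDS = 16  # illustrative; real deployments pick based on data size and growth

def shard_for(short_code: str) -> int:
    # MD5 gives a stable, well-distributed hash across processes and restarts.
    digest = hashlib.md5(short_code.encode()).digest()
    return int.from_bytes(digest[:4], "big") % NUM_SHARDS
```

One caveat: changing NUM_SHARDS remaps almost every key, which is why production systems layer consistent hashing or pre-split virtual shards on top; plain modulo is enough to convey the idea.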
Step 7: The Cache Layer
Reads outnumber writes 100:1. If every read hits the database, you need 100x more database capacity than you would otherwise. The cache fixes this.
The strategy is straightforward: when a redirect comes in, check Redis first. On a hit, return immediately. On a miss, read from the database, store in Redis, return. This is the Cache Aside pattern, the same one covered in detail in the Caching Strategies article.
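The pattern is a few lines of code. Here is a sketch with plain dicts standing in for Redis and the sharded database:

```python
from typing import Optional

def resolve(short_code: str, cache: dict, db: dict) -> Optional[str]:
    """Cache-aside lookup: check the cache first, fall back to the database."""
    if short_code in cache:
        return cache[short_code]          # hit: the database is never touched
    long_url = db.get(short_code)         # miss: go to storage
    if long_url is not None:
        cache[short_code] = long_url      # populate for the next reader
        # with redis-py this would be cache.set(short_code, long_url, ex=ttl_seconds)
    return long_url
```

Unknown codes return None, which the API layer turns into a 404.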
What to Cache
The hot 20% of URLs probably account for 80% of clicks (Pareto distribution). Cache them aggressively. The cold long tail goes to the database. With our 600 GB cache estimate, we hold most of what users actually click.
The Thundering Herd Problem
A celebrity tweets a shortened URL. Suddenly 1 million people click it within 30 seconds. The link is not in cache yet (cold start). Every single request misses the cache and hits the database. The database falls over.
The fix is called request coalescing or singleflight. The cache layer recognizes concurrent misses for the same key and only sends one request to the backend. Languages and frameworks (Go's singleflight, Caffeine in Java) provide this out of the box.
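A minimal sketch of the idea in Python, using an event per key so that followers block until the leader's single backend call completes (error handling omitted):

```python
import threading

class SingleFlight:
    """Coalesce concurrent lookups for the same key into one backend call."""
    def __init__(self):
        self._lock = threading.Lock()
        self._inflight: dict = {}  # key -> {"event": Event, "result": value}

    def do(self, key, fn):
        with self._lock:
            entry = self._inflight.get(key)
            if entry is None:
                # First caller for this key becomes the leader.
                entry = {"event": threading.Event(), "result": None}
                self._inflight[key] = entry
                leader = True
            else:
                leader = False
        if leader:
            try:
                entry["result"] = fn()  # the only backend call for this burst
            finally:
                entry["event"].set()
                with self._lock:
                    self._inflight.pop(key, None)
            return entry["result"]
        entry["event"].wait()           # followers wait for the leader's result
        return entry["result"]
```

During a viral spike, one database read serves the whole burst instead of a million identical queries.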
Step 8: API Design
Two endpoints carry most of the load. Keep them simple.
POST /api/v1/shorten
Content-Type: application/json
Authorization: Bearer <token>
{
"long_url": "https://example.com/some/very/long/url",
"custom_alias": "my-talk", // optional
"expires_at": "2027-01-01T00:00:00Z" // optional
}
Response 201 Created:
{
"short_url": "https://mab.az/my-talk",
"short_code": "my-talk",
"expires_at": "2027-01-01T00:00:00Z"
}
GET /:short_code
Response 301 Moved Permanently:
Location: https://example.com/some/very/long/url
OR Response 302 Found:
Location: https://example.com/some/very/long/url
301 vs 302
This decision matters more than people think.
301 (Permanent): browsers cache the redirect aggressively. Future clicks may not even reach your server. Saves load. Bad for analytics (you stop seeing clicks).
302 (Found): browsers do not cache. Every click reaches your server. Higher load but accurate analytics.
Most URL shorteners use 302 because click tracking is a core product feature. If you do not need analytics, 301 is more efficient.
Step 9: Analytics Pipeline
Every redirect should generate a click event. But updating click_count synchronously on every redirect is wasteful. It puts write load on the URL table that should be read-only.
The right approach: emit click events asynchronously to a queue, process them separately, store in a dedicated analytics database.
The flow: the API server responds with the 302 and publishes an event to a Kafka click_events topic; a consumer batches counts; an enrichment step adds country and city; the results land in the analytical store.
The redirect path stays fast and lightweight. The analytics pipeline runs at its own pace. Click counts in the dashboard are eventually consistent (which is fine, see the Eventual Consistency Patterns article).
ClickHouse is the typical choice for this workload because it crushes aggregation queries on event data. Snowflake, BigQuery, or Druid also work.
Step 10: Rate Limiting and Abuse Prevention
URL shorteners are heavily abused for phishing, spam, and malware distribution. Without controls, your service becomes a tool for attackers and gets blacklisted by browsers.
Per-User Rate Limits
Cap how many URLs a user can create per minute, hour, day. Authenticated users get higher limits. Anonymous users get strict limits. Suspicious patterns trigger captchas.
Implementation: a Redis-backed token bucket per user ID or IP. Each shorten request consumes a token. When tokens run out, return 429 Too Many Requests.
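An in-memory sketch of the token bucket logic. Production keeps the bucket state in Redis (typically updated atomically via a small Lua script) so every API server sees the same bucket:

```python
import time

class TokenBucket:
    """In-memory token bucket; production state lives in Redis, keyed by user or IP."""
    def __init__(self, capacity: int, refill_per_sec: float):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill based on elapsed time, never exceeding capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_per_sec)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # caller responds 429 Too Many Requests
```

Capacity controls burst size; the refill rate controls the sustained limit.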
Malicious URL Detection
Before shortening, scan the long URL against known threat databases:
Google Safe Browsing API: the de facto standard. Free for moderate traffic.
VirusTotal: aggregates reports from many threat scanners.
Internal blocklists: domains that have been reported for abuse on your service.
Reject the request if the URL is on any list. Periodically rescan existing links and disable any that turn malicious after the fact.
CAPTCHA on Suspicious Patterns
If one IP creates 100 URLs in an hour, that is suspicious even within rate limits. Trigger a CAPTCHA before allowing more. Real users pass. Bots usually do not.
Step 11: Custom Aliases and Collisions
Users want to pick custom short codes (e.g., bit.ly/their-product). This adds a few wrinkles.
When a user requests a custom alias, the system checks if it exists. If yes, reject. If no, write it (with a uniqueness constraint at the database level so two simultaneous requests for the same alias cannot both succeed).
def create_with_custom_alias(alias, long_url, user_id):
    try:
        db.insert(
            short_code=alias,
            long_url=long_url,
            user_id=user_id,
        )
    except UniqueConstraintViolation:
        raise ConflictError(f"Alias '{alias}' is taken")
You also need a reserved word list: aliases like admin, api, login, about should never be allowed because they collide with site routes. Block them upfront.
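The upfront check is simple. Here is a sketch with a deliberately tiny reserved set; a real one would cover every route your site serves:

```python
RESERVED = frozenset({"admin", "api", "login", "about"})  # extend as site routes grow

def validate_alias(alias: str) -> None:
    # Case-insensitive check so "Admin" cannot dodge the blocklist.
    if alias.lower() in RESERVED:
        raise ValueError(f"'{alias}' is reserved")
```

Run this before the insert so reserved names are rejected without a database round-trip.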
If you allow alias reuse after deletion, set a cool-down period so phishers cannot grab a recently-deleted trusted alias.
Step 12: Edge Cases and Operational Concerns
URL Validation
Reject invalid URLs upfront: malformed URIs, IP-only URLs (often used for malware), URLs pointing back at your own domain (redirect loops), URLs longer than some sane limit (10KB or so).
Expired Links
A daily background job scans for expired links and either deletes them or moves them to an "expired" state that returns a 410 Gone. Do not block the redirect path with expiration logic; check it lazily on access.
Global Latency
Users in Tokyo should not wait for a redirect to come from a US server. Put your read API behind a CDN like Cloudflare or CloudFront with edge caching enabled. The CDN can cache 302 responses with a short TTL (a few minutes), serving most redirects from the user's nearest edge node.
Backups and Disaster Recovery
The dataset is small (3 TB) and immutable in nature (rows are mostly added, rarely changed). Daily snapshots to object storage are sufficient. Keep at least 30 days of point-in-time backups.
Multi-Region Deployment
For very high availability, run the system in multiple regions (US East, EU, Asia). Read replicas in each region for the URL database. Write traffic still goes to a single primary region (writes are rare). Failover plan must be tested.
The Complete Architecture
Putting all the pieces together, here is the final picture:
Requests enter through the Cloudflare CDN and a load balancer, then reach the API servers: shorten and custom-alias requests on the write path, redirects on the read path. A rate limiter (token bucket in Redis) guards writes. Redirects check the hot URL cache in Redis before the database, which is sharded by code. API servers draw ID ranges from the allocator. Click events flow through Kafka into an enrich-and-aggregate step and land in ClickHouse for analytics. Around the core: a daily job expires old links, Safe Browsing scans catch malicious URLs, and nightly snapshots go to object storage.
Key Design Decisions Recap
Looking back, the design hinges on a few key choices:
Counter-based encoding with Base62, 7 characters. Collision-free, scales to trillions.
Range allocation for distributed IDs. Avoids the bottleneck of a single counter.
NoSQL with hash sharding by short_code. Right fit for the dominant access pattern.
Aggressive caching with request coalescing. Handles read-heavy load without melting the database.
Async analytics through Kafka. Keeps the redirect path fast.
CDN at the edge. Low latency globally.
Rate limiting and threat scanning. Keeps abusers out.
The One Thing to Remember
A URL shortener looks like a hash table behind an API. At small scale, it is exactly that. At Twitter scale, every layer becomes interesting: the encoding affects predictability, the ID generator affects throughput, the cache affects database load, the queue affects analytics latency, and the rate limiter affects whether your service is usable for anyone but spammers.
The lesson is broader than this one system. Most "simple" services look simple because someone designed them well. Behind every clean API is a stack of careful choices about consistency, partitioning, caching, and failure modes. Knowing how to make those choices is the actual skill of system design.
If you can build a URL shortener that handles 12,000 reads per second, never collides, survives a thundering herd, blocks malicious URLs, and stays fast under viral spikes, you have a working model for a hundred other systems. Almost everything else is the same patterns rearranged.