Why This Problem Matters

You have probably used a URL shortener without thinking about it. You click a link on Twitter that looks like t.co/AbCdEf. Behind the scenes, that tiny string redirects you to some long URL that might be 200 characters of nonsense. Bit.ly, TinyURL, and YouTube's youtu.be all do the same thing.

On the surface, this seems trivial. Take a long string, give back a short string, redirect when asked. How hard could it be?

The answer: surprisingly hard, once you get serious about scale. Bit.ly handles billions of redirects per month. Twitter's t.co handles even more. The difference between a toy URL shortener and one that actually works at scale comes down to dozens of design decisions about encoding, storage, caching, and operations.

This is also why the URL shortener is the all-time classic system design interview question. Almost every important distributed systems concept shows up: hashing, distributed ID generation, read-heavy caching, sharding, analytics pipelines, and rate limiting. Working through this design teaches you how to think about real systems.

Let us build one from scratch.

Step 1: Gather Requirements

Before drawing a single box, you must agree on what the system should do (functional requirements) and how well it should do it (non-functional requirements). Skipping this step is the most common reason designs fall apart later.

Functional Requirements

Shorten URL: Accept a long URL, return a unique short URL.
Redirect: When the short URL is hit, redirect (HTTP 301 or 302) to the original long URL.
Custom alias: Allow users to pick their own short code (e.g., bit.ly/my-talk).
Expiration: Links can have an optional expiration date.
Analytics: Track clicks (count, geography, device, referrer).
Accounts: Authenticated users own their links and can edit or delete them.

Non-Functional Requirements

Low latency: redirects must be near-instant (under 100ms p99). Users will not wait for a redirect.
High availability: 99.99% uptime or better. A short link that does not work is worse than no short link at all because it breaks the user's content.
Highly scalable: handle billions of reads per month, growing every year.
Durable: once a short URL is created, it must work forever (or until explicitly expired).
Read-heavy: reads vastly outnumber writes. A typical ratio is 10:1 or even 100:1.

Step 2: Estimate Capacity

Numbers shape architecture. A system serving 100 requests per second is very different from one serving 100,000. Let us pick concrete numbers and use them throughout.

Assumptions:

100 million new URLs created per month.
Read-to-write ratio: 100 to 1.
Data retention: 5 years before optional cleanup.

Write QPS: 100M / (30 days × 24h × 3600s) ≈ 40 writes/sec
Read QPS: 40 × 100 ≈ 4,000 reads/sec
Peak read QPS: average × 3 (rough peak factor) ≈ 12,000 reads/sec
Total URLs (5 yrs): 100M × 12 × 5 = 6 billion URLs
Storage: 6B × 500 bytes per record ≈ 3 TB
Cache size (hot 20%): 3 TB × 0.2 = 600 GB

These are not magic numbers. They are starting points. The real value comes from updating them as you make choices and seeing where bottlenecks appear.
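
A quick way to keep the arithmetic live while you iterate is a few lines of Python; this is a sketch of the same calculations with the assumptions above, to be re-run whenever a number changes:

# Back-of-envelope capacity estimates from the assumptions above.
monthly_urls = 100_000_000
write_qps = monthly_urls / (30 * 24 * 3600)   # ~40 writes/sec
read_qps = write_qps * 100                    # 100:1 read ratio -> ~4,000 reads/sec
peak_read_qps = read_qps * 3                  # rough peak factor -> ~12,000 reads/sec
total_urls = monthly_urls * 12 * 5            # 5 years -> 6 billion URLs
storage_bytes = total_urls * 500              # ~3 TB
cache_bytes = storage_bytes * 0.2             # hot 20% -> ~600 GB
print(f"{write_qps:.0f} w/s, {read_qps:,.0f} r/s ({peak_read_qps:,.0f} peak), "
      f"{storage_bytes / 1e12:.0f} TB storage, {cache_bytes / 1e9:.0f} GB cache")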

Step 3: High-Level Architecture

Before drilling into details, sketch the boxes and arrows. Here is the simplest version that handles all the core requirements:

High-Level Architecture
Client (browser / app) → HTTPS → Edge (load balancer + CDN) → API layer (Write API, POST /shorten; Read API, GET /:code) → Cache layer (Redis, hot URLs); on a miss → Storage (URL database, sharded; distributed ID generator). Click events flow asynchronously into the analytics pipeline: event queue (Kafka) → analytics DB (ClickHouse).

Each layer has a clear responsibility. The next sections drill into the interesting decisions inside each one.

Step 4: How to Generate the Short Code

This is the heart of a URL shortener. You need a way to generate short, unique strings that map to long URLs. The technique you pick affects collision rates, scalability, predictability, and security.

Why Base62?

The short code uses characters from [0-9a-zA-Z]. That is 62 distinct characters per position. So:

How Many URLs Fit in N Characters
5 characters: 62^5 ≈ 916 million (enough for a small product)
6 characters: 62^6 ≈ 56 billion (comfortable for years)
8 characters: 62^8 ≈ 218 trillion (essentially infinite)

For our 6 billion URL projection, 7 characters (62^7 ≈ 3.5 trillion combinations) is plenty with massive room to grow. We will commit to 7-character codes.

Approach 1: Hash the URL

Take the long URL, hash it with MD5 or SHA-256, then convert the first N bits to Base62.

import hashlib

def hash_to_short(long_url: str) -> str:
    # Hash the URL, keep the first 6 bytes (48 bits), and Base62-encode them.
    md5_bytes = hashlib.md5(long_url.encode()).digest()
    big_int = int.from_bytes(md5_bytes[:6], 'big')
    return to_base62(big_int)[:7]   # to_base62 is the encoder from Approach 2 below

Problem: hashes have collisions. Different long URLs can produce the same short code. You have to detect collisions and re-hash with a salt, which adds complexity and unpredictable retries.

Bigger problem: if the same long URL is shortened twice by two different users, both get the same short code by default. That sounds nice but it leaks information (anyone can check if a URL has been shortened) and breaks per-user analytics.

Approach 2: Counter + Base62 (Recommended)

Use a globally-unique incrementing integer ID for every new URL. Encode the ID in Base62.

def counter_to_short(unique_id: int) -> str:
    chars = "0123456789abcdefghijklmnopqrstuvwxyz" \
            "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    result = ""
    while unique_id > 0:
        result = chars[unique_id % 62] + result
        unique_id //= 62
    return result.rjust(7, '0')

This approach has zero collisions by design. ID 1 maps to 0000001, ID 100 maps to 000001C, and so on. Every new URL gets a fresh ID, so codes are guaranteed unique.

Trade-off: codes are sequential and predictable. If 0000001 exists, attackers can guess 0000002 exists too. To prevent this, you can either:

Skip ahead by random amounts (use only every 10th or 100th ID).
Multiply IDs by a large prime modulo 62^7 to scatter them across the space (sketched after this list).
Add a few random characters at fixed positions in the code.

For most cases, sequential is fine. If you need unguessable codes, randomize.
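
Here is a minimal sketch of the multiply-by-a-prime option; the constant is an arbitrary choice, not part of this design:

SPACE = 62 ** 7            # 3,521,614,606,208 possible 7-character codes
PRIME = 2_654_435_761      # odd and not divisible by 31, so coprime with 62^7 = 2^7 * 31^7
PRIME_INV = pow(PRIME, -1, SPACE)   # modular inverse (Python 3.8+), used to decode

def scramble(unique_id: int) -> int:
    # Bijective: distinct IDs always map to distinct values in [0, 62^7).
    return (unique_id * PRIME) % SPACE

def unscramble(scrambled: int) -> int:
    return (scrambled * PRIME_INV) % SPACE

Feed scramble(unique_id) into counter_to_short instead of the raw ID; codes stay collision-free, but consecutive IDs no longer produce adjacent codes.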

Approach 3: Pure Random

Generate 7 random Base62 characters and check if they exist. If yes, retry.

Simple but probabilistic. Works well when the space is mostly empty (early on). Becomes painful when the space starts to fill up because retries grow.

Verdict: for production, use the counter approach with optional scrambling. It is deterministic, fast, and collision-free.

Step 5: Generating IDs at Scale

The counter approach works perfectly with one server. But "give me the next ID" becomes a bottleneck when you have many servers writing simultaneously. You cannot just increment a single global counter without serializing everything.

Option A: Single Auto-Increment Database

The simplest. Have one MySQL or Postgres instance own the counter. Every API server requests the next ID from there.

Limit: the database becomes the chokepoint. Single point of failure. Tops out at maybe 10,000 IDs per second under good conditions.

Option B: Range Allocation (Token Server)

Each API server requests a chunk of IDs (say, 1000 at a time) from a central allocator. It uses them locally without further coordination. When the chunk is exhausted, it requests the next.

Range Allocation
API Server A asks the Token Server for a range ("give me a range") and gets one back ("yours: 1000-1999"). Each server gets a private range, so no coordination is needed for individual IDs; the token server sees only one request per 1000 URLs.

This reduces load on the central allocator by 1000x. The trade-off is that if a server crashes mid-range, those IDs are wasted.
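
A sketch of the client side, assuming a token_server.allocate(n) call that returns the start of a fresh range (the API name is illustrative):

import threading

class RangeAllocator:
    def __init__(self, token_server, batch_size: int = 1000):
        self.token_server = token_server
        self.batch_size = batch_size
        self.next_id = 0
        self.end_id = 0          # exclusive end of the current private range
        self.lock = threading.Lock()

    def next(self) -> int:
        with self.lock:
            if self.next_id >= self.end_id:
                # One round trip per batch_size IDs. If this server crashes,
                # the unused remainder of the range is simply wasted.
                self.next_id = self.token_server.allocate(self.batch_size)
                self.end_id = self.next_id + self.batch_size
            unique_id = self.next_id
            self.next_id += 1
            return unique_id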

Option C: Snowflake IDs

Originally from Twitter. A 64-bit ID composed of:

41 bits for timestamp (in milliseconds since some epoch).
10 bits for machine ID.
12 bits for a per-machine sequence counter.

Each machine generates its own IDs without talking to anyone else. Guaranteed unique because machine IDs do not overlap. Roughly time-ordered, which is sometimes useful.
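
A rough sketch of that bit layout; the custom epoch and in-process locking here are assumptions for illustration, not Twitter's implementation:

import threading
import time

EPOCH_MS = 1_600_000_000_000   # arbitrary custom epoch (September 2020)

class Snowflake:
    def __init__(self, machine_id: int):
        assert 0 <= machine_id < 1024        # fits in 10 bits
        self.machine_id = machine_id
        self.sequence = 0
        self.last_ms = -1
        self.lock = threading.Lock()

    def next_id(self) -> int:
        with self.lock:
            now_ms = int(time.time() * 1000)
            if now_ms == self.last_ms:
                self.sequence = (self.sequence + 1) & 0xFFF   # 12-bit counter
                if self.sequence == 0:
                    # Counter exhausted within this millisecond: wait for the next one.
                    while now_ms <= self.last_ms:
                        now_ms = int(time.time() * 1000)
            else:
                self.sequence = 0
            self.last_ms = now_ms
            # 41 bits of timestamp | 10 bits of machine ID | 12 bits of sequence.
            return ((now_ms - EPOCH_MS) << 22) | (self.machine_id << 12) | self.sequence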

The catch for our use case: 64-bit IDs are larger than we need. They produce 11-character Base62 codes, not 7. If short codes are critical, range allocation is better.

Step 6: Database Design

The schema is simple. The interesting part is choosing which database, and how to scale it.

URL Table Schema
short_code (VARCHAR(10)): primary key, indexed for O(1) lookups.
long_url (TEXT): the full original URL.
user_id (BIGINT): owner, indexed for "my links" queries.
created_at (TIMESTAMP): when the link was created.
expires_at (TIMESTAMP): nullable; NULL means no expiration.
click_count (BIGINT): updated asynchronously, eventually consistent.

SQL or NoSQL?

Both can work. The decision comes down to access patterns.

SQL (PostgreSQL, MySQL)
  • Strong consistency. Transactions work cleanly.
  • Great for relational queries (user's links, expiration cleanup).
  • Mature, well-understood operationally.
  • Sharding requires manual work.
NoSQL (DynamoDB, Cassandra)
  • Built-in horizontal scaling.
  • Predictable latency at any scale.
  • Eventual consistency in some configs.
  • Joins and complex queries are painful.

For a URL shortener, the access pattern is dead simple: lookup by short_code. NoSQL is a great fit. DynamoDB or Cassandra both excel at this. If you need relational queries (analytics, user dashboard), pair it with a separate SQL database for those.

Sharding Strategy

3 TB of data is too much for a single node. You need to shard.

Shard by short_code (hash-based): the short_code is hashed and the hash determines which shard. Even distribution. Lookups are still O(1) (compute hash, route to shard). This is the right answer for URL shorteners.

Shard by user_id: all of one user's links live on one shard. Good for "my links" queries. Bad for the dominant access pattern (anonymous redirects), since you would not know the user from the short code.

Shard by time: recent links on one shard, old links on another. Tempting but creates hot shards (most reads hit the latest data) and cold shards (old data nobody uses).

Stick with hash-based sharding by short_code.
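
Routing becomes a one-liner once the code is hashed; a sketch with an assumed shard count:

import hashlib

NUM_SHARDS = 16   # illustrative; production would use consistent hashing or a lookup table

def shard_for(short_code: str) -> int:
    # Hash the short code and map it onto a shard: even distribution, O(1) routing.
    digest = hashlib.md5(short_code.encode()).digest()
    return int.from_bytes(digest[:4], "big") % NUM_SHARDS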

Step 7: The Cache Layer

Reads outnumber writes 100:1. If every read hits the database, you need 100x more database capacity than you would otherwise. The cache fixes this.

The strategy is straightforward: when a redirect comes in, check Redis first. On a hit, return immediately. On a miss, read from the database, store in Redis, return. This is the Cache Aside pattern, the same one covered in detail in the Caching Strategies article.
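
In code, the read path looks roughly like this (redis_client and db are assumed helpers, not a specific library API):

def resolve(short_code: str) -> str | None:
    # Cache hit: return immediately, no database work.
    long_url = redis_client.get(short_code)
    if long_url is not None:
        return long_url
    # Cache miss: fall back to the database and repopulate the cache with a TTL.
    long_url = db.get_long_url(short_code)
    if long_url is not None:
        redis_client.set(short_code, long_url, ex=24 * 3600)
    return long_url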

What to Cache

The hot 20% of URLs probably account for 80% of clicks (Pareto distribution). Cache them aggressively. The cold long tail goes to the database. With our 600 GB cache estimate, we hold most of what users actually click.

The Thundering Herd Problem

A celebrity tweets a shortened URL. Suddenly 1 million people click it within 30 seconds. The link is not in cache yet (cold start). Every single request misses the cache and hits the database. The database falls over.

Thundering Herd on a Viral Link
1M concurrent clients all request the same code, all miss the cache, and all hit the database, which gets overwhelmed.

The fix is called request coalescing, or singleflight: the cache layer recognizes concurrent misses for the same key and sends only one request to the backend. The first request fetches from the database and populates the cache; the other concurrent requests wait briefly and read the freshly populated entry. One database call instead of a million. Go's singleflight package and Java's Caffeine cache provide this out of the box.
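
An in-process sketch of the idea, a Python stand-in for what singleflight or Caffeine give you:

import threading
from dataclasses import dataclass, field

@dataclass
class _Call:
    event: threading.Event = field(default_factory=threading.Event)
    result: str | None = None

class SingleFlight:
    def __init__(self):
        self._lock = threading.Lock()
        self._inflight = {}

    def do(self, key: str, fetch):
        with self._lock:
            call = self._inflight.get(key)
            leader = call is None
            if leader:
                call = _Call()
                self._inflight[key] = call
        if leader:
            try:
                call.result = fetch(key)     # the only call that reaches the database
            finally:
                with self._lock:
                    self._inflight.pop(key, None)
                call.event.set()
        else:
            call.event.wait()                # followers wait for the leader's result
        return call.result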

Step 8: API Design

Two endpoints carry most of the load. Keep them simple.

POST /api/v1/shorten
Content-Type: application/json
Authorization: Bearer <token>

{
  "long_url": "https://example.com/some/very/long/url",
  "custom_alias": "my-talk",       // optional
  "expires_at": "2027-01-01T00:00:00Z"  // optional
}

Response 201 Created:
{
  "short_url": "https://mab.az/my-talk",
  "short_code": "my-talk",
  "expires_at": "2027-01-01T00:00:00Z"
}
GET /:short_code

Response 301 Moved Permanently:
Location: https://example.com/some/very/long/url

OR Response 302 Found:
Location: https://example.com/some/very/long/url

301 vs 302

This decision matters more than people think.

301 (Permanent): browsers cache the redirect aggressively. Future clicks may not even reach your server. Saves load. Bad for analytics (you stop seeing clicks).
302 (Found): browsers do not cache. Every click reaches your server. Higher load but accurate analytics.

Most URL shorteners use 302 because click tracking is a core product feature. If you do not need analytics, 301 is more efficient.
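
To make the redirect path concrete, here is a minimal handler sketch using Flask (an arbitrary framework choice; resolve is the cache-aside lookup sketched in Step 7 and emit_click_event is the producer sketched in Step 9):

from flask import Flask, abort, redirect, request

app = Flask(__name__)

@app.route("/<short_code>")
def follow(short_code: str):
    long_url = resolve(short_code)            # cache first, database on a miss
    if long_url is None:
        abort(404)
    emit_click_event(short_code, request)     # fire-and-forget analytics event
    return redirect(long_url, code=302)       # 302 so every click stays visible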

Step 9: Analytics Pipeline

Every redirect should generate a click event. But updating click_count synchronously on every redirect is wasteful. It puts write load on the URL table that should be read-only.

The right approach: emit click events asynchronously to a queue, process them separately, store in a dedicated analytics database.

Async Analytics Pipeline
The Read API responds with a 302 and emits an async event → Kafka (click_events topic) → consumed by an aggregator (batches counts) and a GeoIP enricher (adds country and city) → written to ClickHouse, the analytical store.

The redirect path stays fast and lightweight. The analytics pipeline runs at its own pace. Click counts in the dashboard are eventually consistent (which is fine, see the Eventual Consistency Patterns article).

ClickHouse is the typical choice for this workload because it crushes aggregation queries on event data. Snowflake, BigQuery, or Druid also work.
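
A producer sketch, assuming the kafka-python client and the Flask request object from the redirect handler above:

import json
from datetime import datetime, timezone

from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="kafka:9092",
    value_serializer=lambda event: json.dumps(event).encode("utf-8"),
)

def emit_click_event(short_code: str, request) -> None:
    # Fire-and-forget: the redirect response never waits on analytics.
    producer.send("click_events", {
        "short_code": short_code,
        "ts": datetime.now(timezone.utc).isoformat(),
        "referrer": request.headers.get("Referer"),
        "user_agent": request.headers.get("User-Agent"),
        "ip": request.remote_addr,     # enriched into country/city downstream
    })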

Step 10: Rate Limiting and Abuse Prevention

URL shorteners are heavily abused for phishing, spam, and malware distribution. Without controls, your service becomes a tool for attackers and gets blacklisted by browsers.

Per-User Rate Limits

Cap how many URLs a user can create per minute, hour, day. Authenticated users get higher limits. Anonymous users get strict limits. Suspicious patterns trigger captchas.

Implementation: a Redis-backed token bucket per user ID or IP. Each shorten request consumes a token. When tokens run out, return 429 Too Many Requests.
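
A simplified sketch using a fixed-window counter in Redis rather than a full token bucket (the redis-py client is assumed; a production bucket would refill continuously, typically via a small Lua script):

import time
import redis

r = redis.Redis(host="localhost", port=6379)

def allow_shorten(user_id: str, limit: int = 30, window_s: int = 60) -> bool:
    # One counter per user per window; the first hit in a window sets its TTL.
    key = f"rate:{user_id}:{int(time.time()) // window_s}"
    count = r.incr(key)
    if count == 1:
        r.expire(key, window_s)
    return count <= limit     # False -> respond with 429 Too Many Requests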

Malicious URL Detection

Before shortening, scan the long URL against known threat databases:

Google Safe Browsing API: the de facto standard. Free for moderate traffic.
VirusTotal: aggregates reports from many threat scanners.
Internal blocklists: domains that have been reported for abuse on your service.

Reject the request if the URL is on any list. Periodically rescan existing links and disable any that turn malicious after the fact.

CAPTCHA on Suspicious Patterns

If one IP creates 100 URLs in an hour, that is suspicious even within rate limits. Trigger a CAPTCHA before allowing more. Real users pass. Bots usually do not.

Step 11: Custom Aliases and Collisions

Users want to pick custom short codes (e.g., bit.ly/their-product). This adds a few wrinkles.

When a user requests a custom alias, the system checks if it exists. If yes, reject. If no, write it (with a uniqueness constraint at the database level so two simultaneous requests for the same alias cannot both succeed).

def create_with_custom_alias(alias, long_url, user_id):
    # A unique constraint on short_code at the database level guarantees that
    # two simultaneous requests for the same alias cannot both succeed.
    try:
        db.insert(
            short_code=alias,
            long_url=long_url,
            user_id=user_id,
        )
    except UniqueConstraintViolation:
        raise ConflictError(f"Alias '{alias}' is taken")

You also need a reserved word list: aliases like admin, api, login, about should never be allowed because they collide with site routes. Block them upfront.
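
A small pre-check sketch (the reserved list and the alias pattern here are illustrative policy, not fixed rules):

import re

RESERVED_ALIASES = {"admin", "api", "login", "about", "static", "health"}
ALIAS_PATTERN = re.compile(r"^[0-9A-Za-z_-]{3,30}$")

def validate_alias(alias: str) -> None:
    # Reject malformed or reserved aliases before touching the database.
    if not ALIAS_PATTERN.fullmatch(alias):
        raise ValueError("alias must be 3-30 letters, digits, '-' or '_'")
    if alias.lower() in RESERVED_ALIASES:
        raise ValueError(f"alias '{alias}' is reserved")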

If you allow alias reuse after deletion, set a cool-down period so phishers cannot grab a recently-deleted trusted alias.

Step 12: Edge Cases and Operational Concerns

URL Validation

Reject invalid URLs upfront: malformed URIs, IP-only URLs (often used for malware), URLs pointing back at your own domain (redirect loops), URLs longer than some sane limit (10KB or so).
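
A sketch of those checks; the 10KB limit and the short domain come from this article, the rest is illustrative:

import ipaddress
from urllib.parse import urlparse

MAX_URL_LENGTH = 10_000        # roughly the 10KB limit mentioned above
OWN_DOMAINS = {"mab.az"}       # the short domain from the API example

def validate_long_url(long_url: str) -> None:
    if len(long_url) > MAX_URL_LENGTH:
        raise ValueError("URL too long")
    parsed = urlparse(long_url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        raise ValueError("malformed URL")
    if parsed.hostname in OWN_DOMAINS:
        raise ValueError("refusing to shorten our own domain (redirect loop)")
    try:
        ipaddress.ip_address(parsed.hostname)
    except ValueError:
        pass                    # hostname is not a bare IP address: fine
    else:
        raise ValueError("IP-only URLs are not allowed")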

Expired Links

A daily background job scans for expired links and either deletes them or moves them to an "expired" state that returns a 410 Gone. Do not block the redirect path with expiration logic; check it lazily on access.
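
The lazy check itself is a single comparison (a sketch; record is whatever row object the lookup returns):

from datetime import datetime, timezone

def is_expired(record) -> bool:
    # Cheap enough for the redirect path; the daily job handles bulk cleanup.
    return record.expires_at is not None and record.expires_at <= datetime.now(timezone.utc)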

Global Latency

Users in Tokyo should not wait for a redirect to come from a US server. Put your read API behind a CDN like Cloudflare or CloudFront with edge caching enabled. The CDN can cache 302 responses with a short TTL (a few minutes), serving most redirects from the user's nearest edge node.

Backups and Disaster Recovery

The dataset is small (3 TB) and immutable in nature (rows are mostly added, rarely changed). Daily snapshots to object storage are sufficient. Keep at least 30 days of point-in-time backups.

Multi-Region Deployment

For very high availability, run the system in multiple regions (US East, EU, Asia). Read replicas in each region for the URL database. Write traffic still goes to a single primary region (writes are rare). Failover plan must be tested.

The Complete Architecture

Putting all the pieces together, here is the final picture:

Full URL Shortener Architecture
Edge: users → CDN (Cloudflare) → load balancer, which routes requests to the API layer.
Stateless API: Write API (shorten, custom alias), Read API (redirect), auth + rate limiting (token bucket in Redis).
Stateful: Redis cluster (hot URL cache), token server (ID ranges), URL DB (sharded by short code).
Async pipeline: Kafka (click events) → stream processor (enrich + aggregate) → ClickHouse (analytics).
Background jobs: expiration job (daily), threat scanner (Safe Browsing), backup job (nightly snapshots).

Key Design Decisions Recap

Looking back, the design hinges on a few key choices:

Counter-based encoding with Base62, 7 characters. Collision-free, scales to trillions.
Range allocation for distributed IDs. Avoids the bottleneck of a single counter.
NoSQL with hash sharding by short_code. Right fit for the dominant access pattern.
Aggressive caching with request coalescing. Handles read-heavy load without melting the database.
Async analytics through Kafka. Keeps the redirect path fast.
CDN at the edge. Low latency globally.
Rate limiting and threat scanning. Keeps abusers out.

The One Thing to Remember

A URL shortener looks like a hash table behind an API. At small scale, it is exactly that. At Twitter scale, every layer becomes interesting: the encoding affects predictability, the ID generator affects throughput, the cache affects database load, the queue affects analytics latency, and the rate limiter affects whether your service is usable for anyone but spammers.

The lesson is broader than this one system. Most "simple" services look simple because someone designed them well. Behind every clean API is a stack of careful choices about consistency, partitioning, caching, and failure modes. Knowing how to make those choices is the actual skill of system design.

If you can build a URL shortener that handles 12,000 reads per second, never collides, survives a thundering herd, blocks malicious URLs, and stays fast under viral spikes, you have a working model for a hundred other systems. Almost everything else is the same patterns rearranged.