Why This Problem Matters
You have probably used a URL shortener without thinking about it. You click a link on Twitter that looks like t.co/AbCdEf. Behind the scenes, that tiny string redirects you to some long URL that might be 200 characters of nonsense. Bit.ly, TinyURL, and YouTube's youtu.be all do the same thing.
On the surface, this seems trivial. Take a long string, give back a short string, redirect when asked. How hard could it be?
The answer: surprisingly hard, once you get serious about scale. Bit.ly handles billions of redirects per month. Twitter's t.co handles even more. The difference between a toy URL shortener and one that actually works at scale comes down to dozens of design decisions about encoding, storage, caching, and operations.
This is also why the URL shortener is the all-time classic system design interview question. Almost every important distributed systems concept shows up: hashing, distributed ID generation, read-heavy caching, sharding, analytics pipelines, and rate limiting. Working through this design teaches you how to think about real systems.
Let us build one from scratch.
Step 1: Gather Requirements
Before drawing a single box, you must agree on what the system should do (functional requirements) and how well it should do it (non-functional requirements). Skipping this step is the most common reason designs fall apart later.
Functional Requirements
Shorten: given a long URL, return a unique short URL.
Redirect: given a short code, redirect the user to the original long URL.
Custom aliases: users can pick their own short code (e.g., bit.ly/my-talk).
Expiration: links can optionally expire at a user-specified time.
Non-Functional Requirements
Low latency: redirects must be near-instant (under 100ms p99). Users will not wait for a redirect.
High availability: 99.99% uptime or better. A short link that does not work is worse than no short link at all because it breaks the user's content.
Highly scalable: handle billions of reads per month, growing every year.
Durable: once a short URL is created, it must work forever (or until explicitly expired).
Read-heavy: reads vastly outnumber writes. A typical ratio is 10:1 or even 100:1.
Step 2: Estimate Capacity
Numbers shape architecture. A system serving 100 requests per second is very different from one serving 100,000. Let us pick concrete numbers and use them throughout.
Assumptions:
100 million new URLs created per month.
Read-to-write ratio: 100 to 1.
Data retention: 5 years before optional cleanup.
These are not magic numbers. They are starting points. The real value comes from updating them as you make choices and seeing where bottlenecks appear.
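To make the arithmetic concrete, here is the back-of-envelope math as a runnable sketch. The 500-byte average row size and the 3x peak factor are assumptions for illustration, not givens:

```python
# Back-of-envelope capacity math from the assumptions above.
WRITES_PER_MONTH = 100_000_000
READ_RATIO = 100
SECONDS_PER_MONTH = 30 * 24 * 3600  # ~2.6 million

writes_per_sec = WRITES_PER_MONTH / SECONDS_PER_MONTH        # ~39 writes/s
reads_per_sec = writes_per_sec * READ_RATIO                  # ~3,900 reads/s average
peak_reads_per_sec = reads_per_sec * 3                       # ~12,000 reads/s at a 3x peak

total_urls_5y = WRITES_PER_MONTH * 12 * 5                    # 6 billion rows
storage_bytes = total_urls_5y * 500                          # assuming ~500 bytes/row: ~3 TB

print(round(writes_per_sec), round(reads_per_sec), round(peak_reads_per_sec))
print(total_urls_5y, storage_bytes / 1e12, "TB")
```

Roughly 40 writes and 4,000 reads per second on average, about 12,000 reads per second at peak, and about 3 TB of storage over 5 years. These are the numbers the rest of the design works against.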
Step 3: High-Level Architecture
Before drilling into details, sketch the boxes and arrows. Here is the simplest version that handles all the core requirements:
Client → Load Balancer → API Servers (POST /shorten, GET /:code) → Redis cache (hot URLs) → Database (sharded). A distributed ID generator hands out unique IDs to the API servers, and click events flow through Kafka into ClickHouse for analytics.
Each layer has a clear responsibility. The next sections drill into the interesting decisions inside each one.
Step 4: How to Generate the Short Code
This is the heart of a URL shortener. You need a way to generate short, unique strings that map to long URLs. The technique you pick affects collision rates, scalability, predictability, and security.
Why Base62?
The short code uses characters from [0-9a-zA-Z]. That is 62 distinct characters per position. So:
6 characters: 62^6 ≈ 57 billion combinations.
7 characters: 62^7 ≈ 3.5 trillion combinations.
For our 6 billion URL projection, 7 characters is plenty with massive room to grow. We will commit to 7-character codes.
Approach 1: Hash the URL
Take the long URL, hash it with MD5 or SHA-256, then convert the first N bits to Base62.
import hashlib

def hash_to_short(long_url: str) -> str:
    md5_bytes = hashlib.md5(long_url.encode()).digest()
    big_int = int.from_bytes(md5_bytes[:6], 'big')
    # to_base62 converts an integer to a Base62 string (same encoding as Approach 2)
    return to_base62(big_int)[:7]
Problem: hashes have collisions. Different long URLs can produce the same short code. You have to detect collisions and re-hash with a salt, which adds complexity and unpredictable retries.
Bigger problem: if the same long URL is shortened twice by two different users, both get the same short code by default. That sounds nice but it leaks information (anyone can check if a URL has been shortened) and breaks per-user analytics.
Approach 2: Counter + Base62 (Recommended)
Use a globally-unique incrementing integer ID for every new URL. Encode the ID in Base62.
def counter_to_short(unique_id: int) -> str:
    chars = "0123456789abcdefghijklmnopqrstuvwxyz" \
            "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    result = ""
    while unique_id > 0:
        result = chars[unique_id % 62] + result
        unique_id //= 62
    return result.rjust(7, '0')
This approach has zero collisions by design. ID 1 maps to 0000001, ID 100 maps to 000001C, and so on. Every new URL gets a fresh ID, so codes are guaranteed unique.
Trade-off: codes are sequential and predictable. If 0000001 exists, attackers can guess 0000002 exists too. To prevent this, you can either:
Skip ahead by random amounts (use only every 10th or 100th ID).
Multiply IDs by a large prime modulo 62^7 to scatter them across the space.
Add a few random characters at fixed positions in the code.
For most cases, sequential is fine. If you need unguessable codes, randomize.
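The multiply-by-a-prime trick can be sketched in a few lines. The multiplier below is an arbitrary example value; any multiplier coprime to 62^7 works (since 62 = 2 × 31, that means odd and not a multiple of 31), which also makes the mapping invertible:

```python
# Scatter sequential IDs across the 62^7 code space via modular multiplication.
MOD = 62 ** 7              # 3,521,614,606,208 possible 7-character codes
PRIME = 1_580_030_173      # example multiplier: odd, not a multiple of 31 -> coprime to MOD

def scramble(unique_id: int) -> int:
    # Sequential inputs land far apart in the output space.
    return (unique_id * PRIME) % MOD

# The modular inverse (Python 3.8+) lets us recover the original ID if needed.
PRIME_INV = pow(PRIME, -1, MOD)

def unscramble(scrambled: int) -> int:
    return (scrambled * PRIME_INV) % MOD
```

Feed the scrambled value into counter_to_short and the codes stop being guessable-by-increment, while remaining collision-free.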
Approach 3: Pure Random
Generate 7 random Base62 characters and check if they exist. If yes, retry.
Simple but probabilistic. Works well when the space is mostly empty (early on). Becomes painful when the space starts to fill up because retries grow.
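A minimal sketch of the random approach, with a plain dict standing in for the database:

```python
import secrets
import string

ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase  # Base62

def random_code(length: int = 7) -> str:
    # secrets gives cryptographically strong randomness, so codes are unguessable.
    return "".join(secrets.choice(ALPHABET) for _ in range(length))

def shorten_random(long_url: str, store: dict, max_retries: int = 5) -> str:
    # `store` stands in for the database; the retry loop handles collisions.
    for _ in range(max_retries):
        code = random_code()
        if code not in store:
            store[code] = long_url
            return code
    raise RuntimeError("too many collisions; the code space is filling up")
```

The retry bound is the weakness: as the space fills, collisions (and retries) become more frequent.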
Verdict: for production, use the counter approach with optional scrambling. It is deterministic, fast, and collision-free.
Step 5: Generating IDs at Scale
The counter approach works perfectly with one server. But "give me the next ID" becomes a bottleneck when you have many servers writing simultaneously. You cannot just increment a single global counter without serializing everything.
Option A: Single Auto-Increment Database
The simplest. Have one MySQL or Postgres instance own the counter. Every API server requests the next ID from there.
Limit: the database becomes the chokepoint. Single point of failure. Tops out at maybe 10,000 IDs per second under good conditions.
Option B: Range Allocation (Token Server)
Each API server requests a chunk of IDs (say, 1000 at a time) from a central allocator. It uses them locally without further coordination. When the chunk is exhausted, it requests the next.
This reduces load on the central allocator by 1000x. The trade-off is that if a server crashes mid-range, those IDs are wasted.
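Here is a minimal in-process sketch of range allocation. The lock-protected counter stands in for the central allocator, which in production would be a single database row updated atomically:

```python
import threading

class RangeAllocator:
    """Stand-in for the central allocator (in production: one DB row, atomic increment)."""
    def __init__(self, chunk_size: int = 1000):
        self.chunk_size = chunk_size
        self._next = 1
        self._lock = threading.Lock()

    def next_range(self) -> range:
        with self._lock:  # in production: a single-row UPDATE ... RETURNING
            start = self._next
            self._next += self.chunk_size
        return range(start, start + self.chunk_size)

class IdClient:
    """Each API server holds one of these and refills when its chunk runs out."""
    def __init__(self, allocator: RangeAllocator):
        self.allocator = allocator
        self._ids = iter(())  # empty until the first refill

    def next_id(self) -> int:
        try:
            return next(self._ids)
        except StopIteration:
            self._ids = iter(self.allocator.next_range())
            return next(self._ids)
```

Only one allocator round-trip per thousand IDs; a crashed server simply abandons the rest of its chunk, leaving a harmless gap in the sequence.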
Option C: Snowflake IDs
Originally from Twitter. A 64-bit ID composed of:
41 bits for timestamp (in milliseconds since some epoch).
10 bits for machine ID.
12 bits for a per-machine sequence counter.
Each machine generates its own IDs without talking to anyone else. Guaranteed unique because machine IDs do not overlap. Roughly time-ordered, which is sometimes useful.
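A sketch of the bit packing, omitting production concerns like clock rollback and sequence exhaustion within a millisecond. The epoch constant is the value commonly attributed to Twitter's implementation:

```python
import threading
import time

EPOCH_MS = 1_288_834_974_657  # custom epoch (commonly cited for Twitter's Snowflake)

class Snowflake:
    """Minimal 41/10/12 layout: timestamp | machine ID | per-machine sequence."""
    def __init__(self, machine_id: int):
        assert 0 <= machine_id < 1024          # machine ID must fit in 10 bits
        self.machine_id = machine_id
        self.sequence = 0
        self.last_ms = -1
        self._lock = threading.Lock()

    def next_id(self) -> int:
        with self._lock:
            now_ms = int(time.time() * 1000)
            if now_ms == self.last_ms:
                self.sequence = (self.sequence + 1) & 0xFFF  # 12-bit sequence
            else:
                self.sequence = 0
                self.last_ms = now_ms
            # 41 bits timestamp << 22 | 10 bits machine << 12 | 12 bits sequence
            return ((now_ms - EPOCH_MS) << 22) | (self.machine_id << 12) | self.sequence
```

No coordination at generation time: uniqueness comes from non-overlapping machine IDs plus the per-machine sequence.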
The catch for our use case: 64-bit IDs are larger than we need. They produce 11-character Base62 codes, not 7. If short codes are critical, range allocation is better.
Step 6: Database Design
The schema is simple. The interesting part is choosing which database, and how to scale it.
The main table has six columns: short_code (the lookup key), long_url, user_id, created_at, expires_at, and click_count.
SQL or NoSQL?
Both can work. The decision comes down to access patterns.
SQL (MySQL, Postgres):
- Strong consistency. Transactions work cleanly.
- Great for relational queries (user's links, expiration cleanup).
- Mature, well-understood operationally.
- Sharding requires manual work.
NoSQL (DynamoDB, Cassandra):
- Built-in horizontal scaling.
- Predictable latency at any scale.
- Eventual consistency in some configs.
- Joins and complex queries are painful.
For a URL shortener, the access pattern is dead simple: lookup by short_code. NoSQL is a great fit. DynamoDB or Cassandra both excel at this. If you need relational queries (analytics, user dashboard), pair it with a separate SQL database for those.
Sharding Strategy
3 TB of data is too much for a single node. You need to shard.
Shard by short_code (hash-based): the short_code is hashed and the hash determines which shard. Even distribution. Lookups are still O(1) (compute hash, route to shard). This is the right answer for URL shorteners.
Shard by user_id: all of one user's links live on one shard. Good for "my links" queries. Bad for the dominant access pattern (anonymous redirects), since you would not know the user from the short code.
Shard by time: recent links on one shard, old links on another. Tempting but creates hot shards (most reads hit the latest data) and cold shards (old data nobody uses).
Stick with hash-based sharding by short_code.
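The routing function is a few lines, assuming a fixed shard count of 16 for illustration. Note that Python's built-in hash() is salted per process, so a stable hash like MD5 is needed:

```python
import hashlib

NUM_SHARDS = 16  # illustrative; real deployments pick based on data size and growth

def shard_for(short_code: str) -> int:
    # MD5 gives a stable, well-distributed hash across processes and restarts.
    digest = hashlib.md5(short_code.encode()).digest()
    return int.from_bytes(digest[:4], "big") % NUM_SHARDS
```

One caveat: changing NUM_SHARDS remaps almost every key, which is why production systems layer consistent hashing or pre-split virtual shards on top; plain modulo is enough to convey the idea.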
Step 7: The Cache Layer
Reads outnumber writes 100:1. If every read hits the database, you need 100x more database capacity than you would otherwise. The cache fixes this.
The strategy is straightforward: when a redirect comes in, check Redis first. On a hit, return immediately. On a miss, read from the database, store in Redis, return. This is the Cache Aside pattern, the same one covered in detail in the Caching Strategies article.
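The pattern is a few lines of code. Here is a sketch with plain dicts standing in for Redis and the sharded database:

```python
from typing import Optional

def resolve(short_code: str, cache: dict, db: dict) -> Optional[str]:
    """Cache-aside lookup: check the cache first, fall back to the database."""
    if short_code in cache:
        return cache[short_code]          # hit: the database is never touched
    long_url = db.get(short_code)         # miss: go to storage
    if long_url is not None:
        cache[short_code] = long_url      # populate for the next reader
        # with redis-py this would be cache.set(short_code, long_url, ex=ttl_seconds)
    return long_url
```

Unknown codes return None, which the API layer turns into a 404.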
What to Cache
The hot 20% of URLs probably account for 80% of clicks (Pareto distribution). Cache them aggressively. The cold long tail goes to the database. With our 600 GB cache estimate, we hold most of what users actually click.
The Thundering Herd Problem
A celebrity tweets a shortened URL. Suddenly 1 million people click it within 30 seconds. The link is not in cache yet (cold start). Every single request misses the cache and hits the database. The database falls over.
The fix is called request coalescing or singleflight. The cache layer recognizes concurrent misses for the same key and only sends one request to the backend. Languages and frameworks (Go's singleflight, Caffeine in Java) provide this out of the box.
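A minimal sketch of the idea in Python, using an event per key so that followers block until the leader's single backend call completes (error handling omitted):

```python
import threading

class SingleFlight:
    """Coalesce concurrent lookups for the same key into one backend call."""
    def __init__(self):
        self._lock = threading.Lock()
        self._inflight: dict = {}  # key -> {"event": Event, "result": value}

    def do(self, key, fn):
        with self._lock:
            entry = self._inflight.get(key)
            if entry is None:
                # First caller for this key becomes the leader.
                entry = {"event": threading.Event(), "result": None}
                self._inflight[key] = entry
                leader = True
            else:
                leader = False
        if leader:
            try:
                entry["result"] = fn()  # the only backend call for this burst
            finally:
                entry["event"].set()
                with self._lock:
                    self._inflight.pop(key, None)
            return entry["result"]
        entry["event"].wait()           # followers wait for the leader's result
        return entry["result"]
```

During a viral spike, one database read serves the whole burst instead of a million identical queries.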
Step 8: API Design
Two endpoints carry most of the load. Keep them simple.
POST /api/v1/shorten
Content-Type: application/json
Authorization: Bearer <token>
{
"long_url": "https://example.com/some/very/long/url",
"custom_alias": "my-talk", // optional
"expires_at": "2027-01-01T00:00:00Z" // optional
}
Response 201 Created:
{
"short_url": "https://mab.az/my-talk",
"short_code": "my-talk",
"expires_at": "2027-01-01T00:00:00Z"
}
GET /:short_code
Response 301 Moved Permanently:
Location: https://example.com/some/very/long/url
OR Response 302 Found:
Location: https://example.com/some/very/long/url
301 vs 302
This decision matters more than people think.
301 (Permanent): browsers cache the redirect aggressively. Future clicks may not even reach your server. Saves load. Bad for analytics (you stop seeing clicks).
302 (Found): browsers do not cache. Every click reaches your server. Higher load but accurate analytics.
Most URL shorteners use 302 because click tracking is a core product feature. If you do not need analytics, 301 is more efficient.
Step 9: Analytics Pipeline
Every redirect should generate a click event. But updating click_count synchronously on every redirect is wasteful. It puts write load on the URL table that should be read-only.
The right approach: emit click events asynchronously to a queue, process them separately, store in a dedicated analytics database.
The flow: the API server responds with the 302 and publishes an event to a Kafka click_events topic; a consumer batches counts; an enrichment step adds country and city; the results land in the analytical store.
The redirect path stays fast and lightweight. The analytics pipeline runs at its own pace. Click counts in the dashboard are eventually consistent (which is fine, see the Eventual Consistency Patterns article).
ClickHouse is the typical choice for this workload because it crushes aggregation queries on event data. Snowflake, BigQuery, or Druid also work.
Step 10: Rate Limiting and Abuse Prevention
URL shorteners are heavily abused for phishing, spam, and malware distribution. Without controls, your service becomes a tool for attackers and gets blacklisted by browsers.
Per-User Rate Limits
Cap how many URLs a user can create per minute, hour, day. Authenticated users get higher limits. Anonymous users get strict limits. Suspicious patterns trigger captchas.
Implementation: a Redis-backed token bucket per user ID or IP. Each shorten request consumes a token. When tokens run out, return 429 Too Many Requests.
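An in-memory sketch of the token bucket logic. Production keeps the bucket state in Redis (typically updated atomically via a small Lua script) so every API server sees the same bucket:

```python
import time

class TokenBucket:
    """In-memory token bucket; production state lives in Redis, keyed by user or IP."""
    def __init__(self, capacity: int, refill_per_sec: float):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill based on elapsed time, never exceeding capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_per_sec)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # caller responds 429 Too Many Requests
```

Capacity controls burst size; the refill rate controls the sustained limit.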
Malicious URL Detection
Before shortening, scan the long URL against known threat databases:
Google Safe Browsing API: the de facto standard. Free for moderate traffic.
VirusTotal: aggregates reports from many threat scanners.
Internal blocklists: domains that have been reported for abuse on your service.
Reject the request if the URL is on any list. Periodically rescan existing links and disable any that turn malicious after the fact.
CAPTCHA on Suspicious Patterns
If one IP creates 100 URLs in an hour, that is suspicious even within rate limits. Trigger a CAPTCHA before allowing more. Real users pass. Bots usually do not.
Step 11: Custom Aliases and Collisions
Users want to pick custom short codes (e.g., bit.ly/their-product). This adds a few wrinkles.
When a user requests a custom alias, the system checks if it exists. If yes, reject. If no, write it (with a uniqueness constraint at the database level so two simultaneous requests for the same alias cannot both succeed).
def create_with_custom_alias(alias, long_url, user_id):
    try:
        db.insert(
            short_code=alias,
            long_url=long_url,
            user_id=user_id,
        )
    except UniqueConstraintViolation:
        raise ConflictError(f"Alias '{alias}' is taken")
You also need a reserved word list: aliases like admin, api, login, about should never be allowed because they collide with site routes. Block them upfront.
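The upfront check is simple. Here is a sketch with a deliberately tiny reserved set; a real one would cover every route your site serves:

```python
RESERVED = frozenset({"admin", "api", "login", "about"})  # extend as site routes grow

def validate_alias(alias: str) -> None:
    # Case-insensitive check so "Admin" cannot dodge the blocklist.
    if alias.lower() in RESERVED:
        raise ValueError(f"'{alias}' is reserved")
```

Run this before the insert so reserved names are rejected without a database round-trip.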
If you allow alias reuse after deletion, set a cool-down period so phishers cannot grab a recently-deleted trusted alias.
Step 12: Edge Cases and Operational Concerns
URL Validation
Reject invalid URLs upfront: malformed URIs, IP-only URLs (often used for malware), URLs pointing back at your own domain (redirect loops), URLs longer than some sane limit (10KB or so).
Expired Links
A daily background job scans for expired links and either deletes them or moves them to an "expired" state that returns a 410 Gone. Do not block the redirect path with expiration logic; check it lazily on access.
Global Latency
Users in Tokyo should not wait for a redirect to come from a US server. Put your read API behind a CDN like Cloudflare or CloudFront with edge caching enabled. The CDN can cache 302 responses with a short TTL (a few minutes), serving most redirects from the user's nearest edge node.
Backups and Disaster Recovery
The dataset is small (3 TB) and immutable in nature (rows are mostly added, rarely changed). Daily snapshots to object storage are sufficient. Keep at least 30 days of point-in-time backups.
Multi-Region Deployment
For very high availability, run the system in multiple regions (US East, EU, Asia). Read replicas in each region for the URL database. Write traffic still goes to a single primary region (writes are rare). Failover plan must be tested.
The Complete Architecture
Putting all the pieces together, here is the final picture:
Requests enter through the Cloudflare CDN and a load balancer, then reach the API servers: shorten and custom-alias requests on the write path, redirects on the read path. A rate limiter (token bucket in Redis) guards writes. Redirects check the hot URL cache in Redis before the database, which is sharded by code. API servers draw ID ranges from the allocator. Click events flow through Kafka into an enrich-and-aggregate step and land in ClickHouse for analytics. Around the core: a daily job expires old links, Safe Browsing scans catch malicious URLs, and nightly snapshots go to object storage.
Key Design Decisions Recap
Looking back, the design hinges on a few key choices:
Counter-based encoding with Base62, 7 characters. Collision-free, scales to trillions.
Range allocation for distributed IDs. Avoids the bottleneck of a single counter.
NoSQL with hash sharding by short_code. Right fit for the dominant access pattern.
Aggressive caching with request coalescing. Handles read-heavy load without melting the database.
Async analytics through Kafka. Keeps the redirect path fast.
CDN at the edge. Low latency globally.
Rate limiting and threat scanning. Keeps abusers out.
The One Thing to Remember
A URL shortener looks like a hash table behind an API. At small scale, it is exactly that. At Twitter scale, every layer becomes interesting: the encoding affects predictability, the ID generator affects throughput, the cache affects database load, the queue affects analytics latency, and the rate limiter affects whether your service is usable for anyone but spammers.
The lesson is broader than this one system. Most "simple" services look simple because someone designed them well. Behind every clean API is a stack of careful choices about consistency, partitioning, caching, and failure modes. Knowing how to make those choices is the actual skill of system design.
If you can build a URL shortener that handles 12,000 reads per second, never collides, survives a thundering herd, blocks malicious URLs, and stays fast under viral spikes, you have a working model for a hundred other systems. Almost everything else is the same patterns rearranged.