The Problem That Forces Sharding

A single database server can handle a few terabytes of data and tens of thousands of queries per second on great hardware. That sounds like a lot. At small scale it is.

But web-scale companies have datasets in the petabytes and write throughput in the millions of operations per second. Twitter, Facebook, Stripe, Uber. Their data does not fit on one machine. There is no machine large enough.

You have two options when one machine is not enough.

Vertical scaling: bigger machine. More RAM, more CPU, faster disk. Costs grow exponentially as machines get larger. There is a hard ceiling: even the largest commercial servers max out somewhere. Easy to do, until it is not. Eventually you cannot buy a bigger machine.

Horizontal scaling (sharding): split the data across many smaller machines. Costs grow linearly. There is no ceiling: you can always add more machines. But it is hard to do well. The complexity is significant.

Sharding is horizontal scaling for databases. Each "shard" is a regular database instance holding part of the data. Together, the shards form one logical dataset. The application sees a unified database; under the hood, it is many.

This article walks through how sharding actually works, when to do it, and how to avoid the worst pitfalls.

Step 1: Partitioning vs Sharding

These terms get used interchangeably but there is a subtle difference worth knowing.

Partitioning: splits a table into pieces, often within a single database instance (Postgres table partitions, MySQL partitions). The data still lives on one machine. Improves query performance because the database can skip irrelevant partitions. Does not solve capacity or write-throughput problems.

Sharding: splits data across multiple separate database instances on different machines. Solves capacity and write throughput. Adds substantial operational complexity.

Most large systems do both: partition tables within shards, and shard those tables across instances. The terms get conflated because at scale you typically use the same key for both.

Step 2: Sharding Strategies

Strategy 1: Range-Based Sharding

Divide the keyspace into contiguous ranges. Each shard owns one range.

Range Sharding by User ID
Shard
Range
Data
Shard 1
user_id 0 to 1M
Oldest users
Shard 2
user_id 1M to 2M
Mid-cohort
Shard 3
user_id 2M to 3M
Newer users
Shard 4
user_id 3M+
Newest

Pros: simple to understand. Range queries (e.g., "all users created last month") are fast because data is contiguous.
Cons: hot shards are common. New users are usually more active. Shard 4 gets all the new traffic while Shard 1 is largely idle. The same problem applies to time-based ranges (the latest data is always the hottest).

Used by: HBase, MongoDB (with range-based sharding), some custom systems. Generally avoid for user-generated data unless your access pattern doesn't favor recent IDs.

Strategy 2: Hash-Based Sharding

Hash the shard key and use modulo (or consistent hashing) to pick a shard.

shard_index = hash(user_id) % number_of_shards

Pros: uniform distribution. Hot user IDs spread across shards naturally. No hot shards from sequential IDs.
Cons: range queries become expensive (must hit every shard). Adding a new shard with simple modulo is painful: almost every key needs to move to a different shard. Use consistent hashing instead, which moves only 1/N of the data when adding a node.

This is the most common sharding strategy in practice. Used by Cassandra, DynamoDB, most key-value stores, and many sharded SQL setups.

The Consistent Hashing article goes into detail on how consistent hashing minimizes data movement during topology changes.

Strategy 3: Geographic Sharding

Shard by location. EU users on EU shard, US users on US shard, Asia users on Asia shard.

Pros: low latency (data is close to users). Compliance (EU data stays in EU per GDPR). Local writes possible without cross-region round trips.
Cons: uneven load (most users might be in one region; that region's shard becomes overloaded). Cross-region queries are expensive. Replication and failover within regions adds complexity.

Used by: Spotify, Uber, banking systems, anything with strong regional compliance needs. Particularly common when latency or regulation forces it.

Strategy 4: Directory-Based Sharding

A separate lookup service maps each key to a specific shard. The lookup table can be updated to rebalance without changing keys.

Pros: maximum flexibility. Can move data between shards without changing keys. Can store small amounts on small shards and large amounts on big ones (asymmetric).
Cons: the directory itself becomes a bottleneck and a single point of failure. Adds an extra hop on every query. Requires the directory to be available, fast, and reliable.

Used in some specialized systems. Generally less popular than hash-based because the lookup overhead is significant.

Step 3: Choosing the Shard Key

The shard key is the column or columns you partition on. This is the most important decision in your sharding design. A bad shard key cripples the entire system.

Properties of a Good Shard Key

High cardinality. Many distinct values, so data spreads evenly across shards. A boolean field has only two values; sharding on it produces two shards regardless of how many you set up.

Even access distribution. No single value gets disproportionate traffic. Sharding on user_id works well because traffic is spread across users (most of the time).

Aligns with dominant query patterns. The shard key should appear in your most common WHERE clauses. If 99% of queries are "give me data for user X" and you shard by user_id, almost every query goes to one shard. If you shard by date, the same query has to fan out to all shards.

Stable. The shard key should not change over time, since changing it means moving the row from one shard to another. Email addresses can change. user_id usually doesn't.

Properties of a Bad Shard Key

Boolean fields. Two values means two shards regardless. Useless for distributing load.
Status fields where 99% of rows are "active." 99% of the load lands on one shard.
Auto-incrementing IDs combined with range sharding. Always hits the latest shard.
Customer ID for B2B SaaS where one customer represents 90% of traffic. Tenant isolation on a shard with 90% of the load is barely sharded.
Timestamp alone. Most workloads are skewed toward recent data; the latest shard becomes hot.

Composite Shard Keys

Sometimes a single column doesn't work but a combination does. Example: shard by (tenant_id, user_id). Each tenant's data is spread across shards (good), and within a tenant, individual users still distribute well.

Used by some multi-tenant SaaS systems where individual tenants are large.

The Pivot Decision

If you find yourself with a bad shard key after launch, fixing it requires migrating all data to a new sharding scheme. This is one of the most painful operations in distributed databases. Spend significant time on the shard key decision before sharding. Get it wrong and you might be re-sharding for 18 months.

Step 4: The Hotspot Problem

Even with a good strategy, hotspots happen. A celebrity user has 100M followers; their tweets get queried 1000x more than average. Their shard melts under load.

Or a viral product gets featured; its shard handles 100x normal traffic. Or a single key (cache key, lock key) gets accessed by every request.

Mitigations

Replication. Read replicas of hot shards distribute read load. The hot shard's writes still concentrate, but reads spread.

Caching. Hot user data in Redis takes pressure off the shard. The cache absorbs most reads; only writes and cache misses hit the database.

Sub-sharding. If one shard is consistently hot, split it further. The hot shard becomes 4 shards; load distributes 4x. Operationally complex but effective.

Special handling for VIPs. Celebrity accounts get dedicated infrastructure outside the normal shard system. Twitter has historically done this.

Salting. Append a small random suffix to the shard key for hot rows. The "celebrity" row becomes celebrity_1, celebrity_2, ..., celebrity_10. Reads aggregate across the salted versions. Distributes the load but complicates reads.

Query routing tricks. Some platforms detect hot keys and route them to dedicated infrastructure with more replicas.

Hotspots are usually known patterns (top users, viral content) and operational teams handle them with manual interventions. Fully automatic hotspot mitigation is rare in production.

Step 5: Cross-Shard Queries Are Painful

Once data is sharded, certain queries become hard. Knowing which queries are easy vs hard shapes the design.

Easy: Single-Shard Queries

"Give me everything about user X" where X is the shard key. Direct lookup to one shard. Fast and clean.

Hard: Joins Across Shards

If "users" and "orders" are sharded by different keys (or one is unsharded and the other is sharded), joining them means scatter-gather across all shards. Every shard scans, partial results return, application code merges.

Mitigations: collocate related data on the same shard (orders sharded by user_id alongside users). Or denormalize: include the joined data within each row.

Hard: Transactions Across Shards

Two-phase commit (2PC) coordinates writes across shards atomically but is slow and complex. Most sharded systems avoid distributed transactions entirely. Patterns like sagas (covered in Eventual Consistency Patterns article) replace them.

If two operations must succeed together (transfer money between users), and they live on different shards, you cannot use simple ACID. You need application-level workarounds.

Hard: Global Ordering

"Give me the 100 most recent orders across all users." Each shard has its 100 most recent. Application must merge and sort. Doable but expensive.

Hard: Analytics Across Shards

"Total revenue today" requires hitting every shard, summing locally, summing again at the application. Scatter-gather. Latency multiplied by slowest shard.

Most teams replicate sharded data into a separate analytics database (data warehouse) to make these queries practical.

Design Around the Hard Cases

Design your access patterns first, then pick the shard key that makes the dominant queries single-shard. Cross-shard queries should be the exception, not the rule. If 95% of queries are single-shard, you have a great shard key. If 50% require scatter-gather, your shard key is wrong.

Step 6: Resharding — The Hardest Part

Eventually you outgrow your initial shard count. Going from 4 shards to 8 means moving roughly half your data. While the system stays online. Without dropping queries. This is one of the most painful operations in any sharded system.

Strategies

Pre-Sharding

Create many "logical shards" (say 1024) and map them onto fewer physical machines. To scale up, redistribute logical shards to new machines without rehashing.

Example: start with 4 physical machines, each holding 256 logical shards. To double capacity, add 4 more physical machines and move 128 logical shards from each old machine to a new one. The keys never re-hash; just the mapping changes.

Used by: many production systems including Vitess, Slack's database, and others. Simplifies resharding dramatically. Pre-shard generously upfront because adding new logical shards later requires rehashing.

Consistent Hashing

Minimizes data movement when adding nodes. Adding a new shard moves only 1/N of the keys (those that fall into the new shard's range on the hash ring).

Used by: Cassandra, DynamoDB, Memcached. The Consistent Hashing article explains the algorithm in detail.

Live Migration

Dual-write to old and new shards during the transition. Read from the new shard but verify against the old. Once verified, cut over reads. Once stable, drop the old shard.

Complex but maintains uptime. Used in many large-scale migrations including major Twitter and Facebook re-architectures.

Why Resharding Hurts

Petabytes of data don't move quickly. Even at 100 MB/s sustained, 1 PB takes 4 months. Realistic migrations involve weeks or months of careful work. Schema changes during migration are nearly impossible. New features queue behind the migration.

The bigger you get without resharding, the more painful the eventual reshard. Many teams underestimate this. They go from 4 shards to "we need more capacity" to "we need to reshard" and then spend most of a year on the migration.

Plan your initial shard count generously. Doubling shards from 32 to 64 is much faster than going 4 to 8.

Step 7: Architectural Patterns

Application-Level Sharding

Your application code knows about shards. It computes the shard for each query and routes to the right database. Example: a sharding library in your ORM.

Pros: full control, simple infrastructure.
Cons: every application must know the sharding logic. Schema changes coordinate across application versions. Tooling (admin queries, debugging) requires shard awareness.

Database Proxy

A layer between application and database that handles sharding. The application thinks it's talking to one database. The proxy routes to the right shard.

Examples: Vitess (for MySQL/MariaDB), Citus extension for Postgres (looks like one DB, transparently shards).

Pros: applications don't change. Easier to migrate to.
Cons: adds a hop and infrastructure to maintain. Proxy bugs are hard to diagnose.

Native Sharded Databases

Databases designed for sharding from the start: Cassandra, DynamoDB, MongoDB sharded clusters, CockroachDB, Spanner.

Pros: sharding built in. Automatic rebalancing in some cases.
Cons: locked into that specific database's design. Migration costs.

NewSQL

A category of databases trying to give SQL semantics on top of distributed storage: CockroachDB, Spanner, YugabyteDB. They handle sharding transparently and offer cross-shard transactions.

Pros: get SQL plus distribution. Don't manage shards manually.
Cons: trade-offs in latency or consistency. Smaller community than Postgres/MySQL.

Step 8: When to Shard (And When Not To)

Don't Shard When

Your dataset fits comfortably on one machine (this is true for more workloads than people assume).
Your write throughput is below ~10k QPS (Postgres handles this trivially).
You haven't tried vertical scaling and read replicas yet.
You haven't tried caching to reduce database load.
You don't have months of engineering bandwidth to invest.

Many teams jump to sharding too early. The complexity introduced is enormous; the benefit is theoretical until you actually hit the limit.

Shard When

Your dataset is too big to fit on one machine, and growing.
Your write throughput exceeds what one machine can handle.
You've already exhausted vertical scaling and read replicas.
You've optimized queries and added caching.
You have engineering capacity to handle the operational complexity.

Sharding Is a One-Way Door

Once you commit, you cannot easily go back. Schema changes, ad-hoc queries, transactions, joins all become harder forever. Plan accordingly.

Alternatives to Consider First

Bigger machine.
Read replicas.
Aggressive caching.
Materialized views or aggregation tables.
Move analytics off the OLTP database.
Consolidate redundant data.
Move large blobs to object storage.
Archive old data to cheaper storage.

Each of these can buy you months or years before you need to shard.

Step 9: Operational Concerns

Backups

Each shard needs its own backup. Coordinated point-in-time recovery across shards is complex; a transaction that touched multiple shards might be partially applied if you restore from inconsistent backup points.

Monitoring

Metrics per shard: query rate, error rate, replication lag, disk usage. One unhealthy shard can block entire workloads. Per-shard dashboards are essential.

Schema Migrations

Run the migration on every shard. Coordinate so they all complete (or all roll back). Tools that handle this: gh-ost on MySQL, pg_repack on Postgres, custom scripts that iterate shards.

The bigger pain: schemas drifting between shards. If the migration fails on one shard, you have a heterogeneous schema until you fix it. Application code must handle both versions during the transition.

Failures of Individual Shards

One shard goes down. What happens? Most apps degrade: requests for users on that shard fail; other users continue normally. The application must handle "this user's data is unavailable" gracefully (cached responses, error UI, retry queues).

Shard Rebalancing

If shards become uneven (one has 10x more data than others), rebalancing moves data. Different databases handle this differently. Some auto-rebalance, some require manual operations.

Cross-Shard Operations Are Slow

Even simple-looking operations might fan out across shards. Monitor for slow queries that touch many shards. Often a small redesign (moving the shard key, denormalizing) fixes them.

Connection Pool Limits

If your application connects to all shards, you have N times the connection load. Pooling matters. Some setups use proxies that maintain pools to all shards on the application's behalf.

Tooling

Many database tools assume a single database: schema diff tools, backup tools, query analyzers. With sharding, tooling needs shard awareness. Build or buy.

Documentation Drift

Operations docs that worked for "the database" now have to specify "all shards" or "the X shard." This drifts over time. Worth investing in shard-aware runbooks early.

Step 10: Recap of Key Decisions

Sharding is the price for going beyond single-machine scale. Painful but unavoidable at scale.
Hash-based sharding with consistent hashing is the most common. Even distribution; minimal data movement on resize.
The shard key decision is critical. Aligns with dominant queries; high cardinality; even distribution; stable.
Hotspots will happen. Plan mitigations: replication, caching, sub-sharding, special handling for VIPs.
Cross-shard queries are slow. Design 95% of queries to be single-shard.
Resharding is hard. Pre-shard to many logical shards; over-provision shard count.
Don't shard prematurely. Bigger machine, replicas, caching first.
NewSQL or sharded SQL proxies hide complexity. Worth considering over manual sharding.

The One Thing to Remember

Sharding is the price you pay to scale beyond a single machine. The hard part isn't the splitting itself; it is picking a shard key that makes 95% of your queries single-shard, and having a plan for the 5% that aren't. Get the key right and sharding feels invisible. Get it wrong and you'll be debugging hot shards, scatter-gather slowness, and resharding pain forever. Don't shard until you must. When you must, design for it carefully, plan resharding before you need it, and test failure scenarios before they happen in production. Sharding is a one-way door; the version of your system before sharding and after are fundamentally different beasts. Go through the door deliberately.