Why Replicate?

Three reasons:

Availability: if one server dies, another keeps serving, so losing a single machine need not mean downtime.
Read scalability: distribute read traffic across multiple copies. Each replica handles a slice.
Geography: keep replicas near users to reduce latency.

Replication is the foundation of every production-grade database setup. Even modest systems run with at least one replica for failover.

The Three Replication Topologies

1. Leader-Follower (Master-Slave)

One server is the leader. It accepts all writes. It streams its changes to one or more followers. Followers serve reads only; they never accept writes.

Figure: Leader-Follower Topology. The leader (writes + reads) replicates to Follower 1, Follower 2, and Follower 3 (each reads only).

Pros: simple. No write conflicts because there's only one writer.
Cons: the leader is a write bottleneck and a single point of failure for writes. Promoting a new leader during failover is tricky.

Used by: Postgres, MySQL, Redis, MongoDB. The default and most common topology.
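To make the routing rule concrete, here is a minimal in-memory sketch of a leader-follower cluster. The Node and LeaderFollowerCluster classes are hypothetical names invented for illustration, and replication is done inline, which no real database does; it only shows that writes go to one node while reads rotate across followers.

```python
import itertools


class Node:
    """An in-memory stand-in for one database server (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.data = {}


class LeaderFollowerCluster:
    def __init__(self, leader, followers):
        self.leader = leader
        self.followers = followers
        # Round-robin over followers to spread read traffic.
        self._next_follower = itertools.cycle(followers)

    def write(self, key, value):
        # All writes go to the single leader.
        self.leader.data[key] = value
        # Replication is done inline here for simplicity; real systems
        # stream a change log, usually asynchronously.
        for follower in self.followers:
            follower.data[key] = value

    def read(self, key):
        # Reads are served by followers in turn.
        follower = next(self._next_follower)
        return follower.name, follower.data.get(key)


cluster = LeaderFollowerCluster(Node("leader"), [Node("f1"), Node("f2")])
cluster.write("user:1", "alice")
print(cluster.read("user:1"))  # ('f1', 'alice')
print(cluster.read("user:1"))  # ('f2', 'alice')
```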

2. Multi-Leader

Multiple servers can accept writes. Each one replicates to the others.

Pros: writes are not bottlenecked on one machine. Local writes possible per region.
Cons: conflict resolution. If two leaders accept conflicting writes simultaneously (user A is changed differently in two places), the system has to merge them somehow. Most choices for conflict resolution are imperfect.

Used by: CouchDB, multi-region setups in some clouds, and Cassandra (sort of; it is closer to the leaderless design described next).
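One common conflict-resolution choice is last-write-wins (LWW). The sketch below is an illustration under assumptions, not any particular database's implementation: the Versioned type, the leader IDs, and the timestamps are all made up. It shows both how the merge works and why it is imperfect.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Versioned:
    value: str
    timestamp: float  # wall-clock time assigned by the accepting leader
    leader_id: str    # tie-breaker so every leader picks the same winner


def last_write_wins(a: Versioned, b: Versioned) -> Versioned:
    # Compare (timestamp, leader_id) so the result is deterministic
    # regardless of which leader performs the merge.
    return max(a, b, key=lambda v: (v.timestamp, v.leader_id))


# Two leaders accept different updates to the same user at nearly the same time.
from_eu = Versioned("alice@new-mail.example", timestamp=100.002, leader_id="eu")
from_us = Versioned("alice@work.example", timestamp=100.001, leader_id="us")

winner = last_write_wins(from_eu, from_us)
print(winner.value)  # alice@new-mail.example
```

The US write is silently discarded even though the client that made it was told it succeeded. LWW also depends on clocks that are never perfectly synchronized across machines, which is one reason it counts as an imperfect choice.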

3. Leaderless

No designated leader. Every node accepts writes. Clients write to several nodes simultaneously and read from several to ensure consistency.

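Uses quorums: with N replicas, a write must be acknowledged by W of them and a read must consult R of them. When W + R > N, every read set overlaps every write set, so a read reaches at least one replica holding the latest acknowledged write. This gives strong read guarantees in the common case, though edge cases such as concurrent writes or partially failed writes can still surface stale values.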
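The W + R > N condition is just a statement about set overlap, and it can be checked by brute force for small clusters. The function name below is made up for this sketch; it enumerates every possible write set and read set and tests whether they always share a replica.

```python
from itertools import combinations


def quorums_always_overlap(n: int, w: int, r: int) -> bool:
    """Brute-force check: does every write set of size w share at least
    one replica with every read set of size r?"""
    replicas = range(n)
    return all(
        set(write_set) & set(read_set)
        for write_set in combinations(replicas, w)
        for read_set in combinations(replicas, r)
    )


print(quorums_always_overlap(n=3, w=2, r=2))  # True  (2 + 2 > 3)
print(quorums_always_overlap(n=3, w=1, r=2))  # False (1 + 2 = 3)
print(quorums_always_overlap(n=5, w=3, r=3))  # True  (3 + 3 > 5)
```

With N = 3, W = 1, R = 2, a write could land only on replica 0 while a read consults replicas 1 and 2, so the read misses it entirely.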

Pros: very fault-tolerant. No leader to fail. Writes succeed even if some nodes are down.
Cons: the client (or coordinator) handles all the complexity of figuring out which value is current. Repair processes (read repair, anti-entropy) needed to keep replicas in sync.

Used by: Amazon's original Dynamo system (the managed DynamoDB service uses per-partition leaders instead), Cassandra (Dynamo-style replication), Riak.
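Read repair, mentioned above, can be sketched in a few lines. This is a simplified illustration: replicas are plain dicts, and the integer version attached to each value is an assumption standing in for whatever versioning scheme a real system uses (timestamps, vector clocks, and so on).

```python
def quorum_read(replicas, key, r):
    """Read `key` from the first r replicas, return the newest value,
    and repair any contacted replica that holds an older version.

    Each replica is a dict mapping key -> (version, value).
    """
    contacted = replicas[:r]
    responses = [rep.get(key, (0, None)) for rep in contacted]
    newest = max(responses, key=lambda vv: vv[0])

    # Read repair: push the newest version to stale replicas we touched.
    for rep, response in zip(contacted, responses):
        if response[0] < newest[0]:
            rep[key] = newest

    return newest[1]


# Replica b missed the second write (for example, it was briefly down).
a = {"cart": (2, ["book", "pen"])}
b = {"cart": (1, ["book"])}
c = {"cart": (2, ["book", "pen"])}

print(quorum_read([a, b, c], "cart", r=2))  # ['book', 'pen']
print(b["cart"])                            # (2, ['book', 'pen'])
```

Read repair only fixes replicas that happen to be contacted by a read, which is why a background anti-entropy process is still needed for rarely read keys.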

Synchronous vs Asynchronous Replication

Orthogonal to topology: when does the leader confirm a write to the client?

Synchronous: wait until all (or some) followers acknowledge the write before responding. Strong durability. The write is safe even if the leader crashes immediately.
Asynchronous: respond as soon as the leader has written it. Followers catch up later. Fast, but if the leader crashes before replicating, that data is lost.
Semi-synchronous: wait for at least one follower (not all). Balances safety and speed.

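Most production databases default to async or semi-sync. Fully synchronous replication makes every write as slow as the slowest follower, and a single unreachable follower can block writes entirely, which rules it out for most workloads.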
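The latency trade-off between the three modes can be illustrated with a toy model. The function and the numbers below are hypothetical; the model assumes the leader persists the write first and then contacts all followers in parallel.

```python
def commit_latency_ms(leader_ms, follower_ms, mode):
    """Time until the client gets its acknowledgement.

    leader_ms: time for the leader to persist the write locally.
    follower_ms: per-follower time to receive and acknowledge it.
    Followers are assumed to be contacted in parallel.
    """
    if mode == "async":
        return leader_ms                      # do not wait for followers
    if mode == "semi-sync":
        return leader_ms + min(follower_ms)   # wait for the fastest one
    if mode == "sync":
        return leader_ms + max(follower_ms)   # wait for the slowest one
    raise ValueError(f"unknown mode: {mode}")


followers = [3, 5, 250]  # hypothetical: one follower is in a distant region
for mode in ("async", "semi-sync", "sync"):
    print(mode, commit_latency_ms(2, followers, mode))
# async 2
# semi-sync 5
# sync 252
```

In this model a single distant follower dominates fully synchronous latency, while semi-sync pays only for the nearest one.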

Replication Lag

With async replication, followers are always slightly behind the leader. Usually milliseconds, sometimes seconds, occasionally minutes during heavy load.

This breaks read-your-writes consistency: a user writes data and immediately reads it back, but the read goes to a follower that hasn't caught up yet. They see old data and panic ("did my change get saved?").

Mitigations:

Read from leader after writes: for a short window after a user writes, route that user's reads to the leader, for example by pinning their session to it.
Track replication position: remember the leader's log position when the user writes. Make subsequent reads wait until the follower catches up to that position.
Show pending state in UI: tell the user "saving..." until the change is confirmed propagated.
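The position-tracking mitigation can be sketched as follows. All class and function names here are invented for illustration. Note one deliberate simplification: instead of making the read wait for the follower, this sketch falls back to the leader when the follower is behind, which combines the first two mitigations.

```python
class Follower:
    def __init__(self):
        self.applied_position = 0   # last log position applied locally
        self.data = {}


class Leader:
    def __init__(self):
        self.log_position = 0
        self.data = {}

    def write(self, key, value):
        self.data[key] = value
        self.log_position += 1
        return self.log_position    # the client remembers this


def read_your_writes(key, last_write_position, follower, leader):
    """Serve from the follower only if it has applied the caller's last
    write; otherwise fall back to the leader."""
    if follower.applied_position >= last_write_position:
        return "follower", follower.data.get(key)
    return "leader", leader.data.get(key)


leader, follower = Leader(), Follower()
pos = leader.write("name", "Alice")
print(read_your_writes("name", pos, follower, leader))  # ('leader', 'Alice')

# Replication catches up.
follower.data["name"] = "Alice"
follower.applied_position = pos
print(read_your_writes("name", pos, follower, leader))  # ('follower', 'Alice')
```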

Failover: When the Leader Dies

If the leader crashes, one of the followers needs to take over. This is failover, and it's harder than it sounds.

Detection: how do you know the leader is really down vs just slow? Heartbeats with timeouts.
New leader selection: which follower takes over? Usually the one with the most recent data.
Reconfiguration: tell all clients and remaining followers about the new leader.
Data loss: if the old leader had writes that hadn't been replicated yet, those are lost when the new leader takes over.
Split brain: if the old leader comes back and thinks it's still leader while the new one is also accepting writes, you have two leaders and conflicting data. Use fencing tokens.
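A rough sketch of two of these steps, leader selection and fencing, is shown below. FencedStorage and pick_new_leader are hypothetical names, and the token is a plain integer that increases with each promotion. The idea is that any shared resource remembers the highest token it has seen and refuses requests carrying an older one.

```python
class FencedStorage:
    """Shared resource that rejects writes carrying a stale token."""

    def __init__(self):
        self.highest_token = 0
        self.data = {}

    def write(self, token, key, value):
        if token < self.highest_token:
            return False            # request from a deposed leader
        self.highest_token = token
        self.data[key] = value
        return True


def pick_new_leader(followers):
    """followers: dict of name -> last replicated log position."""
    return max(followers, key=followers.get)


storage = FencedStorage()
old_leader_token = 1
storage.write(old_leader_token, "balance", 100)

# Failover: the most up-to-date follower is promoted with a higher token.
new_leader = pick_new_leader({"f1": 41, "f2": 57, "f3": 49})
new_leader_token = old_leader_token + 1
print(new_leader)                                       # f2
print(storage.write(new_leader_token, "balance", 120))  # True

# The old leader wakes up, still believing it is in charge.
print(storage.write(old_leader_token, "balance", 90))   # False
print(storage.data["balance"])                          # 120
```

The old leader does not need to know it was deposed; the storage layer enforces it. In practice the token would be issued by a coordination service rather than incremented by hand.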

Tools that automate failover: Patroni (Postgres), MHA (MySQL), Sentinel (Redis), the database's own clustering features.

Replication Methods Under the Hood

Statement-based: ship the SQL statement to followers, replay it. Brittle: non-deterministic functions (NOW(), RAND()) produce different results.
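Write-ahead log (WAL) shipping: ship the storage engine's low-level log of physical changes. Reliable, but tied to the on-disk format, so leader and followers generally need to run compatible versions. Used by Postgres streaming replication.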
Logical (row-based): ship row-level changes. More portable across versions. Used by MySQL row-based replication and modern Postgres logical replication.
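The difference between statement-based and row-based replication shows up as soon as a statement depends on local state. In this sketch the databases are plain dicts and the local_clock callables are stand-ins for NOW() evaluated on each server; all names are illustrative.

```python
def apply_statement(db, local_clock):
    # Statement-based: each node re-executes the statement, so a call
    # like NOW() is evaluated again using that node's own clock.
    db["updated_at"] = local_clock()


def apply_row_change(db, change):
    # Row-based: each node applies the value the leader already computed.
    key, value = change
    db[key] = value


# Statement-based: the follower replays the statement 350 ms later.
leader_db, follower_db = {}, {}
apply_statement(leader_db, lambda: 1000.000)
apply_statement(follower_db, lambda: 1000.350)
print(leader_db == follower_db)    # False: the replicas have diverged

# Row-based: the leader ships the resulting row, not the statement.
leader_db2, follower_db2 = {}, {}
apply_statement(leader_db2, lambda: 1000.000)
apply_row_change(follower_db2, ("updated_at", leader_db2["updated_at"]))
print(leader_db2 == follower_db2)  # True
```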

The One Thing to Remember

Replication isn't optional in production. The question is which topology and how synchronous. Leader-follower with async or semi-sync is the right starting point for almost everyone. Multi-leader and leaderless are powerful but bring conflict-resolution complexity that most teams don't actually need. Whichever you pick, design around replication lag, plan failover before you need it, and test it before production.