Why Distributed Systems Cannot Always Be Consistent
Imagine you have a single database. You write data to it, then you read it, and you always see exactly what you just wrote. Simple. Predictable. This is called strong consistency. ACID guarantees give it to you out of the box, as long as everything fits on one machine.
Now imagine your system is too big for one machine. You have a payments service, an inventory service, a shipping service, a notifications service. Each runs on its own servers, with its own database, in different regions. A single user action (like placing an order) needs to update all of them.
You suddenly hit a wall. Forcing all those services to agree on the same state at the exact same moment is impossibly slow, fragile, or both. So distributed systems make a different bargain: eventually, all the services will agree. For a brief window, they will not.
That bargain is called eventual consistency, and it is the foundation of nearly every system you use at scale: Amazon's shopping cart, Twitter's timeline, your bank's transaction history, Uber's location tracking. They all live in this world.
The challenge is not whether to accept eventual consistency. It is which pattern to use to manage it. Get the pattern wrong and you get duplicate orders, lost messages, or impossible-to-debug race conditions. Get it right and your system feels seamless to users despite running across hundreds of servers.
This article walks through the four core patterns: Event-based, Background Sync, Saga, and CQRS. By the end, you will know when to reach for each one.
The Consistency Spectrum
Consistency is not a yes-or-no property. It is a spectrum. On one end you have absolute strong consistency: every read sees the latest write, no matter what. On the other end you have eventual consistency: reads might see stale data for a while, but eventually they catch up. In between are many flavors: read-your-writes, monotonic reads, causal consistency, and more.
Most real systems pick a position on this spectrum based on their needs. A bank balance needs to be strongly consistent. A social media like count is fine being eventually consistent. The trick is picking correctly per use case, not for the whole system.
Pattern 1: Event-Based Eventual Consistency
This is the most common pattern in modern systems. The idea is simple: when one service changes its data, it publishes an event describing what happened. Other services that care about that event subscribe and update their own data accordingly.
The services never call each other directly. They communicate through a message broker (like Kafka, RabbitMQ, or Pulsar). This keeps them loosely coupled. Service A does not know or care which other services are listening. It just shouts "OrderCreated" into the void, and the void figures out who needs to hear it.
Real-world example: An e-commerce platform. When the Orders service writes a new order, it publishes OrderCreated. Five different downstream services consume that event independently. Inventory drops the stock count. Billing kicks off the charge. Shipping creates a label. Notifications emails the customer. Analytics tracks the conversion. None of these services know about each other.
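To make that concrete, here is a minimal sketch of the producer side, assuming the kafka-python client, a topic named "orders", and an event_id field for deduplication (all illustrative choices, not prescribed above):

```python
import json
import uuid
from datetime import datetime, timezone

from kafka import KafkaProducer  # pip install kafka-python

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

def place_order(order_id: str, customer_id: str, items: list) -> None:
    # Step 1: write the order to the Orders service's own database (omitted here).
    # Step 2: announce the fact. The producer has no idea who is listening.
    event = {
        "event_id": str(uuid.uuid4()),  # consumers use this to deduplicate
        "type": "OrderCreated",
        "occurred_at": datetime.now(timezone.utc).isoformat(),
        "order_id": order_id,
        "customer_id": customer_id,
        "items": items,
    }
    producer.send("orders", value=event)
```

One caveat worth knowing: the database write and the broker publish are two separate operations, and one can succeed while the other fails. Production systems usually close that gap with a transactional outbox, which is beyond the scope of this sketch.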
Pros:
- Loose coupling. Add new consumers without touching the producer.
- High throughput. The event bus absorbs spikes.
- Audit trail. Every change is an immutable event.
- Resilient. If one consumer is down, others keep working.
Cons:
- Hard to debug across services.
- Out-of-order events can cause weird states.
- You must handle duplicate events (idempotency; see the consumer sketch after this list).
- No straightforward rollback if something fails downstream.
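Here is the matching consumer side, sketching the idempotency point from the list above. The processed_events table and the event_id field are illustrative, and SQLite stands in for the consumer's own database:

```python
import json
import sqlite3

from kafka import KafkaConsumer  # pip install kafka-python

db = sqlite3.connect("inventory.db")
db.execute("CREATE TABLE IF NOT EXISTS processed_events (event_id TEXT PRIMARY KEY)")

def decrement_stock(items: list) -> None:
    ...  # the service's actual local state change (stubbed out)

consumer = KafkaConsumer(
    "orders",
    bootstrap_servers="localhost:9092",
    group_id="inventory-service",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)

for message in consumer:
    event = message.value
    try:
        # A primary-key insert fails on replay, so a redelivered event is
        # skipped instead of being applied twice.
        db.execute("INSERT INTO processed_events VALUES (?)", (event["event_id"],))
    except sqlite3.IntegrityError:
        continue  # already processed this one
    decrement_stock(event["items"])
    db.commit()  # dedup marker and state change commit together
```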
When to use it: Default choice for microservices. Use it whenever a state change in one service should trigger reactions in others, and you do not need to roll back the whole chain on failure.
Pattern 2: Background Sync
This pattern does not react to events at all. Instead, a scheduled job runs periodically and copies data from one place to another. It is the boring, reliable workhorse of eventual consistency.
You set up a cron job (or an Airflow DAG, or a Lambda on a timer) that runs every 5 minutes, every hour, every night. The job reads from the source, transforms what it needs, and writes to the target. Differences between the two systems get reconciled with each run.
The source remains the system of record; the target is only ever a derived view.
Real-world example: A data warehouse. Operational databases serve the live application. Every night at 3 AM, an ETL job copies fresh data into the warehouse so analysts can query it without slowing down production. Reports lag the live data by a day, but everyone is okay with that.
Another example: search indexes. A nightly job rebuilds the Elasticsearch index from the canonical Postgres data. New articles take up to 24 hours to appear in search, which is fine for many use cases.
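A sketch of such a job, with SQLite standing in for both stores and an updated_at column driving the incremental pull (assumptions for illustration, not requirements stated above):

```python
import sqlite3

def sync_products(source: sqlite3.Connection, target: sqlite3.Connection) -> None:
    # Read the high-water mark left behind by the previous run.
    row = target.execute(
        "SELECT value FROM sync_state WHERE key = 'last_synced_at'"
    ).fetchone()
    watermark = row[0] if row else "1970-01-01T00:00:00"

    # Pull only rows changed since the last run, not the whole table.
    changed = source.execute(
        "SELECT id, name, price, updated_at FROM products "
        "WHERE updated_at > ? ORDER BY updated_at",
        (watermark,),
    ).fetchall()

    for id_, name, price, updated_at in changed:
        # Upsert into the derived view, so the job is safe to re-run.
        target.execute(
            "INSERT INTO products VALUES (?, ?, ?, ?) "
            "ON CONFLICT(id) DO UPDATE SET name = excluded.name, "
            "price = excluded.price, updated_at = excluded.updated_at",
            (id_, name, price, updated_at),
        )
        watermark = updated_at

    target.execute(
        "INSERT INTO sync_state VALUES ('last_synced_at', ?) "
        "ON CONFLICT(key) DO UPDATE SET value = excluded.value",
        (watermark,),
    )
    target.commit()

# Demo setup; in practice the schedule (cron, Airflow, a timer) calls sync_products.
source = sqlite3.connect(":memory:")
target = sqlite3.connect(":memory:")
source.execute("CREATE TABLE products (id TEXT PRIMARY KEY, name TEXT, price REAL, updated_at TEXT)")
target.execute("CREATE TABLE products (id TEXT PRIMARY KEY, name TEXT, price REAL, updated_at TEXT)")
target.execute("CREATE TABLE sync_state (key TEXT PRIMARY KEY, value TEXT)")
source.execute("INSERT INTO products VALUES ('p1', 'widget', 9.99, '2024-01-05T10:00:00')")
sync_products(source, target)
```

Note that the watermark query never sees rows deleted at the source, which is exactly the deletes problem listed below; soft-delete flags or periodic full scans are the usual workarounds.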
Pros:
- Dead simple to implement and reason about.
- No event infrastructure needed.
- Easy to re-run if something fails (just trigger again).
- Easy to add new sync jobs without touching producers.
Cons:
- Lag equals the schedule interval. Always stale by some amount.
- Big jobs that scan entire tables get slow at scale.
- Hard to handle deletes (the source no longer has the row).
- Wasted compute when there's nothing new to sync.
When to use it: When the lag is acceptable (minutes to hours), the data volume is manageable, and you want simplicity over real-time freshness. Common for analytics, reporting, search indexing, and cross-region sync.
Pattern 3: Saga
The hardest case is when a single business transaction spans multiple services and you need all of them to succeed (or all of them to be rolled back). You cannot use a database transaction because the data lives in different databases. You cannot just hope events work out, because if step 3 fails after step 2 already happened, you are left with corrupt state.
A saga solves this. It is a sequence of local transactions. Each step writes to one service's database. If any step fails, the saga runs compensating transactions in reverse order to undo the previous steps.
There are two flavors of saga: choreography and orchestration.
Saga Flavor 1: Choreography
No central coordinator. Each service listens for events and decides what to do next. The flow emerges from the conversation between services.
Pros: No single point of failure. Easy to add new steps. Fits nicely with event-driven systems.
Cons: The overall flow is hidden. You have to read multiple services' code to understand what happens end-to-end. Cycles and ambiguity become real risks as the saga grows.
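A toy sketch of the idea, with an in-memory bus standing in for the broker; the service and event names are made up for illustration:

```python
from collections import defaultdict

handlers = defaultdict(list)

def subscribe(event_type):
    def register(fn):
        handlers[event_type].append(fn)
        return fn
    return register

def publish(event_type, data):
    for fn in handlers[event_type]:
        fn(data)

# Each service reacts to the previous step's event and emits its own.
@subscribe("OrderCreated")
def billing(order):
    print("billing: charging card for", order["id"])
    publish("PaymentCaptured", order)  # or "PaymentFailed" on error

@subscribe("PaymentCaptured")
def shipping(order):
    print("shipping: creating label for", order["id"])
    publish("OrderShipped", order)

@subscribe("PaymentFailed")
def cancel_order(order):
    print("orders: cancelling", order["id"])  # a compensating step

publish("OrderCreated", {"id": "o-42"})
```

Notice there is no single place to read the whole flow: it exists only in the sum of the handlers, which is precisely the debuggability cost described above.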
Saga Flavor 2: Orchestration
A central orchestrator coordinates the saga. It calls each service in sequence, knows the expected outcome, and triggers compensating transactions if any step fails.
Real-world example: Booking a vacation package. The orchestrator reserves a flight, then a hotel, then a car. If the car reservation fails, it cancels the hotel, then the flight. The user either gets the full package or nothing.
Pros: Clear, centralized logic. Easy to understand and monitor. Compensations are explicit.
Cons: The orchestrator becomes a single point of complexity (though tools like Temporal, AWS Step Functions, and Camunda make this manageable), and some coupling is reintroduced: the orchestrator has to know about every service it calls.
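A minimal sketch of that vacation-package saga, with stubbed booking calls; durable tools like Temporal or Step Functions are, in effect, production-grade versions of this loop:

```python
class SagaFailed(Exception):
    pass

def book_flight(trip): print("flight reserved")
def cancel_flight(trip): print("flight cancelled")
def book_hotel(trip): print("hotel reserved")
def cancel_hotel(trip): print("hotel cancelled")
def book_car(trip): raise RuntimeError("no cars available")  # simulated failure
def cancel_car(trip): print("car cancelled")

# Each step pairs a local transaction with the compensation that undoes it.
SAGA = [
    (book_flight, cancel_flight),
    (book_hotel, cancel_hotel),
    (book_car, cancel_car),
]

def run_saga(trip: dict) -> None:
    completed = []
    for action, compensation in SAGA:
        try:
            action(trip)
            completed.append(compensation)
        except Exception as exc:
            # Undo every committed step in reverse order, then surface the failure.
            for undo in reversed(completed):
                undo(trip)
            raise SagaFailed(f"{action.__name__} failed: {exc}") from exc

try:
    run_saga({"destination": "Lisbon"})
except SagaFailed as e:
    print(e)  # flight and hotel were booked, then both were cancelled
```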
Compensating Transactions
The most important concept in sagas. Since you cannot use database rollback across services, you have to undo previously committed work by writing compensating actions.
Note that some compensations are tricky or impossible. You can refund a charge, but you cannot un-send an email. Real saga design requires thinking about this carefully.
When to use sagas: When you need transactional semantics across services. Booking systems, payment flows, multi-step workflows, anything where partial completion is unacceptable.
Pattern 4: CQRS (Command Query Responsibility Segregation)
The previous patterns were about consistency between services. CQRS is about something different: separating writes from reads within a single domain.
The key insight: most applications have very different needs for writes versus reads. Writes need integrity, validation, business rules. Reads need speed, flexibility, denormalization. Trying to use the same data model for both forces compromises.
CQRS splits them. The write model handles commands (create order, update profile, cancel subscription). The read model handles queries (show me my orders, show top products, show user profile). They live in separate databases, optimized differently, kept in sync via events.
In short: the write model validates and applies business rules over normalized, ACID storage, while the read model serves fast lookups from denormalized, query-optimized storage.
Real-world example: An e-commerce platform's product page. Writes (admin updating prices, inventory changes) go to a normalized Postgres database with full ACID. Reads (millions of users browsing) go to a denormalized read model in Elasticsearch or Redis with all the data needed for the page already pre-joined and pre-formatted. New writes propagate to the read model within a second or two via events.
Another example: a banking dashboard. The transaction ledger is an append-only write model with strict ACID. The user-facing account balance is a separately maintained read model that pre-computes balances from the ledger. The user always sees a snappy summary even though the underlying ledger is huge.
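To ground this, here is a minimal sketch of one service's split, with SQLite standing in for the normalized write store and Redis as the denormalized read model (both store choices are illustrative). The projection runs in-process here; a real system would ship the event through a broker:

```python
import sqlite3

import redis  # pip install redis; assumes a local Redis instance

write_db = sqlite3.connect("orders.db")
write_db.execute(
    "CREATE TABLE IF NOT EXISTS orders "
    "(id TEXT PRIMARY KEY, customer_id TEXT, total REAL, status TEXT)"
)
read_store = redis.Redis(decode_responses=True)

def create_order(order_id: str, customer_id: str, total: float) -> None:
    # Command side: validate, enforce business rules, write normalized rows.
    if total <= 0:
        raise ValueError("order total must be positive")
    write_db.execute(
        "INSERT INTO orders VALUES (?, ?, ?, 'new')",
        (order_id, customer_id, total),
    )
    write_db.commit()
    # In production the event would flow through a broker; we project inline.
    project_order_created(
        {"order_id": order_id, "customer_id": customer_id, "total": total}
    )

def project_order_created(event: dict) -> None:
    # Read side: a pre-joined, page-ready document keyed for O(1) lookup.
    read_store.hset(f"order:{event['order_id']}", mapping={
        "customer_id": event["customer_id"],
        "total": event["total"],
        "status": "new",
    })

def get_order(order_id: str) -> dict:
    # Query side never touches the write database.
    return read_store.hgetall(f"order:{order_id}")
```

The read-after-write caveat in the list below falls straight out of this structure: once the projection moves behind a broker, get_order can briefly return nothing for an order that was just created.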
Pros:
- Reads and writes scale independently.
- Different data models for different needs (normalized writes, denormalized reads).
- Different storage engines per side (Postgres for writes, Elasticsearch for reads).
- Read side can power many different views without affecting writes.
Cons:
- Complexity: two data models, two databases, sync layer between them.
- Read-after-write surprises (just wrote something, immediately query, do not see it yet).
- Eventual consistency lag must be visible to users or designed around.
- Overkill for simple CRUD apps.
When to use CQRS: When read and write workloads have very different characteristics. Heavy read traffic on data that changes slowly. Complex domain with strict write rules but simple lookup needs. High-throughput systems where mixing reads and writes on the same database hurts both.
Common Failure Modes
Eventual consistency introduces a class of bugs that simply do not exist in single-database, strongly consistent systems. Knowing these in advance saves a lot of pain. The recurring ones, all touched on above:
- Duplicate delivery. The same event arrives twice; without idempotent handlers, a consumer charges or decrements twice.
- Out-of-order delivery. An event arrives before the event it logically depends on.
- Read-after-write gaps. A user writes, immediately reads, and sees stale data.
- Missing compensations. A multi-step flow fails partway and nothing undoes the steps that already committed.
- Irreversible side effects. You can refund a charge, but you cannot un-send an email.
Which Pattern Should You Use?
A rough decision guide, pulling together the per-pattern advice above:
- Event-based: the default for microservices. A state change in one service should trigger reactions in others, and you do not need to roll back the whole chain.
- Background Sync: lag of minutes to hours is acceptable and you want simplicity. Analytics, reporting, search indexing, cross-region sync.
- Saga: one business transaction spans services and partial completion is unacceptable. Bookings, payments, multi-step workflows.
- CQRS: read and write workloads differ sharply within one domain. Heavy read traffic, strict write rules, pre-computed views.
The One Thing to Remember
Eventual consistency is not a compromise. It is a tool. Picking it gives you scale, availability, and looser coupling that strong consistency cannot match. The price you pay is reasoning about time: knowing that for some window, your different services will disagree.
The four patterns covered here are not mutually exclusive. Real systems use multiple at once. Events flow between services for cross-service consistency. CQRS optimizes a single service's read and write needs. Sagas wrap critical multi-step flows. Background sync fills in when nothing else needs to be real-time.
The goal is not to avoid eventual consistency. It is to choose where to place it deliberately, with the right pattern, and design around the fact that the system will sometimes be temporarily inconsistent. The teams that get this right build systems that feel seamless to users despite running across hundreds of servers in different parts of the world. The teams that get it wrong get woken up at 3 AM trying to figure out why an order has been refunded twice.