The Problem
Your transactional database holds the source of truth. But many other systems need to see changes: a search index, a data warehouse, a cache, a downstream microservice, an audit log. The naive approaches are to dual-write from your application (write to the DB and publish an event) or to poll the database periodically for changes.
Both have problems. Dual-writes risk inconsistency: the DB write succeeds and the event publish fails, or vice versa. Polling adds latency and load, and it misses hard deletes as well as any intermediate states between polls.
Change Data Capture (CDC) solves this by reading the database's transaction log directly. Every committed change becomes an event you can stream to anyone.
How CDC Works
Most transactional databases keep a transaction log (Postgres WAL, MySQL binlog, SQL Server transaction log) that records every modification. The database uses this log for crash recovery and replication. CDC reads the same log for downstream use.
A CDC tool (like Debezium):
1. Connects to the database as a replication client, the same way a replica would.
2. Reads the transaction log entries.
3. For each entry (insert, update, delete), produces an event.
4. Publishes the event to Kafka or another message bus.
5. Tracks its position in the log so it can resume after restart.
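The steps above can be sketched as a small in-memory simulation. This is not how Debezium is implemented; `LogEntry`, `CdcReader`, and the list standing in for Kafka are hypothetical names chosen for illustration, and a real tool reads the log through the database's replication protocol.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class LogEntry:
    lsn: int                  # position of this entry in the log
    op: str                   # "c" (insert), "u" (update), or "d" (delete)
    table: str
    before: Optional[dict]    # row state before the change, if any
    after: Optional[dict]     # row state after the change, if any


@dataclass
class CdcReader:
    log: list                                        # stand-in for the transaction log
    published: list = field(default_factory=list)    # stand-in for a Kafka topic
    checkpoint: int = 0                              # LSN of the last published entry

    def poll(self, max_entries: int = 100) -> int:
        """Publish entries past the checkpoint and return how many were sent."""
        pending = [e for e in self.log if e.lsn > self.checkpoint][:max_entries]
        for entry in pending:
            self.published.append({
                "op": entry.op,
                "before": entry.before,
                "after": entry.after,
                "source": {"table": entry.table, "lsn": entry.lsn},
            })
            # Advance the checkpoint only after publishing. A crash between the
            # two steps re-sends the entry on restart: at-least-once delivery.
            self.checkpoint = entry.lsn
        return len(pending)


txn_log = [
    LogEntry(1, "c", "users", None, {"id": 42, "name": "Old Name"}),
    LogEntry(2, "u", "users", {"id": 42, "name": "Old Name"}, {"id": 42, "name": "New Name"}),
    LogEntry(3, "d", "users", {"id": 42, "name": "New Name"}, None),
]

reader = CdcReader(txn_log)
reader.poll(max_entries=2)                                     # publishes LSN 1 and 2
restarted = CdcReader(txn_log, checkpoint=reader.checkpoint)   # resume from saved position
restarted.poll()                                               # publishes only LSN 3
```

Because the checkpoint moves only after an entry is published, a restart can repeat an event but cannot skip one.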
The Event Format
A CDC event typically contains:
{
  "op": "u",                 // c=create, u=update, d=delete
  "ts_ms": 1714896000000,
  "before": {                // previous state (for u, d)
    "id": 42,
    "name": "Old Name"
  },
  "after": {                 // new state (for c, u)
    "id": 42,
    "name": "New Name"
  },
  "source": {
    "db": "myapp",
    "table": "users",
    "lsn": 12345             // log position
  }
}
Consumers can react to specific operations: only inserts, only deletes, only changes to certain columns.
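As a rough illustration of that kind of filtering, the sketch below computes which columns an event changed and applies the event to an in-memory copy of the table. The helper names are hypothetical, and it assumes every row has an `id` primary key, as in the sample event.

```python
def changed_columns(event: dict) -> set:
    """Return the names of columns whose value differs between before and after."""
    before = event.get("before") or {}
    after = event.get("after") or {}
    return {col for col in before.keys() | after.keys()
            if before.get(col) != after.get(col)}


def apply_event(replica: dict, event: dict) -> None:
    """Apply one change event to a dict keyed by primary key."""
    op = event["op"]
    if op in ("c", "u"):
        row = event["after"]
        replica[row["id"]] = row
    elif op == "d":
        replica.pop(event["before"]["id"], None)


update = {
    "op": "u",
    "ts_ms": 1714896000000,
    "before": {"id": 42, "name": "Old Name"},
    "after": {"id": 42, "name": "New Name"},
    "source": {"db": "myapp", "table": "users", "lsn": 12345},
}

replica = {42: {"id": 42, "name": "Old Name"}}

# React only when the "name" column actually changed.
if "name" in changed_columns(update):
    apply_event(replica, update)
```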
Why CDC Beats Dual-Write
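Dual-write: the app writes to the DB, then publishes an event. If the app crashes between the two steps, the event is lost and downstream systems silently diverge from the database.
CDC: the app performs only one write (to the DB). The event is derived from the log automatically, so the transaction log remains the single source of truth: every committed change yields an event, and a rolled-back change yields none. Delivery is typically at-least-once, so consumers should tolerate duplicates.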
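CDC is sometimes presented as a replacement for the outbox pattern, which exists specifically to fix dual-write inconsistency. In practice the two are also combined: the application writes events to an outbox table in the same transaction as its data, and CDC streams that table in place of a polling relay.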
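The difference can be made concrete with a toy simulation. Plain lists stand in for the database, its log, and the message bus, and `Crash`, `dual_write`, and `cdc_relay` are hypothetical names used only for this sketch.

```python
class Crash(Exception):
    """Simulates the process dying partway through a request."""


def dual_write(db: list, bus: list, row: dict, crash_between: bool = False) -> None:
    db.append(row)                                   # write 1: commit to the database
    if crash_between:
        raise Crash("died after commit, before publish")
    bus.append({"op": "c", "after": row})            # write 2: publish the event


def cdc_relay(db_log: list, bus: list, position: int) -> int:
    """Publish every log entry at or past `position`; return the new position."""
    for row in db_log[position:]:
        bus.append({"op": "c", "after": row})
    return len(db_log)


# Dual-write: the crash leaves a committed row with no matching event.
db, bus = [], []
try:
    dual_write(db, bus, {"id": 1}, crash_between=True)
except Crash:
    pass
dual_write_result = (len(db), len(bus))

# CDC: the application only writes to the database; a separate relay reads
# the log, so an application crash after the commit cannot drop the event.
cdc_log, cdc_bus = [], []
cdc_log.append({"id": 1})
position = cdc_relay(cdc_log, cdc_bus, 0)
cdc_result = (len(cdc_log), len(cdc_bus))
```

In the dual-write run the database holds one row and the bus holds no events; in the CDC run both hold one.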
Major Implementations
Debezium: the popular open-source choice. Connectors for Postgres, MySQL, MongoDB, SQL Server, Oracle, Cassandra. Runs as a Kafka Connect plugin. Production-grade.
Maxwell: simpler, MySQL-only. Output to Kafka, Kinesis, Pub/Sub.
AWS DMS: AWS's managed Database Migration Service, which supports ongoing replication via CDC.
Built-in: some databases have native streaming. PostgreSQL logical replication, MongoDB change streams, DynamoDB Streams.
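For a sense of what setting one of these up involves, here is a sketch that builds a registration payload for a Debezium Postgres connector, of the kind posted to the Kafka Connect REST API's `/connectors` endpoint. The property names are the ones I understand Debezium 2.x to use (older releases used `database.server.name` where this uses `topic.prefix`), so check them against the documentation for your version. The host, user, and password are placeholders, and nothing is sent over the network.

```python
import json


def debezium_postgres_connector(name: str, host: str, dbname: str, tables: list) -> dict:
    """Build a Kafka Connect registration payload (assumed Debezium 2.x property names)."""
    return {
        "name": name,
        "config": {
            "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
            "plugin.name": "pgoutput",            # Postgres built-in logical decoding plugin
            "database.hostname": host,
            "database.port": "5432",
            "database.user": "cdc_user",          # placeholder credentials
            "database.password": "change-me",
            "database.dbname": dbname,
            "topic.prefix": name,                 # prefix for the Kafka topic names
            "table.include.list": ",".join(tables),
        },
    }


payload = debezium_postgres_connector(
    "myapp", "db.internal.example", "myapp", ["public.users"]
)
print(json.dumps(payload, indent=2))
```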
Common Use Cases
Database replication: across engines (for example Postgres to MySQL) or between different versions of the same database.
Search index sync: Postgres to Elasticsearch. New row in DB = new document in index.
Cache invalidation: when a row changes, push an event that invalidates relevant cache entries.
Microservice integration: the order service updates its DB; the analytics service reads CDC events to update its own state.
Audit logs: every change recorded immutably.
Data warehousing: stream operational changes into the warehouse continuously instead of nightly batch dumps.
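Taking cache invalidation as one example, a consumer might look roughly like the sketch below. The `table:id` key scheme and the dict standing in for the cache are assumptions made for illustration; a real cache would have its own key layout.

```python
def cache_key(event: dict) -> str:
    """Derive the cache key for the row an event touches (hypothetical key scheme)."""
    # Deletes carry the row in "before"; inserts and updates carry it in "after".
    row = event.get("after") or event.get("before")
    return f'{event["source"]["table"]}:{row["id"]}'


def invalidate(cache: dict, event: dict) -> None:
    """Evict the affected entry regardless of operation type."""
    cache.pop(cache_key(event), None)


cache = {
    "users:42": {"id": 42, "name": "Old Name"},
    "users:7": {"id": 7, "name": "Someone Else"},
}

invalidate(cache, {
    "op": "u",
    "before": {"id": 42, "name": "Old Name"},
    "after": {"id": 42, "name": "New Name"},
    "source": {"db": "myapp", "table": "users", "lsn": 12345},
})
```

Evicting rather than overwriting keeps the consumer simple: the next read repopulates the entry from the database.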
The Snapshot Problem
CDC starts capturing from "now." But what about the existing data when you first set up CDC?
Two approaches:
Initial snapshot: the CDC tool reads the entire current state once (a full table scan) and emits an event for every existing row, then switches to log streaming. This is Debezium's default behavior; it marks snapshot rows with op "r" (read) rather than "c".
Skip snapshot: only stream changes from now on. Existing rows never appear in the stream and have to be backfilled separately.
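The handoff from snapshot to streaming can be sketched as follows, assuming the tool records the log position at which the snapshot was taken and then streams only entries after it. Snapshot rows are marked with op "r" here, the code Debezium uses for snapshot reads; the function names and the list standing in for the log are hypothetical.

```python
def initial_snapshot(table: dict, snapshot_lsn: int) -> list:
    """Emit one read event per existing row, tagged with the snapshot position."""
    return [{"op": "r", "before": None, "after": row, "lsn": snapshot_lsn}
            for row in table.values()]


def stream_from(wal: list, after_lsn: int) -> list:
    """Return only the log entries written after the snapshot position."""
    return [e for e in wal if e["lsn"] > after_lsn]


def replay(events: list) -> dict:
    """Rebuild table state downstream by applying events in order."""
    state = {}
    for e in events:
        if e["op"] == "d":
            state.pop(e["before"]["id"], None)
        else:
            state[e["after"]["id"]] = e["after"]
    return state


# Table contents as of LSN 2: log entries 1 and 2 are already reflected here.
table_at_lsn_2 = {1: {"id": 1, "name": "Ada"}, 2: {"id": 2, "name": "Bob"}}
wal = [
    {"op": "c", "before": None, "after": {"id": 1, "name": "Ada"}, "lsn": 1},
    {"op": "c", "before": None, "after": {"id": 2, "name": "Bob"}, "lsn": 2},
    {"op": "u", "before": {"id": 2, "name": "Bob"}, "after": {"id": 2, "name": "Bobby"}, "lsn": 3},
    {"op": "d", "before": {"id": 1, "name": "Ada"}, "after": None, "lsn": 4},
]

events = initial_snapshot(table_at_lsn_2, 2) + stream_from(wal, 2)
final_state = replay(events)
```

Streaming from the recorded position rather than from "whenever the scan finished" is what keeps changes made during the snapshot from being lost.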
Schema Changes
What happens when you add a column? Drop one? Rename a table?
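Whether these show up in the stream depends on the database and the tool: some connectors emit schema change (DDL) events, while others only reveal a change through the shape of later row events. Either way, downstream consumers need to handle schema evolution. A common pattern is a schema registry alongside the event stream that consumers reference.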
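Without a registry, a consumer can at least be written to tolerate added and dropped columns. The sketch below projects each row onto the columns the consumer knows about; `EXPECTED_COLUMNS` and `project` are hypothetical names, and the defaults are arbitrary choices for illustration.

```python
from typing import Optional

# Columns this consumer understands, with a default for each.
EXPECTED_COLUMNS = {"id": None, "name": ""}


def project(row: Optional[dict], expected: dict = EXPECTED_COLUMNS) -> Optional[dict]:
    """Keep known columns, ignore unknown ones, and fill in defaults for missing ones."""
    if row is None:
        return None
    return {col: row.get(col, default) for col, default in expected.items()}


# A column was added upstream: the extra field is ignored.
after_add = project({"id": 1, "name": "Ada", "email": "ada@example.com"})

# A column was dropped upstream: the default fills the gap.
after_drop = project({"id": 1})
```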
The One Thing to Remember
CDC turns your database's transaction log into a real-time event stream that any downstream system can consume. It eliminates the dual-write problem and is the cleanest way to integrate a transactional database with the rest of your data infrastructure. If you've ever found yourself writing application code to "publish an event after writing to the database," there's a good chance CDC is what you actually want.