The Question People Ask Wrong

Every team building a new product asks "should we use SQL or NoSQL?" Answers come back as if it were a binary choice. It is not. SQL and NoSQL are not even comparable categories. SQL is a query language and a data model. NoSQL is a marketing umbrella covering at least four very different families of databases that have little in common with each other.

The real question is: what data am I storing, what access patterns do I have, and which database fits them best? Asked that way, the answer is almost never "all SQL" or "all NoSQL." It is "SQL for these things, NoSQL family X for these other things, NoSQL family Y for the third set." This is called polyglot persistence and it is what serious production systems actually look like.

This article walks through the families, the trade-offs, the decision framework, and the myths that distort the discussion.

Step 1: What SQL Actually Is

SQL databases (Postgres, MySQL, Oracle, SQL Server, SQLite, MariaDB) are relational: data lives in tables with fixed schemas, related to each other through foreign keys. The query language is SQL.

Key Properties

Schema-on-write. The structure is enforced when you insert data. A column declared as INTEGER NOT NULL rejects strings or NULLs at write time. The schema is the contract.

ACID transactions. Multiple writes can be grouped into a single transaction that either fully succeeds or fully fails. No partial updates. No race conditions across rows.

Joins. Efficiently query across related tables. The relational model is built around composing data from multiple tables at query time.

Strong consistency. Reads always see the latest committed write. No need to reason about replication lag for the typical single-node use case.

Standard query language. SQL is portable across vendors. Knowledge transfers between Postgres, MySQL, Oracle. Tools (BI, ETL, ORMs) work with any SQL backend.

Where SQL Came From

The relational model was Ed Codd's 1970 paper. The 1980s and 1990s saw it dominate. By 2000, SQL was the default for any application database. It still is for most.

Then around 2008-2010, web-scale companies (Google, Amazon, Facebook) started building things that didn't fit SQL well: massive social graphs, sessions for billions of users, document-style content. They built non-relational databases for those needs. The "NoSQL" label was coined to describe this family of new tools.

Step 2: The Four NoSQL Families

"NoSQL" is an umbrella for four distinct database types. Each was built for a different problem.

Family 1: Document Stores

Examples: MongoDB, CouchDB, Firestore, Amazon DocumentDB.

Data model: JSON (or BSON) documents. Each document is a tree of nested fields. Different documents in the same collection can have different fields.

Best for: hierarchical data (a blog post with embedded comments, a product with variants and metadata, a customer with addresses), schema-flexible workloads, content management systems, anywhere the natural shape of data is a tree, not a table.

How queries work: by document ID (fast), by indexed field (fast), by aggregation pipeline (more complex). Cross-document joins are limited or expensive.

The killer feature: schema flexibility. Add a new field to one document without migrating others. Useful when the schema evolves continuously, less useful when consistency across records matters.

Family 2: Key-Value Stores

Examples: Redis, DynamoDB, RocksDB, Memcached, Riak.

Data model: a giant hash table. You ask for a key, you get a value. The value is whatever you put in (string, blob, JSON, set, sorted set in Redis's case).

Best for: caching, session storage, simple lookups, rate limiters, leaderboards, anything where you have a known key and you want extremely fast retrieval.

How queries work: by key only. Some key-value stores support secondary indexes, but most queries are direct key lookup.

The strength: speed. Redis can do hundreds of thousands of operations per second per node. Sub-millisecond latency. The trade-off: very limited query flexibility.

Family 3: Wide-Column Stores

Examples: Cassandra, HBase, ScyllaDB, Bigtable.

Data model: rows, but each row can have different columns. The actual storage is sparse: missing columns take no space. Optimized for writing huge volumes and querying by row key plus column range.

Best for: time-series data (sensor readings, clickstreams), event logs, IoT, anything append-heavy that gets queried by ID + time range.

How queries work: by partition key (the primary access pattern), then range queries within a partition. Cross-partition queries are expensive.

The killer feature: write throughput. Cassandra can handle hundreds of thousands of writes per second per node. Linearly scalable. Made for the workload of "many tiny writes constantly."

Family 4: Graph Databases

Examples: Neo4j, Amazon Neptune, ArangoDB, JanusGraph.

Data model: nodes connected by edges. Both have properties. The graph is the structure.

Best for: relationship-heavy data: social networks (friends-of-friends), recommendation engines (users who bought X also bought Y), fraud detection (transaction graphs), knowledge graphs.

How queries work: traverse the graph. "Find all people connected to person X within 3 hops" is a single graph query. The same in SQL would be 3 self-joins, slow on real-world graphs.

The killer feature: traversal speed. Graph databases store the connections explicitly, so multi-hop queries are O(neighbors per hop) instead of O(table joins). On dense graphs with millions of edges, the difference is enormous.

Step 3: Side-by-Side Comparison

SQLDocumentKey-ValueWide-ColumnGraph
SchemaFixed (schema-on-write)FlexibleNone (just a value)Flexible (sparse columns)Flexible
ACIDFullDocument-levelLimited (single key)LimitedFull (most engines)
JoinsExcellentLimitedNoneNoneExcellent (traversal)
Horizontal ScaleHard (manual sharding)EasyEasyEasy (built for it)Hard
Read PatternAnything (SQL)By ID, by indexBy keyBy partition + rangeTraversal
Write PatternModerateHighVery highExtremely highModerate
Best Query PatternComplex relationalLookup by ID, partial updatesGet by keyTime-series, ID + rangeMulti-hop traversal

Step 4: Myths to Ignore

Myth 1: "SQL Doesn't Scale"

This was true 15 years ago. It is largely false today. Modern Postgres can handle terabytes and tens of thousands of QPS on a single instance. With read replicas, much more.

Tools like Citus, Vitess, AWS Aurora, and Google Cloud Spanner make SQL scale horizontally. Spanner does global distribution with strong consistency. Vitess powers YouTube. Aurora handles serious volumes for AWS customers. The "NoSQL because we're at scale" reasoning is mostly outdated.

What's actually true: NoSQL was designed to scale horizontally from day one, while SQL was designed to scale vertically and only later got bolt-on horizontal scaling. If you're starting fresh and you know you'll have millions of users, NoSQL has a smoother path. If you're at any normal scale, SQL is fine and more flexible.

Myth 2: "NoSQL is Faster"

For its specific access pattern, often yes. Redis fetching a key beats Postgres fetching the same row. Cassandra writing 100k rows/sec beats Postgres.

But the comparison is unfair. Compare equivalent workloads. A Postgres with the right index serving a known key is roughly as fast as Redis. Postgres on a 16-core server can serve 50k+ QPS for typical OLTP. Speed differences come from query patterns, not the SQL/NoSQL split.

NoSQL is not magic. It is fast for specific patterns by trading away other capabilities (joins, ACID, query flexibility).

Myth 3: "NoSQL is Schemaless"

Half-true. NoSQL allows different documents to have different fields. But your application code still expects certain fields to exist. The schema didn't disappear; it moved from the database to your code.

Without a database-enforced schema, schema mistakes (missing fields, type mismatches) become application bugs. Some teams add schema validation back: MongoDB schema validators, application-level type checking.

Schemaless is freedom plus responsibility. Whether that's a win depends on your team's discipline.

Myth 4: "NoSQL Has No Joins"

Most NoSQL databases lack the SQL JOIN operator. But they often have alternatives:

MongoDB has $lookup (sort of like a left join).
DynamoDB has no joins; you denormalize instead.
Cassandra has no joins; same denormalization approach.
Graph databases have native traversal which is more powerful than JOIN.

The deeper truth: in NoSQL, you usually denormalize data so that one query returns everything you need without joins. This is fine for known access patterns and bad for ad-hoc queries.

Myth 5: "ACID is for Old Databases"

Many NoSQL databases now offer ACID transactions: MongoDB (4.0+), DynamoDB Transactions, Cassandra LWT, etc. The hard line between "SQL has ACID, NoSQL doesn't" is gone.

What remains: SQL gives ACID by default; NoSQL gives ACID with extra effort and often performance cost.

Step 5: How to Decide

Use SQL When

Your data has clear relationships and you'll query them with joins. Most business applications fit here.
ACID transactions matter (money, inventory, orders, anything regulated).
Your access patterns are unpredictable and ad-hoc queries are common.
You don't yet know your scale needs (which is most projects).
Your team knows SQL well; the productivity advantage is real.
You need flexible, complex queries (analytics, BI, reporting).

Use a Document Store When

Your data is naturally tree-shaped. A product with variants. A blog post with comments. A configuration with nested settings.
Schemas evolve frequently and you don't want to manage migrations.
You write and read whole objects, not slices of them.
Schema-on-read is acceptable for your team's discipline level.

Use a Key-Value Store When

You need extreme speed for simple GET/SET.
Caching, sessions, leaderboards, rate limiters.
The data has a natural unique key that's also the only access pattern.
You don't need rich querying.

Use a Wide-Column Store When

You write enormous volumes (sensor data, logs, events).
Queries are predictable: "give me data for this user in this time range."
You need to scale to many nodes from the start.
The data model fits "row key + sorted columns" naturally.

Use a Graph Database When

Your queries cross many relationships (friends-of-friends, recommended products, fraud rings).
Joins in SQL would be slow because of the depth of traversal.
Relationships are first-class concepts in your domain.

Step 6: Polyglot Persistence

Most large products use multiple databases. This is the norm, not the exception. A typical e-commerce stack might use:

Postgres for orders, customers, inventory. The transactional core. ACID matters here.
Redis for sessions, shopping cart cache, rate limiters. Speed matters; data is ephemeral.
Elasticsearch for product search. Full-text and faceted search.
Cassandra for clickstream events. Massive write volume, time-series queries.
Neo4j for the recommendation engine. Multi-hop traversal across user-product graph.
S3 for product images and user-uploaded files. Object storage.

This is called polyglot persistence: pick the right database for each problem instead of forcing everything into one. Each database is a tool; use the one that fits.

The Cost of Polyglot

You pay for it:

More operational complexity (each database needs ops attention).
Cross-database consistency is hard (the order in Postgres must match the cart in Redis must match the search index in Elasticsearch).
More expertise required (your team needs to know each one).
More monitoring, more alerts, more failure modes.

The trade-off: each database does its specific job extremely well. The combination scales further than any one would alone.

The Alternative: Multi-Model Databases

Some databases try to be many things at once: Cosmos DB (key-value, document, graph, column), ArangoDB (document + graph), DynamoDB (key-value + document, lightly). These reduce the polyglot burden but compromise on each model's depth.

For most teams, separate specialized databases beat a multi-model jack-of-all-trades. But for small teams that just need "good enough" for several access patterns, multi-model is a reasonable choice.

Step 7: How the Landscape Has Evolved

The 2010s: NoSQL Boom

Web-scale companies built non-relational databases. MongoDB, Cassandra, Redis, DynamoDB. Marketing positioned them as "the future of databases." Many startups built on NoSQL by default.

The 2010s: NoSQL Trade-offs Became Visible

Teams hit limits: ad-hoc queries are hard without SQL; joins are missed; consistency is harder than expected; schema flexibility creates schema chaos. Some companies migrated back to SQL or to hybrid setups.

The 2020s: SQL Got Distributed

NewSQL (Spanner, CockroachDB, YugabyteDB) brought distributed transactions to the SQL model. Vitess and Citus made horizontal scaling more accessible. Postgres added JSON support, pgvector for embeddings, and many features that previously required a separate NoSQL system.

The result: SQL is more capable than ever. The "scaling problem" that justified NoSQL choices is mostly gone.

The Current Best Practice

Default to Postgres. Add specialized databases only where Postgres genuinely doesn't fit (extreme write volume, full-text search, graph traversal, caching, very specific access patterns).

Many serious production systems are now mostly Postgres plus one or two specialty stores (Redis for caching, Elasticsearch for search). The polyglot stays simpler than the 2015 era of "MongoDB + Cassandra + Redis + RDS + Elasticsearch + Neo4j."

Step 8: Common Decisions and Trade-offs

"We're Building a New Product"

Almost always start with Postgres. Add Redis for caching once you need it. Add other specialized stores only when a specific workload demands them.

The reason: ad-hoc queries during development. SQL gives you the flexibility to ask any question of your data. Once you commit to a NoSQL store, certain queries become hard or impossible without redesign.

"We Need Massive Write Throughput"

Cassandra, ScyllaDB, or DynamoDB if the writes are time-series-shaped. Else, sharded Postgres or a NewSQL system.

Be honest about volume. "Massive" is sometimes 1,000 writes/sec which Postgres handles trivially. Sometimes it's 100,000 writes/sec where you need NoSQL.

"We Have Tons of Unstructured Data"

If "unstructured" really means JSON-like nested data: Postgres JSONB plus indexes handles this beautifully now.
If "unstructured" means files: object storage (S3) plus a metadata database.
If "unstructured" means free-text searchable: Postgres full-text or Elasticsearch.

True unstructured-everything-document-store cases are rarer than they seem.

"We Want Schema Flexibility"

Postgres JSONB columns give you flexible schemas inside a relational database. Many teams discover they didn't need a fully document-oriented database; they just needed one flexible column inside a structured table.

If you genuinely need many different document shapes per collection: MongoDB or similar.

"We Have a Social Graph"

Friends, follows, relationships. Many startups try to model this in SQL with self-joins. It works at small scale; gets painfully slow at large scale.

Options: graph database (Neo4j) for traversal-heavy queries, or sharded SQL with adjacency lists for simpler patterns. Big tech companies often build custom solutions on top of Cassandra or HBase.

"We Need Real-Time Analytics"

Neither traditional SQL nor most NoSQL is great for this. ClickHouse, Druid, Pinot are columnar analytical databases optimized for the workload. Or use a data warehouse (Snowflake, BigQuery) if "real-time" tolerates seconds-to-minutes lag.

Step 9: Recap of Key Insights

SQL vs NoSQL is not binary. Real systems use multiple databases for different jobs.
Start with Postgres for new products. Defer NoSQL decisions until you know specific access patterns.
NoSQL families are very different. Document, key-value, wide-column, graph each solve different problems.
Modern SQL scales further than people think. Citus, Vitess, Spanner, Aurora.
Modern NoSQL has more ACID than people think. MongoDB, DynamoDB transactions, etc.
Schema flexibility moves the schema from DB to app. The schema doesn't disappear.
Polyglot persistence is normal. Pick the right tool per problem, but watch the operational cost.
Most "we need NoSQL because scale" is mistaken. Postgres handles more than people assume.

The One Thing to Remember

SQL vs NoSQL isn't a war you have to pick a side in. SQL is a fantastic default for transactional, relational data with unpredictable query patterns. NoSQL gives you specialized tools for specific problems where SQL would struggle: extreme writes, graph traversal, simple key-value at speed, hierarchical document storage. The skill is recognizing which tool fits each problem and being willing to use more than one in the same system. The right answer is usually "Postgres for the core, plus a few specialized stores for specific access patterns." Anyone who gives you a different answer without first asking about your data and queries is selling something. Start simple, add complexity only when proven necessary, and remember that the database you choose today is one of the hardest things to change later.