Why Indexes Exist

Imagine a 100 million-row table. You ask for the user with email [email protected]. Without an index, the database must read every single row and check each email. That's a full table scan: O(N) work. On 100M rows, this might take 30 seconds or more.

An index is a separate data structure that lets the database find rows by a specific column without scanning the whole table. With an index, that same query takes a few milliseconds.

Think of indexes like the index at the back of a book. Instead of reading every page to find "Atomicity," you flip to the index, find the entry, and jump straight to page 47.

The Trade-off

Indexes make reads faster but writes slower and storage larger. Every insert, update, or delete now has to update the table AND every relevant index. A table with 5 indexes has roughly 5x the write cost.

Rule of thumb: only index columns you actually query and filter on. Indexing every column "just in case" is a common rookie mistake.

The Big Three Index Types

1. B-Tree Index (The Default)

Used by Postgres, MySQL, Oracle, SQL Server. The "B" stands for "balanced." It's a tree structure where every leaf is at the same depth.

How it works: each node holds a sorted range of keys and pointers to child nodes. To find a value, you start at the root and walk down. At each step, you eliminate roughly half (or more) of the remaining tree. This gives O(log N) lookups.

For 100M rows, that's about 27 comparisons instead of 100M. Reads in microseconds.

Strengths: equality lookups (WHERE id = 5), range queries (WHERE created_at > '2026-01-01'), ordering (ORDER BY name), and prefix matches on text (WHERE name LIKE 'A%').
Weaknesses: doesn't help with substring matches (LIKE '%apple%') or full-text search.

2. Hash Index

A hash table stored on disk. Hash the key, jump to that slot, return the row pointer. Lookups are O(1) instead of O(log N).

Strengths: exact equality lookups are slightly faster than B-trees.
Weaknesses: no range queries, no ordering, no prefix matches. Hash distribution erases natural ordering.

Used in Redis (the whole engine is hash-based), Postgres (rarely, as a special index type), and key-value stores.

3. LSM-Tree (Log-Structured Merge Tree)

Used by Cassandra, RocksDB, LevelDB, ScyllaDB. Optimized for write-heavy workloads.

How it works: writes go to an in-memory sorted structure (memtable). When the memtable fills up, it's written to disk as an immutable sorted file (SSTable). Old SSTables are merged together periodically (compaction). Reads check the memtable first, then SSTables newest to oldest.

Strengths: extremely fast writes. Sequential disk writes only. Good for time-series and append-heavy data.
Weaknesses: reads can be slower (multiple files to check). Compaction is background work that consumes CPU and IO.

Other Index Types Worth Knowing

Inverted Index: used for full-text search (Elasticsearch, Solr, Lucene). Maps each word to the documents containing it.
R-Tree / Spatial: for geographic queries ("nearest restaurants"). PostgreSQL's PostGIS uses these.
Bitmap Index: very fast for low-cardinality columns (gender, status). Memory-efficient. Common in data warehouses.
BRIN (Block Range Index): Postgres-specific. Tiny indexes for naturally ordered data (time-series).
GIN/GiST: Postgres index types for arrays, JSON, full-text.

Composite Indexes (Multi-Column)

You can index on multiple columns: CREATE INDEX idx ON users (last_name, first_name). Order matters.

This index helps with queries like:

WHERE last_name = 'Smith' (uses the index).
WHERE last_name = 'Smith' AND first_name = 'John' (uses both columns).

But NOT with: WHERE first_name = 'John' alone. The leading column must be present. This is called the leftmost prefix rule.

Covering Indexes

An index that contains all the columns a query needs. The query never has to touch the actual table; it gets everything from the index. This is called an index-only scan and is the fastest possible query.

-- Query
SELECT first_name, last_name FROM users WHERE last_name = 'Smith';

-- Covering index
CREATE INDEX idx_covering ON users (last_name, first_name);
-- Now the index alone has both columns; no table read needed.

How to Spot a Bad Index

Use EXPLAIN (Postgres, MySQL) to see what the database is actually doing. Look for:

Seq Scan / Full Table Scan: the query is reading every row. Either no index exists or the planner decided not to use it.
Filter rows ratio: if the index returns 1M rows and you keep 10, the index is too broad.
High write latency: too many indexes on a write-heavy table.
Unused indexes: Postgres tracks index usage (pg_stat_user_indexes). Drop indexes nobody uses.

The One Thing to Remember

An index is a deal: spend write time and storage to save read time. Made wisely (one or two on the columns you actually query), indexes are the difference between a database that scales and one that crawls. Made carelessly (an index on every column), they slow writes to a halt without helping reads. Always verify with EXPLAIN before assuming an index is being used.