All ArticlesSchemaVis logo

Understanding B-Tree Indexes vs Hash Indexes in Databases

A
Akshat Kotpalliwar
4 min read

When you create an index in a SQL database using CREATE INDEX, the database engine typically defaults to a B-Tree (Balanced Tree) structure. However, many relational databases (like PostgreSQL and MySQL's Memory engine) also support Hash indexes.

Choosing the right index type requires understanding the mathematical structures behind them and the types of queries they excel at serving.

The B-Tree Index

A B-Tree (specifically, a B+ Tree in most database implementations) is a self-balancing tree data structure that maintains sorted data and allows searches, sequential access, insertions, and deletions in logarithmic time (O(log n)).

How it works: The data is stored in hierarchical nodes. The root node points to branch nodes, which eventually point to leaf nodes. In a B+ Tree, the actual data pointers (locators to the table rows) are stored only in the leaf nodes, and these leaf nodes are linked together in a doubly-linked list.

Strengths: Because the B-Tree stores data in a sorted order, it is incredibly versatile. It is the optimal choice for:

  • Exact matches: WHERE age = 30
  • Range queries: WHERE age BETWEEN 20 AND 40
  • Inequalities: WHERE age > 25
  • Sorting: ORDER BY age DESC
  • Prefix matching: WHERE name LIKE 'Sm%'

The Hash Index

A Hash index uses a hash function to map a specific column value to a "bucket" that contains a pointer to the corresponding row in the table.

How it works: When you query a hashed column, the database applies the same hash function to your search term, calculates the bucket address, and goes directly to that bucket. This provides O(1) (constant time) lookup speed, regardless of how large the table grows.

Strengths: Hash indexes are extremely fast for one specific operation:

  • Exact equality matches: WHERE session_id = 'abc-123'

Limitations: Because a hash function scrambles the input to distribute data evenly across buckets, Hash indexes completely lose the concept of sorting or ordering. Therefore, they are useless for:

  • Range queries (BETWEEN, >, <)
  • Sorting (ORDER BY)
  • Prefix matching (LIKE 'term%')

Furthermore, Hash indexes historically suffered from slower rebuild times and didn't support multi-column (composite) indexing well, though modern implementations like PostgreSQL 10+ have improved this significantly.

Which Should You Choose?

In 95% of scenarios, the default B-Tree index is the correct choice. Its versatility covers the vast majority of application querying patterns, and O(log n) lookup time is exceptionally fast even for billions of rows.

You should explicitly choose a Hash index only when you have a specific column that is only ever queried for exact equality (e.g., looking up a user session token, a unique UUID, or an exact key-value pair) and you need to squeeze out absolute maximum read performance for that specific query.

Related Topics

B-TreeHash IndexDatabase ArchitectureSQLPostgreSQLPerformance Tuning

Visualize your schemas effortlessly

Stop writing manual DDL or guessing relationships. Drop your SQL file into SchemaVis and instantly generate an interactive diagram.

Try SchemaVis for Free