Indexing lets a database find rows without scanning every row. Without an index, finding a row in 1 million records requires up to 1 million comparisons. With a B+ tree index, the same lookup takes ~20 comparisons. B+ trees are the dominant index structure in PostgreSQL, MySQL InnoDB, and Oracle. Hash indexes support only equality lookups but are O(1). GATE tests B+ tree height, fan-out calculations, and index type selection.
Real-life analogy: The textbook index
Finding 'normalization' in a 900-page textbook: (1) No index = read every page = 900 reads. (2) With index: look up N, find normalization → page 423. Two steps. A database index works identically: small, sorted structure mapping search key values to disk block addresses.
B+ Tree structure and height
B+ tree height. N = leaf entries. m = order. For m=100, 1 million records: height = ceil(log₅₀(1,000,000)) = ceil(3.6) = 4. Only 4 disk reads to find any record!
B+ tree height and index type analysis
import math
def bplus_height(n_records: int, order: int) -> int:
"""Max height of B+ tree for n records with given order."""
min_fanout = math.ceil(order / 2)
if n_records <= order - 1:
return 1
return math.ceil(math.log(n_records, min_fanout))
# B+ tree examples
print(f"1M records, order 100: height = {bplus_height(1_000_000, 100)}") # 4
print(f"1B records, order 100: height = {bplus_height(1_000_000_000, 100)}") # 5
print(f"10K records, order 10: height = {bplus_height(10_000, 10)}") # 5
# B+ tree vs sequential scan comparison
n = 1_000_000
order = 100
height = bplus_height(n, order)
print(f"
For {n:,} records with B+ tree order {order}:")
print(f" Sequential scan: up to {n:,} block reads")
print(f" B+ tree lookup: {height} disk reads (height = {height})")
print(f" Speedup: {n//height:,}x")Index types
| Index type | Description | Best for |
|---|---|---|
| Primary index | On ordered key field — data physically sorted by this | Range queries on PK |
| Clustering index | On non-key field, records physically ordered by that field | Range queries on non-PK field |
| Secondary index | On non-ordering field — records NOT sorted by this field | Equality lookup on non-PK |
| Dense index | One entry per record | Fast lookup, more space |
| Sparse index | One entry per disk block | Less space, only works with ordered data |
| Hash index | Hash function maps key to bucket | Exact equality O(1), no range queries |
GATE: Clustered vs non-clustered index
Clustered index physically orders table data — only ONE per table. Non-clustered index is a separate structure with pointers — multiple allowed. Primary key is automatically clustered in MySQL InnoDB. PostgreSQL: use CLUSTER command. B+ trees support both; hash indexes are always non-clustered.
Practice questions (GATE-style)
- B+ tree of order 5 with 500 records. What is the maximum height? (Answer: min fanout = ceil(5/2) = 3. ceil(log₃(500)) = ceil(5.66) = 6.)
- Why does B+ tree support range queries better than a B-tree? (Answer: B+ tree leaf nodes form a linked list — traverse leaves sequentially after finding start key. B-tree stores data in internal nodes too, requiring full tree traversal for range queries.)
- Hash index on Salary supports which query efficiently? (Answer: Only equality: WHERE Salary = 50000. Cannot support WHERE Salary > 50000 — hash scatters nearby values to different buckets.)
- Difference between dense and sparse index? (Answer: Dense: one entry per record — fast lookup, more space. Sparse: one entry per disk block — less space, only works on ordered data.)
- Relation has 10,000 records, B+ tree order = 50. What is the height? (Answer: min fanout = 25. ceil(log₂₅(10000)) = ceil(2.86) = 3. Just 3 disk accesses!)
On LumiChats
When LumiChats helps optimise slow SQL queries, the first suggestion is often 'add an index on the WHERE clause column.' Understanding B+ trees explains WHY: instead of scanning all rows, the DB traverses a 3-4 level tree to find your data instantly.
Try it free