The Covering Index Paradox
SQLite index performance gap explained by B-tree height difference.
While recently tuning Bit’s performance, I encountered a bottleneck in one highly-selective query SELECT id, url FROM links WHERE slug = (?) LIMIT 1
. Initial optimizations, such as replacing UUIDs with rowid
and reducing the size of the slug column, significantly improved throughput. However, when I tested a covering index on (slug, id, url)
, I was surprised to find that it actually slowed down the queries, despite containing all the required columns and theoretically eliminating the need for one extra table lookups. Let’s dive into SQLite’s cost-based optimizer priorities to understand this counterintuitive behavior.
For further context and reproducibility, here it is a separated database with a links table mirroring the original schema, populated with 10000 rows:
CREATE TABLE IF NOT EXISTS links (
id INTEGER PRIMARY KEY AUTOINCREMENT,
slug TEXT NOT NULL UNIQUE,
url TEXT NOT NULL
);
WITH RECURSIVE generate_links AS (
SELECT 1 AS id, 'slug-' || 1 AS slug, 'https://sjdonado.com/link-' || 1 AS url
UNION ALL
SELECT id + 1, 'slug-' || (id + 1), 'https://sjdonado.com/link-' || (id + 1)
FROM generate_links
WHERE id < 10000
)
INSERT INTO links (slug, url)
SELECT slug, url
FROM generate_links;
SQLite automatically creates sqlite_autoindex_links_1
for the UNIQUE
constraint on slug:
test.db> SELECT * FROM pragma_index_list('links') AS il JOIN pragma_index_info(il.name) ON 1 = 1;
+-----+--------------------------+--------+--------+---------+-------+-----+------+
| seq | name | unique | origin | partial | seqno | cid | name |
+-----+--------------------------+--------+--------+---------+-------+-----+------+
| 0 | sqlite_autoindex_links_1 | 1 | u | 0 | 0 | 1 | slug |
+-----+--------------------------+--------+--------+---------+-------+-----+------+
1 row in set
Time: 0.004s
SQLite uses a default page size of 4KB, though this can be configured to any power of two between 512 bytes and 64KB. With the default configuration, each internal B-tree node can store approximately 500 child pointers, creating a substantial branching factor. This branching factor directly determines how many levels (the tree height) are needed to index a given number of rows, approximately log₅₀₀(N) levels for N rows. When querying data, each level traversed requires a separate page read operation, so a lower tree height dramatically improves performance. SQLite’s pager module caches recently accessed pages in memory, but the initial read of each required page still imposes I/O costs that scale with tree height.
Covering Indexes
To execute our query, several sequential operations are performed. SQLite will perform a binary search on the index table using SeekGE
, looking for matches with the slug. It will stop there (because of the LIMIT clause, which sets the IdxGT
counter) and then access the corresponding table using DeferredSeek
with the rowid
(IdxRowid
) as a pointer:
test.db> EXPLAIN QUERY PLAN SELECT id, url FROM links WHERE slug = 'slug-9345' LIMIT 1;
+----+--------+---------+------------------------------------------------------------+
| id | parent | notused | detail |
+----+--------+---------+------------------------------------------------------------+
| 4 | 0 | 39 | SEARCH links USING INDEX sqlite_autoindex_links_1 (slug=?) |
+----+--------+---------+------------------------------------------------------------+
1 row in set
Time: 0.005s
test.db> EXPLAIN SELECT id, url FROM links WHERE slug = 'slug-9345' LIMIT 1;
+------+--------------+----+----+----+-----------+----+---------+
| addr | opcode | p1 | p2 | p3 | p4 | p5 | comment |
+------+--------------+----+----+----+-----------+----+---------+
| 0 | Init | 0 | 13 | 0 | <null> | 0 | <null> |
| 1 | Integer | 1 | 1 | 0 | <null> | 0 | <null> |
| 2 | OpenRead | 0 | 2 | 0 | 3 | 0 | <null> |
| 3 | OpenRead | 1 | 3 | 0 | k(1,) | 2 | <null> |
| 4 | String8 | 0 | 2 | 0 | slug-9345 | 0 | <null> |
| 5 | SeekGE | 1 | 12 | 2 | 1 | 0 | <null> |
| 6 | IdxGT | 1 | 12 | 2 | 1 | 0 | <null> |
| 7 | DeferredSeek | 1 | 0 | 0 | <null> | 0 | <null> |
| 8 | IdxRowid | 1 | 3 | 0 | <null> | 0 | <null> |
| 9 | Column | 0 | 2 | 4 | <null> | 0 | <null> |
| 10 | ResultRow | 3 | 2 | 0 | <null> | 0 | <null> |
| 11 | DecrJumpZero | 1 | 12 | 0 | <null> | 0 | <null> |
| 12 | Halt | 0 | 0 | 0 | <null> | 0 | <null> |
| 13 | Transaction | 0 | 0 | 6 | 0 | 1 | <null> |
| 14 | Goto | 0 | 1 | 0 | <null> | 0 | <null> |
+------+--------------+----+----+----+-----------+----+---------+
15 rows in set
Time: 0.005s
One can optimize query operations by avoiding table lookups through storing the url
value in a single concatenated key structure. This is possible due to all columns of a composite index are stored in both B-tree internal nodes and leaf nodes, ensuring the key is sorted on a left-to-right basis. This allows efficient handling of various data distributions without performance degradation, as long as the data follows a deterministic sorting order.
Outcome
Following that optimization idea, let’s introduce a covering index for our table:
CREATE UNIQUE INDEX IF NOT EXISTS idx_links_slug_covering ON links(slug, id, url);
test.db> SELECT * FROM pragma_index_list('links') AS il JOIN pragma_index_info(il.name) ON 1 = 1;
+-----+--------------------------+--------+--------+---------+-------+-----+------+
| seq | name | unique | origin | partial | seqno | cid | name |
+-----+--------------------------+--------+--------+---------+-------+-----+------+
| 0 | idx_links_slug_covering | 1 | c | 0 | 0 | 1 | slug |
| 0 | idx_links_slug_covering | 1 | c | 0 | 1 | 0 | id |
| 0 | idx_links_slug_covering | 1 | c | 0 | 2 | 2 | url |
| 1 | sqlite_autoindex_links_1 | 1 | u | 0 | 0 | 1 | slug |
+-----+--------------------------+--------+--------+---------+-------+-----+------+
4 rows in set
Time: 0.002s
Then, with the new index available, and after performing the same query again, we notice that it was not chosen by SQLite’s query planner:
test.db> EXPLAIN QUERY PLAN SELECT id, url FROM links WHERE slug = 'slug-9345' LIMIT 1;
+----+--------+---------+------------------------------------------------------------+
| id | parent | notused | detail |
+----+--------+---------+------------------------------------------------------------+
| 4 | 0 | 39 | SEARCH links USING INDEX sqlite_autoindex_links_1 (slug=?) |
+----+--------+---------+------------------------------------------------------------+
1 row in set
Time: 0.005s
We can force the query to use our covering index, but there is no notable improvement:
test.db> EXPLAIN QUERY PLAN SELECT id, url FROM links INDEXED BY idx_links_slug_covering WHERE slug = 'slug-9345'
LIMIT 1;
+----+--------+---------+--------------------------------------------------------------------+
| id | parent | notused | detail |
+----+--------+---------+--------------------------------------------------------------------+
| 3 | 0 | 40 | SEARCH links USING COVERING INDEX idx_links_slug_covering (slug=?) |
+----+--------+---------+--------------------------------------------------------------------+
1 row in set
Time: 0.005s
test.db> EXPLAIN SELECT id, url FROM links INDEXED BY idx_links_slug_covering WHERE slug = 'slug-9345' LIMIT 1;
+------+--------------+----+-----+----+-----------+----+---------+
| addr | opcode | p1 | p2 | p3 | p4 | p5 | comment |
+------+--------------+----+-----+----+-----------+----+---------+
| 0 | Init | 0 | 12 | 0 | <null> | 0 | <null> |
| 1 | Integer | 1 | 1 | 0 | <null> | 0 | <null> |
| 2 | OpenRead | 1 | 931 | 0 | k(3,,,) | 2 | <null> |
| 3 | String8 | 0 | 2 | 0 | slug-9345 | 0 | <null> |
| 4 | SeekGE | 1 | 11 | 2 | 1 | 0 | <null> |
| 5 | IdxGT | 1 | 11 | 2 | 1 | 0 | <null> |
| 6 | IdxRowid | 1 | 3 | 0 | <null> | 0 | <null> |
| 7 | Column | 1 | 2 | 4 | <null> | 0 | <null> |
| 8 | ResultRow | 3 | 2 | 0 | <null> | 0 | <null> |
| 9 | DecrJumpZero | 1 | 11 | 0 | <null> | 0 | <null> |
| 10 | Next | 1 | 5 | 0 | <null> | 0 | <null> |
| 11 | Halt | 0 | 0 | 0 | <null> | 0 | <null> |
| 12 | Transaction | 0 | 0 | 6 | 0 | 1 | <null> |
| 13 | Goto | 0 | 1 | 0 | <null> | 0 | <null> |
+------+--------------+----+-----+----+-----------+----+---------+
14 rows in set
Time: 0.007s
The main difference in this last query is that instruction 7, Column
retrieves the slug value directly from the index. Since DeferredSeek
is no longer required, there is one operation less. However, the query is still slower. Why?
The answer lies in the page size and density difference between these index types. Despite both indexes having the same B-tree height for our 1000-row dataset, the covering index requires triple the storage space (18 pages vs 6 pages). This directly translates to increased I/O operations during query execution.
Index | Entry Size | Pages (N=10000) | B-tree Height | Entries/Page | Complexity | Page Reads |
---|---|---|---|---|---|---|
rowid | – | – | 1 | – | O(1) | 1 |
sqlite_autoindex_links_1 | 24 bytes | 60 | 3 | ~166 | O(log N) | 2-3 |
idx_links_slug_covering | 72 bytes | 182 | 4 | ~55 | O(log N) | 4-5 |
For larger datasets, this density disparity would likely increase the B-tree height of the covering index before the autoindex, creating an even larger performance gap. The covering index approach only becomes advantageous when the cost of the additional table lookup DeferredSeek
exceeds the cost of traversing a taller, less dense B-tree structure, typically with very large tables or when retrieving multiple wide columns that would make table lookups particularly expensive.
That’s the paradox – a denser index is expected to be more efficient, yet it can lead to a taller B-tree and potentially worse performance.