Postgres: Index Scans

Using an index to improve query performance is a fundamental database practice. An index is a specialized structure organized to make data access cheaper than scanning the entire disk. In PostgreSQL, the data stored on disk is referred to as the Heap.
The Reality of Indexes
Having an index on a table does not guarantee that it will be used. The PostgreSQL planner evaluates multiple execution paths and chooses the one with the lowest estimated cost.
It is vital to test and measure the effect of an index using the EXPLAIN command. Blindly adding indexes can actually degrade performance, as they increase the overhead for write operations (INSERT, UPDATE, DELETE) and consume disk space.
1. Practical Scenario: When Indexes are Ignored
Let's create a table with 10 million records and a composite index to see how the planner behaves.
Setup
-- Create table
CREATE TABLE foo(id1 INT, id2 INT, id3 INT, id4 INT, descr TEXT);
-- Insert 10M records
INSERT INTO foo
SELECT i, i*3, i+i, i*2, 'hello' || i
FROM generate_series(1, 10000000) i;
-- Create a composite index
CREATE INDEX idx_id1_id2_id3 ON foo(id1, id2, id3);
Running EXPLAIN
If we run a query that filters for a large portion of the table:
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM foo WHERE id1 > 1000;
Planner Output:
Seq Scan on foo (cost=0.00..198530.00 rows=9999027 width=28) (actual time=0.283..2260.973 rows=9999000 loops=1)
Filter: (id1 > 1000)
Rows Removed by Filter: 1000
Buffers: shared hit=16274 read=57256
Why was the Sequential Scan chosen? When using an index, the database must read the index structure and then fetch the relevant data pages from the heap. This often results in 2 disk I/Os per row. If the query returns almost the entire table (low selectivity), reading the file linearly (Seq Scan) is much faster than jumping between the index and the heap.
2. Analyzing Alternative Plans
Standard EXPLAIN only shows the "winning" plan. To understand why the index was rejected, we can use an extension like PG_ALL_PLANS to see all considered paths.
Comparative Costs for id1 > 1000:
- Plan 1 (Seq Scan): Cost 198,530.00 (Winner).
- Plan 2 (Index Scan): Cost 498,815.28 (Rejected due to high I/O cost).
- Plan 3 (Bitmap Heap Scan): Cost 429,813.47 (Rejected; lower than index scan but higher startup than seq scan).
- Plan 4 (Parallel Seq Scan): Cost 1,126,516.03 (Rejected).
3. Increasing Selectivity to Force Index Usage
We can persuade the planner to use the index by reducing the number of rows returned. This increases the selectivity of the query.
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM foo WHERE id1 > 1000 AND id1 < 1500000;
Updated Result:
Index Scan using idx_id1_id2_id3 on foo (cost=0.43..187892.55 rows=1498365 width=28)
Index Cond: ((id1 > 1000) AND (id1 < 1500000))
Buffers: shared hit=2 read=16768
Now, the Index Scan cost (~187k) is lower than the Seq Scan cost (~223k). Because we are only fetching ~1.5M rows instead of 10M, the index becomes the most efficient path.
Summary & Best Practices
Selectivity is Key: Indexes are most effective when they filter out the vast majority of the data.
Use EXPLAIN ANALYZE: Always verify your performance assumptions in a staging environment.
Monitor Usage: Indexes are not free. Use
pg_stat_user_indexesto identify and remove unused indexes that slow down writes.Analyze Statistics: Ensure the planner has up-to-date data by running the
ANALYZEcommand regularly.






