Skip to main content

Command Palette

Search for a command to run...

Postgres: Index Scans

Updated
3 min read
Postgres: Index Scans

Using an index to improve query performance is a fundamental database practice. An index is a specialized structure organized to make data access cheaper than scanning the entire disk. In PostgreSQL, the data stored on disk is referred to as the Heap.


The Reality of Indexes

Having an index on a table does not guarantee that it will be used. The PostgreSQL planner evaluates multiple execution paths and chooses the one with the lowest estimated cost.

It is vital to test and measure the effect of an index using the EXPLAIN command. Blindly adding indexes can actually degrade performance, as they increase the overhead for write operations (INSERT, UPDATE, DELETE) and consume disk space.


1. Practical Scenario: When Indexes are Ignored

Let's create a table with 10 million records and a composite index to see how the planner behaves.

Setup

-- Create table
CREATE TABLE foo(id1 INT, id2 INT, id3 INT, id4 INT, descr TEXT);

-- Insert 10M records
INSERT INTO foo 
SELECT i, i*3, i+i, i*2, 'hello' || i 
FROM generate_series(1, 10000000) i;

-- Create a composite index
CREATE INDEX idx_id1_id2_id3 ON foo(id1, id2, id3);

Running EXPLAIN

If we run a query that filters for a large portion of the table:

EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM foo WHERE id1 > 1000;

Planner Output:

Seq Scan on foo  (cost=0.00..198530.00 rows=9999027 width=28) (actual time=0.283..2260.973 rows=9999000 loops=1)
   Filter: (id1 > 1000)
   Rows Removed by Filter: 1000
   Buffers: shared hit=16274 read=57256

Why was the Sequential Scan chosen? When using an index, the database must read the index structure and then fetch the relevant data pages from the heap. This often results in 2 disk I/Os per row. If the query returns almost the entire table (low selectivity), reading the file linearly (Seq Scan) is much faster than jumping between the index and the heap.


2. Analyzing Alternative Plans

Standard EXPLAIN only shows the "winning" plan. To understand why the index was rejected, we can use an extension like PG_ALL_PLANS to see all considered paths.

Comparative Costs for id1 > 1000:

  • Plan 1 (Seq Scan): Cost 198,530.00 (Winner).
  • Plan 2 (Index Scan): Cost 498,815.28 (Rejected due to high I/O cost).
  • Plan 3 (Bitmap Heap Scan): Cost 429,813.47 (Rejected; lower than index scan but higher startup than seq scan).
  • Plan 4 (Parallel Seq Scan): Cost 1,126,516.03 (Rejected).

3. Increasing Selectivity to Force Index Usage

We can persuade the planner to use the index by reducing the number of rows returned. This increases the selectivity of the query.

EXPLAIN (ANALYZE, BUFFERS) 
SELECT * FROM foo WHERE id1 > 1000 AND id1 < 1500000;

Updated Result:

Index Scan using idx_id1_id2_id3 on foo  (cost=0.43..187892.55 rows=1498365 width=28)
   Index Cond: ((id1 > 1000) AND (id1 < 1500000))
   Buffers: shared hit=2 read=16768

Now, the Index Scan cost (~187k) is lower than the Seq Scan cost (~223k). Because we are only fetching ~1.5M rows instead of 10M, the index becomes the most efficient path.


Summary & Best Practices

  • Selectivity is Key: Indexes are most effective when they filter out the vast majority of the data.

  • Use EXPLAIN ANALYZE: Always verify your performance assumptions in a staging environment.

  • Monitor Usage: Indexes are not free. Use pg_stat_user_indexes to identify and remove unused indexes that slow down writes.

  • Analyze Statistics: Ensure the planner has up-to-date data by running the ANALYZE command regularly.

More from this blog

Go & DevOps Blog

24 posts

Backend Developer | Python | Go | gRPC | Kubernetes | Ansible | IaC