The Unsung Heroes of Database Performance: Index Structures
Slow database queries are a problem most of us have encountered at least once in our careers. Sometimes it manifests as a report taking 3 hours, other times as an API endpoint making users wait. One of the most significant factors underlying these performance issues is the non-use or misconfiguration of appropriate index structures. In this post, I will delve into the most commonly used index types in PostgreSQL (B-tree, GIN, BRIN) and examine when we should prefer each one with concrete examples. In my 20 years of field experience, I’ve seen that understanding index structures forms the foundation of query optimization.
While reading EXPLAIN ANALYZE output is one of the first things that comes to mind when discussing query optimization, it’s essential to understand the underlying data structure for this output to be meaningful. A database index is essentially like a book index; it allows you to jump directly to the relevant section instead of scanning page by page for the information you’re looking for. However, each index type has its own advantages and disadvantages. Index choices made without knowing what data you access and how frequently can degrade performance instead of improving it, and even unnecessarily consume disk space.
B-tree Index: The All-Rounder
The B-tree (Balanced Tree) is the most widely used index structure in modern databases. It offers excellent performance for queries using equality (=) and comparison (>, <, >=, <=) operators. Data is stored in a sorted manner within a tree structure. This allows search operations to have logarithmic time complexity (O(log n)). This means that even as the amount of data increases, the search time grows very slowly. When a query has a condition like WHERE id = 123, we can directly reach the relevant data with a B-tree index.
In a production ERP system, adding a B-tree index to the order_date column of the orders table dramatically improves performance when querying for orders within a specific date range (WHERE order_date BETWEEN '2026-01-01' AND '2026-01-31'). Without this index, the database would have to scan the entire table. This situation leads to unacceptably slow performance on tables with millions of rows. For example, on an orders table with approximately 10 million rows, a date range query without an index on order_date might take 5 seconds, whereas with a correctly configured B-tree index, this time can drop to below 50 milliseconds. This creates a massive difference in user experience.
-- Example of creating a B-tree index
CREATE INDEX idx_orders_order_date ON orders (order_date);
-- Example query
SELECT * FROM orders WHERE order_date BETWEEN '2026-01-01' AND '2026-01-31';
B-tree indexes can be created not only for a single column but also for multiple columns (composite index). This is even more effective in queries like WHERE column1 = 'A' AND column2 > 100. However, the order of columns is important here. Generally, columns with higher selectivity (containing more distinct values) should be placed first.
Hash and GiST Indexes: Tools for Special Cases
Between B-tree and GIN/BRIN sit two less-discussed but occasionally critical index types: Hash and GiST. A Hash index runs the key through a hash function and points directly to the data’s location on disk. By design it serves only equality (=) queries — sometimes even faster than a B-tree — but it supports no ordering or range queries, because hashing does not preserve order. If all you ever do on a products table’s product_code is an exact match, Hash can make sense; but the moment a query like LIKE 'XYZ%' or > 'ABC' appears, it becomes useless. As a field rule I default to B-tree: since it supports both equality and range, its flexibility usually outweighs the marginal speed edge of Hash.
-- Hash index (equality queries only)
CREATE INDEX idx_products_code_hash ON products USING HASH (product_code);
SELECT * FROM products WHERE product_code = 'XYZ123';
GiST (Generalized Search Tree) provides a generalized search structure for varied data types; it is the perfect fit for geometric data, geographic location, and hierarchical data. Extensions like PostGIS are built on GiST. In a GIS application that needs to find points within a certain radius of a coordinate, a GiST index on the coordinates column drops the query below a second using functions like ST_DWithin. GiST can consume more disk space than a B-tree and has a higher write cost; its effectiveness depends on the data type and how the index is configured.
-- GiST index with the PostGIS extension
CREATE INDEX idx_locations_coordinates ON locations USING GIST (coordinates);
SELECT name FROM locations
WHERE ST_DWithin(coordinates, ST_MakePoint(longitude, latitude), radius_in_meters);
GIN Index: Ideal for Text Search and JSON Data
GIN (Generalized Inverted Index) is preferred, especially when working with text search (tsvector, tsquery) and complex data types like JSONB. A GIN index stores a list of rows corresponding to each value. For example, for every word found in the content of a blog post, an index entry is created showing which posts that word appears in. This structure makes full-text search queries very fast.
The GIN index plays a critical role in search queries performed on product descriptions of an e-commerce site. When a user searches for “blue denim jeans,” the database can find the relevant products in seconds thanks to the GIN index. If a GIN index were not used, every search query would have to scan all product descriptions, which would be an incredibly slow process for millions of products. In a financial analysis tool I developed, I significantly sped up complex queries by using GIN indexes on jsonb columns when users analyzed various financial text inputs.
Creating a GIN index in PostgreSQL often involves operator classes like gin_trgm_ops. These operators are optimized for finding similarities or exact matches between texts.
-- Example of creating a GIN index (PostgreSQL 12+)
CREATE INDEX idx_products_description_gin ON products USING GIN (to_tsvector('english', description));
-- Full-text search query
SELECT * FROM products WHERE to_tsvector('english', description) @@ to_tsquery('english', 'blue & denim & jeans');
GIN indexes are also very useful when working with JSONB data. For instance, if an order details table has a jsonb column named items, and you query for product IDs or quantities within this column, a GIN index will speed up these queries.
In production, the most robust way to make full-text search permanent is to materialize the tsvector in a dedicated column and keep it current with a trigger. In a CRM application where we stored customer comments in a comments table, I set it up like this: a comment_tsv column derived from the comment text, a trigger that refreshes it on every INSERT/UPDATE, and a GIN index on that column. This way to_tsvector isn’t recomputed on every query — the search runs straight off the index.
-- tsvector column + automatic update trigger + GIN index
ALTER TABLE comments ADD COLUMN comment_tsv tsvector;
UPDATE comments SET comment_tsv = to_tsvector('english', comment_text);
CREATE TRIGGER tsvectorupdate BEFORE INSERT OR UPDATE
ON comments FOR EACH ROW EXECUTE FUNCTION
tsvector_update_trigger(comment_tsv, 'pg_catalog.simple', comment_text);
CREATE INDEX idx_comments_tsv ON comments USING GIN (comment_tsv);
SELECT * FROM comments WHERE comment_tsv @@ to_tsquery('english', 'product & suitable');
BRIN Index: A Lightweight Option for Large and Ordered Data
BRIN (Block Range Index) is an index type that takes up less space than B-tree and GIN indexes but can be quite effective under certain conditions. A BRIN index indexes ranges of data blocks. That is, an index entry stores the minimum and maximum values of values within a specific data block range. This index performs best when the data is physically ordered. For example, it’s effective for time-series data or geographical location data that are naturally ordered or grouped.
Consider sensor data collected on an IoT platform. This data is typically written to disk in chronological order. In such cases, a BRIN index added to the timestamp column uses significantly less disk space than a B-tree for querying data within a specific time range and can offer similar performance. For example, on a sensor database with 1 billion rows, adding a B-tree index to the timestamp column might occupy hundreds of megabytes, while a BRIN index might only take up a few megabytes. This is a significant advantage, especially in environments where storage costs are high.
-- Example of creating a BRIN index
CREATE INDEX idx_sensor_data_timestamp_brin ON sensor_data USING BRIN (timestamp);
-- Example query (time range)
SELECT * FROM sensor_data WHERE timestamp BETWEEN '2026-05-21 00:00:00' AND '2026-05-21 23:59:59';
However, if the data is not physically ordered, the effectiveness of a BRIN index is greatly reduced. For instance, if data is frequently inserted and deleted and not kept in an ordered manner, the minimum and maximum values within the same data block will not be very close to each other. This causes the index to scan more data blocks during a search. Therefore, it’s important to be aware of the physical ordering of the data before using a BRIN index.
Trade-offs and Considerations in Index Selection
Index selection is not just about performance improvement; it also involves factors such as disk space usage, data insertion/update/deletion (DML) speeds, and maintenance costs. Every index imposes an additional burden on the database.
In an enterprise software development project, there can be an initial tendency to add an index to every query. However, over time, it becomes apparent that so many indexes make data entry incredibly slow. For example, updating multiple indexes when registering a user can extend the operation from seconds to tens of seconds. At this point, regularly checking system tables like pg_stat_user_indexes and removing unused indexes to determine which indexes are truly necessary is critical.
Furthermore, it’s necessary to consider the overall workflow of the application, not just query performance. In an ERP system, indexes optimized for reporting queries can negatively affect the performance of data entry screens. Therefore, the best approach is to address both read (SELECT) and write (INSERT, UPDATE, DELETE) operations in a balanced manner. Analyzing the EXPLAIN ANALYZE output under different application scenarios, not just on a per-query basis, is important.
Advanced Index Types: Partial, Expression, and Covering
While the basic index types (B-tree/GIN/BRIN) cover most of the work, the real performance gains often come from applying these three advanced techniques in the right place. They’re all standard PostgreSQL tools, but little known — you don’t feel the need for them until you hit the matching problem.
A partial index indexes only a specific subset of the table. It both shrinks the index size and lowers write cost. In a client project we rarely queried cancelled orders; on an orders table with millions of rows, queries with status = 'cancelled' were slow, yet cancelled records were only 1% of the table. Instead of indexing the whole table, we created a partial index for just that subset:
CREATE INDEX idx_orders_cancelled ON orders (order_id) WHERE status = 'cancelled';
Because this index contained only the order_ids of cancelled orders, it was much smaller and noticeably sped up the relevant queries.
An expression index indexes the result of a function/expression applied to a column rather than the column itself. The to_tsvector above is one example; another is case-insensitive search. If you frequently run WHERE lower(username) = 'admin' on a user table, without indexing the lower(username) expression this query does a full scan every time:
CREATE INDEX idx_users_lower_username ON users (lower(username));
SELECT * FROM users WHERE lower(username) = 'admin';
A covering index (with INCLUDE, PostgreSQL 11+) lets the query complete using the index alone, without ever touching the main table. In a financial reporting tool I was pulling transaction details for a specific account and date range; there was already a B-tree index on account_id and transaction_date, but since the query also returned the description, every row required a heap fetch. By moving the description into the INCLUDE clause I eliminated the heap access:
CREATE INDEX idx_transactions_account_date ON transactions (account_id, transaction_date) INCLUDE (description);
Column Order in a Composite Index: A Field Case
Knowing the theory of advanced techniques isn’t enough; the truly instructive part is field cases. A few months ago, in a production ERP, the “pending production orders” list on the operator screen stretched to 15 seconds at the start of the morning shift. The query filtered the production_orders table by shift_id, status = 'pending', and a planned_start_date range. The EXPLAIN ANALYZE output showed a Seq Scan; there were separate indexes on shift_id and planned_start_date, but none on status.
The root cause was the low cardinality of the status column: 80% of the table was completed, only 10% pending. The planner found an index scan on pending alone more expensive than scanning the whole table. But combined with shift_id and planned_start_date, the status filter became very selective. The fix was to place the low-cardinality column last in the composite index:
CREATE INDEX idx_production_orders_shift_date_status
ON public.production_orders (shift_id, planned_start_date, status);
After this index, the plan switched to an Index Scan and the query time instantly dropped from 15 seconds to 250 milliseconds. The lesson is clear: a low-cardinality column that’s useless on its own can be very valuable when appended after higher-cardinality columns, letting the composite index perform the final filtering step from the index.
Advanced Tips for Query Optimization
After understanding the fundamentals of index structures, we can take more advanced steps for query optimization. One of these is understanding the behavior of PostgreSQL’s query planner and providing hints when necessary. The ANALYZE command helps the planner make better decisions by updating statistics.
The pg_stat_statements module is a great resource for identifying the most used and slowest queries. By enabling this module, you can see which queries need optimization the most. This allows you to focus your resources on areas that will provide the most benefit. For example, in a financial calculator application, optimizing the top 10 most frequently used queries can improve the overall performance of the application by more than 30%.
It’s important to remember that indexes are not magic wands. Sometimes the root of the problem lies not in the index structure but in the database design, poorly written SQL queries, or insufficient hardware resources. Therefore, it’s important to conduct a comprehensive analysis and accept that the problem may not be reducible to a single cause. When solving performance issues encountered in an Android spam blocking application, a side project I developed, index optimization was only one part; I achieved the real improvements by redesigning the data model.
Why Automated Index Tools Fall Short: My Manual Verification Workflow
In recent years, automated index advisor tools have become widespread, and they can scan enormous query logs in minutes to surface candidate indexes. I find them valuable as a starting point — but never as the final decision. These tools rely on statistical models and historical query patterns; they can’t see the live data distribution in production, transient load spikes, or application-specific business logic. A classic example: a tool suggests two separate indexes instead of (status, order_date), because it can’t treat the ORDER BY sort path as a key. Or it recommends an index on a column that’s 80% a single value; because that index points to nearly the entire table, it slows the query down. At one client, rolling back exactly such a suggestion made the query 300% faster. In my experience, an index from an automated suggestion often yields 5-10% gain, whereas a composite index I build after manually analyzing the query’s filtering and sorting paths can reach 30-40%.
That’s why I run every suggestion through this seven-step workflow:
- Treat it as a starting point: I take the tool’s suggestion as a “hint,” never pushing it straight to production.
- Examine the query plan: I check how the suggested index changes the relevant query’s
EXPLAIN ANALYZEoutput — verifying the index is actually used and how much it lowers cost. - Understand data distribution: Using
pg_statsor custom queries, I measure the cardinality of the suggested column; is it high, or low-cardinality but frequently filtered? - Consider application context: Which workflow is the query part of, how often does it run, and how does it align with overall performance goals?
- Evaluate trade-offs: I weigh the read gain against write cost and disk-space impact.
- Do real-world testing: Where possible I measure on a staging environment with the same data volume, and roll back if I see a negative effect.
- Optimize iteratively: It’s not a one-time job; as the system grows and query patterns evolve, I revisit the indexes.
I appreciate the speed automation provides, but database optimization is still a matter of judgment: the final call should always rest with the system’s architect.
Conclusion: Taking Performance to the Peak with Index Structures
Database index structures are the cornerstone of query performance. Knowing when and how to use different index types like B-tree, GIN, and BRIN is a critical skill for database administrators and developers. The right index selection can multiply query speed, improve user experience, and reduce infrastructure costs.
Each of the index types we examined in this post has its own unique advantages and disadvantages. While B-tree is a standard choice for general-purpose queries, GIN is ideal for text and unstructured data, and BRIN is ideal for ordered data. When making index selections, it’s important to consider not only read performance but also DML operations and disk space usage. Remember, the best index is the index you need; neither more nor less.
In summary, optimizing database performance is an ongoing process. Regularly analyzing query plans, keeping statistics up-to-date, and cleaning up unused indexes are vital for the health of your system. With this knowledge, I believe you can unlock the full potential of your databases.