img

Optimising an index can have quite a significant impact.

From a customer’s legacy system, we got this query:
SELECT IDX FROM PARAMETERINFO
WHERE (NAME = $1
    OR
       (LOWER(NAME) = LOWER($1) AND FLAGS & x'80'::int4 > 0))
  AND FLAGS & x'20'::int4 > 0
According to pg_stat_statements it is taking about 8.656 milliseconds for an average execution of this query. It is not too complex and on a relatively small table (about 13.000 rows), but it is performed quite often (359.513.795 in one year), and is causing a considerable amount of the system’s overall load.
select 359513795 / 365.0 / 24.0 / 60.0 as per_minute,
359513795 * 8.656 / 1000.0 / 60.0 / 60.0 / 365.0 as spent_hours_per_day ;

per_minute,          spent_hours_per_day
684.0064592846271,   2.3683039646270928
Note the old-school bit-juggling that is done on the flags column, which, though neat, does not really match a database system’s way of working all that well, unless some tricks are performed.
As a last note, since this query comes from a legacy system, taking the obvious path of rewriting the query or building a wrapper-function was not an option.

So let’s make this faster.


The quickest and least invasive way to make any select query quicker is by adding an index. There were already indexes in place, on NAME and on FLAGS , but since both columns need some conversion in this query, they were not used, hence... tablescans.
Original QUERY PLAN on a local copy before any changes:
Seq Scan on parameterinfo  (cost=0.00..535.30 rows=8 width=4) (actual time=7.324..7.324 rows=0 loops=1)
  Filter: (((flags & '32'::bigint) > 0) AND (((name)::text = 'Blah'::text) OR ((lower((name)::text) = 'Blah'::text) AND ((flags & '128'::bigint) > 0))))
  Rows Removed by Filter: 13029
Planning Time: 0.090 ms
Execution Time: 7.337 ms
(Local copy, no load, so the numbers are not fully the same)
Rule of thumb: an index that is not used is only detrimental as it hurts inserts and updates, but also makes planning more complex, and then even hurts selectes, so unused indexes should be removed.

Observe that the OR in the WHERE conditions basically splits the query into two parts:
SELECT IDX FROM PARAMETERINFO
WHERE (NAME = $1
    OR
       (LOWER(NAME) = LOWER($1) AND FLAGS & x'80'::int4 > 0))
  AND FLAGS & x'20'::int4 > 0
Is equivalent to
SELECT IDX FROM PARAMETERINFO
WHERE NAME = $1
  AND FLAGS & x'20'::int4 > 0

UNION

SELECT IDX FROM PARAMETERINFO
WHERE LOWER(NAME) = LOWER($1) AND FLAGS & x'80'::int4 > 0
  AND FLAGS & x'20'::int4 > 0
So to properly solve it there should be 2 indexes, one for each part.
The first step was to add a regular index on NAME for the first part, and a functional (or expressional, RTFM) index on LOWER(NAME), this prevents Postgres from having to do a sequential scan and do the LOWER(NAME) conversion on every row for the second part.
create index concurrently on parameterinfo (NAME);
create index concurrently on parameterinfo (LOWER(NAME));
analyze table parameterinfo;
This functional index did work, it made the query faster. I will not bore you with the analysis, because we are not done yet.

We can go faster.


Observe that the bitmasks (x'80' and x'20') seem to be fixed values, otherwise they would be parameterized by pg_stat_statements as $2 and $3. Let’s use that, with a partial index. (RTFM)
Remove the other indexes, and
create index concurrently on parameterinfo (name) where (FLAGS & x'20'::int4) > 0 ; 
create index concurrently on parameterinfo (LOWER(NAME)) where (FLAGS & x'80'::int4) > 0 and (FLAGS & x'20'::int4) > 0 ;
analyze table parameterinfo;
With this we make use of the fixed parameters, and the fact that they are mapping exactly on the where clauses of the partial indexes. This also has the effect that the comparisons with the FLAGS column do not have to be done in the query execution anymore, since if they match, they are in the index.
This partial functional index did work, it made the query faster. I will not bore you with the analysis, because we are still not done yet.

We can go even faster.


The default index in Postgres is a B-Tree index, which is usually a great choice, but in some cases, there is an even better option. In this case for example we can do without most of the features that are offered by a B-tree implementation in Postgres, such as duplicate-culling and range support. We are only looking for exact matches and nothing else. For that there is the HASH index (RTFM), which offers less flexibility and less features, but is faster and almost always smaller (which also speeds it up) then a B-tree index.
create index concurrently on parameterinfo using hash(name) where (FLAGS & x'20'::int4) > 0 ;
create index concurrently on parameterinfo using hash(LOWER(NAME)) where (FLAGS & x'80'::int4) > 0 and (FLAGS & x'20'::int4) > 0 ;
analyze table parameterinfo;
Lets check the new QUERY PLAN:
Bitmap Heap Scan on parameterinfo  (cost=8.06..35.18 rows=8 width=4) (actual time=0.013..0.014 rows=0 loops=1)
  Recheck Cond: ((((name)::text = 'Lou'::text) AND ((flags & '32'::bigint) > 0)) OR ((lower((name)::text) = 'lou'::text) AND ((flags & '128'::bigint) > 0) AND ((flags & '32'::bigint) > 0)))
  ->  BitmapOr  (cost=8.06..8.06 rows=8 width=0) (actual time=0.011..0.011 rows=0 loops=1)
        ->  Bitmap Index Scan on parameterinfo_name_idx  (cost=0.00..4.01 rows=1 width=0) (actual time=0.008..0.008 rows=0 loops=1)
              Index Cond: ((name)::text = 'Lou'::text)
        ->  Bitmap Index Scan on parameterinfo_lower_idx1  (cost=0.00..4.05 rows=7 width=0) (actual time=0.003..0.003 rows=0 loops=1)
              Index Cond: (lower((name)::text) = 'lou'::text)
Planning Time: 0.101 ms
Execution Time: 0.035 ms

So we went from 0.090 ms (Planning Time) + 7.337 ms (Execution Time) = 7.427 ms
To 0.101 ms (Planning Time) + 0.035 ms (Execution Time) = 0.136 ms
Which is just shy of 55 times faster.

In tests on the customer’s dev environment this gave a performance gain of a very similar factor.
This means that we went from keeping the system fully working on this one query for nearly two and a half hours per day to two and a half minutes per day, by adding two indexes , one of them of the pretty cool partial functional hash index variety.
Not sure how that translates into cost, energy and CO2 savings for this customer, but... not bad!



Ellert van Koperen, February 2025.