Database.one Blog - In this episode: How to do a count(*) estimate without a tablescan in PostgreSQL.

How to do a quick count(*) estimate in PostgeSQL

In PostgreSQL a count(*) has to actually do a tablescan to get an accurate answer, taking into account locking and the whole mechanism in which PostgreSQL guarantees the correctness of its responses (in fact the whole ACID compliancy of the database) and the 'row visibility' for each row.
But if an estimate (that usually is spot on) is good enough then this trick is a lifesaver.

Normally one would do a

SELECT COUNT(*) FROM table_name;

or a variant thereof. Interestingly enough the difference that MySQL makes between a count(*) and count(index_field) does not apply in PostgreSQL.
There is a very elegant solutions that uses the automatically updated table statistics to get an estimate:

SELECT reltuples AS estimate FROM pg_class WHERE relname = 'table_name';

Thats all !

The response is usually spot-on, but can deviate under the following conditions:
- If the table is in the middle of updates, inserts or deletes.
- If ANALYZE has not run after significant changes.
- If the Autovacuum Daemon is not running (which is something you should never switch off unless you have a very good reason!).

If you need a somewhat more accurate estimate (but remember, it is still an estimate!) try the following nifty function coined by Michael Fuhr that abuses the query planner to get a very accurate estimate:


CREATE FUNCTION count_estimate(query text)
  RETURNS integer
  LANGUAGE plpgsql AS
$func$
DECLARE
    rec   record;
    rows  integer;
BEGIN
    FOR rec IN EXECUTE 'EXPLAIN ' || query LOOP
        rows := substring(rec."QUERY PLAN" FROM ' rows=([[:digit:]]+)');
        EXIT WHEN rows IS NOT NULL;
    END LOOP;

    RETURN rows;
END
$func$;

Which is executed as follows:

SELECT count_estimate('SELECT * FROM table_name');

This function is much more accurate as it does not rely on the running of auto_vacuum. I have seen differences of up to 40.000 in a very heavy insertion test on a docker-environment, apparently just before an auto_vacuum was triggered. Still good enough to get a ball-park figure (less then 10% off in this case) but the count_estimate function seems to give the better accuracy at no additional cost.

Give it a spin, you may like how quick it is.

Ellert van Koperen, April 2021.

Sharing is caring.

How to do a quick count(*) estimate in PostgeSQL

Information

Support