Why does pg_class.reltuples count only live tuples in indexes (after VACUUM runs)?

Поиск
Список
Период
Сортировка
От Peter Geoghegan
Тема Why does pg_class.reltuples count only live tuples in indexes (after VACUUM runs)?
Дата
Msg-id CAH2-WzmBHvxv-7TeAhxB1_N_q8a2HJu25fKjZz7LyMSHNqJS9Q@mail.gmail.com
обсуждение исходный текст
Ответы Re: Why does pg_class.reltuples count only live tuples in indexes (after VACUUM runs)?  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-hackers
Commit 7c91a0364f standardized the approach we take to estimating
pg_class.reltuples, so that everybody agrees on what that means.
Follow-up work by commit 3d351d91 defined a pg_class.reltuples of -1
as "unknown, probably never vacuumed".

The former commit added this code and comment to vacuumlazy.c:

    /*
     * Now we can provide a better estimate of total number of surviving
     * tuples (we assume indexes are more interested in that than in the
     * number of nominally live tuples).
     */
    ivinfo.num_heap_tuples = vacrelstats->new_rel_tuples;
    ivinfo.strategy = vac_strategy;

I don't see why it makes sense to treat indexes differently here. Why
allow the special case? Why include dead tuples like this?

We make a general assumption that pg_class.reltuples only includes
live tuples, which this code contravenes. It's quite clear that
indexes are no exception to the general rule, since CREATE INDEX quite
deliberately does reltuples accounting in a way that fits with the
usual definition (live tuples only), per comments in
heapam_index_build_range_scan. One of these code paths must be doing
it wrong -- I think it's vacuumlazy.c.

This also confuses the index AM definitions. Whenever we call
ambulkdelete routines, IndexVacuumInfo.num_heap_tuples will always
come from the heap relation's existing pg_class.reltuples, which could
even be -1 -- so clearly its value can only be a count of live tuples.
On the other hand IndexVacuumInfo.num_heap_tuples might include some
dead tuples when we call amvacuumcleanup routines, since (as shown)
the value comes from vacuumlazy.c's vacrelstats->new_rel_tuples. It
would be more logical if IndexVacuumInfo.num_heap_tuples was always
the pg_class.reltuples for the table (either the original/existing
value, or the value that it's just about to be updated to).

That said, I can see why we wouldn't want to allow pg_class.reltuples
to ever be -1 in the case of an index. So I think we should bring
vacuumlazy.c in line with everything else here, without allowing that
case. I believe that the "pg_class.reltuples is -1 even after a
VACUUM" case is completely impossible following the Postgres 15 work
on VACUUM, but we should still clamp for safety in
update_relstats_all_indexes (though not in the amvacuumcleanup path).

-- 
Peter Geoghegan



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: Fix NULL pointer reference in _outPathTarget()
Следующее
От: Tom Lane
Дата:
Сообщение: Re: avoid multiple hard links to same WAL file after a crash