Обсуждение: Error correction for n_dead_tuples

Поиск
Список
Период
Сортировка

Error correction for n_dead_tuples

От
ITAGAKI Takahiro
Дата:
Tom Lane <tgl@sss.pgh.pa.us> wrote:

> I'm concerned that this one creates
> an open-loop behavior in which the n_dead_tuples estimate will diverge
> arbitrarily far from reality over time.  I criticized the original
> proposal on that basis, and I'm not convinced this version fixes it,
> because of the fact that stats counter updates occur much later than the
> actions they count.  (My recent patch to rate-limit tabstat messages made
> that problem worse, but it existed anyway.)  What might make sense is for
> vacuum to count the number of dead-but-not-removable tuples it skips over,
> and apply that as the value of n_dead_tuples on receipt of the vacuum
> message (instead of setting to zero as now).  This is likely to be wrong
> with respect to the actions of transactions running concurrently with the
> vacuum, but I think so is the proposed patch; and at least in this form
> the error certainly cannot accumulate across vacuum cycles.

In my understanding, there are two proposal to change the way of updating
n_dead_tuples by vacuum presently:
 Set n_dead_tuples to the number of    1. unvacuumable tuples the vacuum has seen   2. dead tuples reported to stats
collectorafter the beginning of vacuum at the end of vacuum. 

Both methods don't accumulate errors across vacuum cycles, because dead
tuples statistics at the beginning of vacuum is cleared in both of them.
Also, if there is no background updates, the n_dead_tuples is certainly
set to zero.

I think the 2nd is better. If we update or delete tuples in the pages
that have been already scanned by vacuum, the vaccum cannot see the
newly created dead tuples. The vacuum could report fewer number as
the unvacuumable tuples.

The following is a test with a patch of the 2nd fix. The vacuum reports
'960 dead row versions cannot be removed yet', that is used in 1st method,
but the actual dead tuples are 2000, that is used in 2nd method.
...so I'll propose the 2nd method again.

Comments welcome.


# SELECT n_live_tup, n_dead_tup FROM pg_stat_user_tables WHERE relname = 'accounts';n_live_tup | n_dead_tup
------------+------------    100000 |          0

# vacuum verbose accounts; (and 'pgbench -n -N -t2000' concurrently)
INFO:  vacuuming "public.accounts"
INFO:  index "accounts_pkey" now contains 100778 row versions in 276 pages
DETAIL:  0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.70 sec.
INFO:  "accounts": found 0 removable, 100778 nonremovable row versions in 1679 pages
DETAIL:  960 dead row versions cannot be removed yet.
There were 1626 unused item pointers.
876 pages contain useful free space.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 5.78 sec.
VACUUM

# SELECT n_live_tup, n_dead_tup FROM pg_stat_user_tables WHERE relname = 'accounts';n_live_tup | n_dead_tup
------------+------------    100000 |       2000

# SELECT tuple_count, dead_tuple_count FROM pgstattuple('accounts');tuple_count | dead_tuple_count
-------------+------------------     100000 |             2000

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center



Re: Error correction for n_dead_tuples

От
Tom Lane
Дата:
I think this patch needs to be revisited in the light of the work I just
did to make pgstats correctly track the effects of aborted transactions.
It seems fairly likely to me that most of the problem you were seeing
was actually due to that stupidity, and not to race conditions between
VACUUM and concurrent transactions.  So maybe we no longer need to do
anything.

It might be worth fixing things so that a VACUUM sets n_dead_tuples to
the number of deleted-but-not-removable tuples it saw, rather than
always setting to zero as is presently done.
        regards, tom lane