Re: To what extent should tests rely on VACUUM ANALYZE?

Поиск
Список
Период
Сортировка
От Alexander Lakhin
Тема Re: To what extent should tests rely on VACUUM ANALYZE?
Дата
Msg-id ef34cee4-ddd3-30ec-3222-bee25ae60f8f@gmail.com
обсуждение исходный текст
Ответ на Re: To what extent should tests rely on VACUUM ANALYZE?  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: To what extent should tests rely on VACUUM ANALYZE?  (Alexander Lakhin <exclusion@gmail.com>)
Re: To what extent should tests rely on VACUUM ANALYZE?  (Alexander Lakhin <exclusion@gmail.com>)
Список pgsql-hackers
28.03.2024 20:33, Tom Lane wrote:
>
>> But I'm a bit confused - how come the estimates do change at all? The
>> analyze simply fetches 30k rows, and tenk only has 10k of them. So we
>> should have *exact* numbers, and it should be exactly the same for all
>> the analyze runs. So how come it changes like this?
> It's plausible that the VACUUM ANALYZE done by test_setup fails
> ConditionalLockBufferForCleanup() sometimes because of concurrent
> activity like checkpointer writes.  I'm not quite sure how we
> get from that to the observed symptom though.  Maybe the
> VACUUM needs DISABLE_PAGE_SKIPPING?

Yeah, the way from ConditionalLockBufferForCleanup() returning false to
reltuples < 10000 is not one-step, as I thought initially. There is also
sanity_check doing VACUUM in between. So, effectively the troublesome
scenario is:
VACUUM ANALYZE tenk2; -- with cleanup lock not granted for some blocks
VACUUM tenk2;

In this scenario, lazy_scan_heap() -> vac_estimate_reltuples() called two
times.
First, with rel_pages: 384, vacrel->scanned_pages: 384,
vacrel->live_tuples: 10000 and it results in
vacrel->new_live_tuples = 10000,

And second, with rel_pages: 345, vacrel->scanned_pages: 80,
vacrel->live_tuples: 2315 (for instance), and we get
vacrel->new_live_tuples = 9996,

With unmodified ConditionalLockBufferForCleanup() the second call is
performed with rel_pages: 345, vacrel->scanned_pages: 1,
vacrel->live_tuples: 24 and it returns 10000.

This simple change fixes the issue for me:
-VACUUM ANALYZE tenk2;
+VACUUM (ANALYZE, DISABLE_PAGE_SKIPPING) tenk2;

But it looks like subselect is not the only test that can fail due to
vacuum instability. I see that create_index also suffers from cranky
ConditionalLockBufferForCleanup() (+if (rand() % 10 == 0)
return false; ), although it placed in parallel_schedule before
sanity_check, so this failure needs another explanation:
-                      QUERY PLAN
--------------------------------------------------------
- Index Only Scan using tenk1_thous_tenthous on tenk1
-   Index Cond: (thousand < 2)
-   Filter: (tenthous = ANY ('{1001,3000}'::integer[]))
-(3 rows)
+                                      QUERY PLAN
+--------------------------------------------------------------------------------------
+ Sort
+   Sort Key: thousand
+   ->  Index Only Scan using tenk1_thous_tenthous on tenk1
+         Index Cond: ((thousand < 2) AND (tenthous = ANY ('{1001,3000}'::integer[])))
+(4 rows)

Best regards,
Alexander



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Amit Kapila
Дата:
Сообщение: Re: Synchronizing slots from primary to standby
Следующее
От: Amit Kapila
Дата:
Сообщение: Re: Synchronizing slots from primary to standby