Re: To what extent should tests rely on VACUUM ANALYZE?
From | Alexander Lakhin |
---|---|
Subject | Re: To what extent should tests rely on VACUUM ANALYZE? |
Date | |
Msg-id | ef34cee4-ddd3-30ec-3222-bee25ae60f8f@gmail.com |
In response to | Re: To what extent should tests rely on VACUUM ANALYZE? (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses | Re: To what extent should tests rely on VACUUM ANALYZE? (Alexander Lakhin <exclusion@gmail.com>) |
List | pgsql-hackers |
28.03.2024 20:33, Tom Lane wrote:
>> But I'm a bit confused - how come the estimates do change at all? The
>> analyze simply fetches 30k rows, and tenk only has 10k of them. So we
>> should have *exact* numbers, and it should be exactly the same for all
>> the analyze runs. So how come it changes like this?
> It's plausible that the VACUUM ANALYZE done by test_setup fails
> ConditionalLockBufferForCleanup() sometimes because of concurrent
> activity like checkpointer writes. I'm not quite sure how we
> get from that to the observed symptom though. Maybe the
> VACUUM needs DISABLE_PAGE_SKIPPING?

Yeah, the way from ConditionalLockBufferForCleanup() returning false to
reltuples < 10000 is not one-step, as I thought initially. There is also
sanity_check doing VACUUM in between. So, effectively, the troublesome
scenario is:

VACUUM ANALYZE tenk2; -- with the cleanup lock not granted for some blocks
VACUUM tenk2;

In this scenario, lazy_scan_heap() -> vac_estimate_reltuples() is called
twice. First with rel_pages: 384, vacrel->scanned_pages: 384,
vacrel->live_tuples: 10000, which results in vacrel->new_live_tuples =
10000; and second with rel_pages: 345, vacrel->scanned_pages: 80,
vacrel->live_tuples: 2315 (for instance), which gives
vacrel->new_live_tuples = 9996.

With an unmodified ConditionalLockBufferForCleanup(), the second call is
performed with rel_pages: 345, vacrel->scanned_pages: 1,
vacrel->live_tuples: 24, and it returns 10000.

This simple change fixes the issue for me:
-VACUUM ANALYZE tenk2;
+VACUUM (ANALYZE, DISABLE_PAGE_SKIPPING) tenk2;

But it looks like subselect is not the only test that can fail due to
vacuum instability.
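For readers following along, the extrapolation that yields these numbers can be sketched in Python. This is only an approximation of vac_estimate_reltuples() in src/backend/commands/vacuum.c, not the actual C code; in particular, the ~2% scanned-pages guard and the rounding are assumptions modeled on recent PostgreSQL versions:

```python
import math

def vac_estimate_reltuples(total_pages, scanned_pages,
                           old_pages, old_tuples, scanned_tuples):
    """Sketch of the reltuples extrapolation in vacuum.c (approximate)."""
    # Whole table scanned: the live-tuple count is exact.
    if scanned_pages >= total_pages:
        return scanned_tuples
    # Nothing scanned (or no previous state): keep the old estimate.
    if scanned_pages == 0 or old_pages == 0:
        return old_tuples
    # Guard against tiny partial scans repeatedly skewing the estimate
    # (threshold assumed to be ~2% of the table, as in recent versions).
    if old_pages == total_pages and scanned_pages < total_pages * 0.02:
        return old_tuples
    # Otherwise extrapolate: assume unscanned pages kept the old density.
    old_density = old_tuples / old_pages
    unscanned_pages = total_pages - scanned_pages
    return math.floor(old_density * unscanned_pages + scanned_tuples + 0.5)

# First call: all 384 pages scanned, so the count is exact.
print(vac_estimate_reltuples(384, 384, 0, 0, 10000))      # -> 10000
# Second call: only 80 of 345 pages scanned after the cleanup-lock failures.
print(vac_estimate_reltuples(345, 80, 345, 10000, 2315))  # -> 9996
# Unmodified case: 1 page scanned falls under the guard, old value kept.
print(vac_estimate_reltuples(345, 1, 345, 10000, 24))     # -> 10000
```

Note how the 9996 arises: 265 unscanned pages are credited with the old density of 10000/345 tuples per page, which slightly underestimates the true total.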
I see that create_index also suffers from a cranky
ConditionalLockBufferForCleanup() (+if (rand() % 10 == 0) return false;),
although it is placed in parallel_schedule before sanity_check, so this
failure needs another explanation:

-                       QUERY PLAN
--------------------------------------------------------
- Index Only Scan using tenk1_thous_tenthous on tenk1
-   Index Cond: (thousand < 2)
-   Filter: (tenthous = ANY ('{1001,3000}'::integer[]))
-(3 rows)
+                                      QUERY PLAN
+--------------------------------------------------------------------------------------
+ Sort
+   Sort Key: thousand
+   ->  Index Only Scan using tenk1_thous_tenthous on tenk1
+         Index Cond: ((thousand < 2) AND (tenthous = ANY ('{1001,3000}'::integer[])))
+(4 rows)

Best regards,
Alexander