Re: pgsql: Attempt to fix unstable regression tests, take 2
From | David Rowley |
---|---|
Subject | Re: pgsql: Attempt to fix unstable regression tests, take 2 |
Date | |
Msg-id | CAHoyFK9pHKPHyEp35QXo9NzkFOeupyRNONuEFgej4U54=Cmj2w@mail.gmail.com |
In response to | Re: pgsql: Attempt to fix unstable regression tests, take 2 (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses | Re: pgsql: Attempt to fix unstable regression tests, take 2 |
List | pgsql-committers |
On Tue, 31 Mar 2020 at 15:55, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I've been trying to reproduce this by dint of running just the stats_ext
> script, over and over in a loop. I've not had any success on fast
> machines, but on a slow one (florican's host) I got this after a few
> hundred iterations:

I've had a 13 year old laptop running just stats_ext in a loop for about an hour now. I managed to get 1000 runs without any failure. Trying again with autovacuum_naptime set to 1s... 1000 runs, and nothing yet.

If you disable autovacuum on the problem table, can you still reproduce the failure on that machine?

> Now this *IS* autovacuum interference, but it's hardly autovacuum's fault:
> the test script is supposing that autovac won't come in before it does a
> manual analyze, and that's just unsafe on its face.

Why would that matter? The manual operation will just overwrite what autovacuum did. Obviously, there can't be any overlap due to the ShareUpdateExclusiveLock.

My suspicion was that autovacuum ran a vacuum *after* the VACUUM (ANALYZE). I've not studied the code, but I've had thoughts that the manual operation might have slotted in just between when autovacuum checked what work there was to do and when it actually did the work. Unsure how likely that is given that we have table_recheck_autovac().

> I'm thinking that what we ought to do is have this test disable autovac
> altogether on its tables, ie
> CREATE TABLE ... WITH (autovacuum_enabled = off);
>
> However, I remain suspicious that there's something else going on,
> unrelated to autovac. All the buildfarm cases so far have been
> small underestimates, one or two rows, so they look entirely different
> from the example above. Even if autovacuum is firing unexpectedly,
> how would it cause such results?

Perhaps we can remain suspicious if we still see failures after fixing it to disable autovacuum on these tables. It seems to happen often enough that if we don't see it again in a week, then we might be able to assume that was the issue.

David
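
For illustration only, here is a minimal sketch of what disabling autovacuum on a test table could look like, and how one might check afterwards whether autovacuum touched the table anyway. The table name `stats_demo` and its columns are hypothetical and are not taken from the stats_ext script; only the `autovacuum_enabled` storage parameter and the `pg_stat_user_tables` columns are standard PostgreSQL.

```sql
-- Hypothetical test table; autovacuum is disabled for this table only.
CREATE TABLE stats_demo (a int, b int) WITH (autovacuum_enabled = off);

-- Load some rows, then gather statistics manually, as the test does.
INSERT INTO stats_demo
SELECT i % 10, i % 10 FROM generate_series(1, 1000) AS s(i);

VACUUM (ANALYZE) stats_demo;

-- If autovacuum really stayed away, last_autovacuum and last_autoanalyze
-- should remain NULL while last_vacuum and last_analyze are set.
SELECT relname, last_vacuum, last_autovacuum, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
WHERE relname = 'stats_demo';

-- The same storage parameter can be set on a table that already exists.
ALTER TABLE stats_demo SET (autovacuum_enabled = off);
```

The statistics view is updated asynchronously, so the timestamps may lag slightly behind the commands that produced them.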