Re: failures in t/031_recovery_conflict.pl on CI
От | Andres Freund |
---|---|
Тема | Re: failures in t/031_recovery_conflict.pl on CI |
Дата | |
Msg-id | 20220429200809.uuuseakmhb57sppk@alap3.anarazel.de обсуждение исходный текст |
Ответ на | Re: failures in t/031_recovery_conflict.pl on CI (Andres Freund <andres@anarazel.de>) |
Ответы |
Re: failures in t/031_recovery_conflict.pl on CI
|
Список | pgsql-hackers |
Hi, Attached are patches for this issue. It adds a test case for deadlock conflicts to make sure that case isn't broken. I also tested the recovery conflict tests in the back branches, and they work there with a reasonably small set of changes. Questions: - I'm planning to backpatch the test as 031_recovery_conflict.pl, even though preceding numbers are unused. It seems way more problematic to use a different number in the backbranches than have gaps? - The test uses pump_until() and wait_for_log(), which don't exist in the backbranches. For now I've just inlined the implementation, but I guess we could also backpatch their introduction? - There's a few incompatibilities in the test with older branches: - older branches don't have allow_in_place_tablespaces - I think just skipping tablespace conflicts is fine, they're comparatively simple. Eventually it might make sense to backpatch allow_in_place_tablespaces, our test coverage in the area is quite poor. - the stats tests can't easily made reliably in the backbranches - which is ok, as the conflict itself is verified via the log - some branches don't have log_recovery_conflict_waits, since it's not critical to the test, it's ok to just not include it there I played with the idea of handling the differences using version comparisons in the code, and have the test be identically across branches. Since it's something we don't do so far, I'm leaning against it, but ... > - For HEAD we have to replace the disable_all_timeouts() calls, it breaks the > replay progress reporting. Is there a reason to keep them in the > backbranches? Hard to see how an extension or something could rely on it, > but ...? I've left it as is for now, will start a separate thread. > - There's the following comment in ResolveRecoveryConflictWithBufferPin(): > > "We assume that only UnpinBuffer() and the timeout requests established > above can wake us up here." > > That bogus afaict? There's plenty other things that can cause MyProc->latch > to be set. Is it worth doing something about this at the same time? Right > now we seem to call ResolveRecoveryConflictWithBufferPin() in rapid > succession initially. The comment is more recent than I had realized. I raised this separately in https://postgr.es/m/20220429191815.xewxjlpmq7mxhsr2%40alap3.anarazel.de pgindent uses some crazy formatting nearby: SendRecoveryConflictWithBufferPin( PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK); I'm tempted to clean that up in passing by having just one SendRecoveryConflictWithBufferPin() call instead of two, storing the type of conflict in a local variable? Doesn't look entirely pretty either, but ... I'm very doubtful of this claim above ResolveRecoveryConflictWithBufferPin(), btw. But that'd be a non-backpatchable cleanup, I think: * The ProcWaitForSignal() sleep normally done in LockBufferForCleanup() * (when not InHotStandby) is performed here, for code clarity. Greetings, Andres Freund
Вложения
В списке pgsql-hackers по дате отправления: