Re: failures in t/031_recovery_conflict.pl on CI

Поиск
Список
Период
Сортировка
От Andres Freund
Тема Re: failures in t/031_recovery_conflict.pl on CI
Дата
Msg-id 20220413002626.udl7lll7f3o7nre7@alap3.anarazel.de
обсуждение исходный текст
Ответ на Re: failures in t/031_recovery_conflict.pl on CI  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: failures in t/031_recovery_conflict.pl on CI  (Andres Freund <andres@anarazel.de>)
Список pgsql-hackers
Hi,

On 2022-04-12 15:05:22 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On 2022-04-09 19:34:26 -0400, Tom Lane wrote:
> >> +1.  This is probably more feasible given the latch infrastructure
> >> than it was when that code was first written.
> 
> > What do you think about just reordering the disable_all_timeouts() to be
> > before the got_standby_deadlock_timeout check in the back branches? I think
> > that should close at least the most obvious hole.  And fix it properly in
> > HEAD?
> 
> I don't have much faith in that, and I don't see why we can't fix it
> properly.

It's not too hard, agreed.

It's somewhat hard to test that it works though. The new recovery conflict
tests test all recovery conflicts except for deadlock ones, because they're
not easy to trigger... But I think I nearly got it working reliably.

It's probably worth backpatching the tests, after stripping them of the stats
checks?

Three questions:

- For HEAD we have to replace the disable_all_timeouts() calls, it breaks the
  replay progress reporting. Is there a reason to keep them in the
  backbranches? Hard to see how an extension or something could rely on it,
  but ...?

- I named the variable set by the signal handler got_standby_delay_timeout,
  because just got_standby_timeout looked weird besides the _deadlock_. Which
  makes me think that we should rename STANDBY_TIMEOUT to
  STANDBY_DELAY_TIMEOUT too?

- There's the following comment in ResolveRecoveryConflictWithBufferPin():

  "We assume that only UnpinBuffer() and the timeout requests established
   above can wake us up here."

  That bogus afaict? There's plenty other things that can cause MyProc->latch
  to be set. Is it worth doing something about this at the same time? Right
  now we seem to call ResolveRecoveryConflictWithBufferPin() in rapid
  succession initially.


Greetings,

Andres Freund



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Peter Geoghegan
Дата:
Сообщение: Re: Improving the "Routine Vacuuming" docs
Следующее
От: "David G. Johnston"
Дата:
Сообщение: Re: Improving the "Routine Vacuuming" docs