Re: Is RecoveryConflictInterrupt() entirely safe in a signal handler?
From | Thomas Munro |
---|---|
Subject | Re: Is RecoveryConflictInterrupt() entirely safe in a signal handler? |
Date | |
Msg-id | CA+hUKGL7ZFiX5yrbTRSjwH_x=2m40cobGewxu+XBKu0Dbh5N-Q@mail.gmail.com |
In reply to | Re: Is RecoveryConflictInterrupt() entirely safe in a signal handler? (Andres Freund <andres@anarazel.de>) |
Responses | Re: Is RecoveryConflictInterrupt() entirely safe in a signal handler? |
List | pgsql-hackers |
On Sun, Apr 10, 2022 at 11:00 AM Andres Freund <andres@anarazel.de> wrote:
> On 2022-04-09 14:39:16 -0700, Andres Freund wrote:
> > On 2022-04-09 17:00:41 -0400, Tom Lane wrote:
> > > Thomas Munro <thomas.munro@gmail.com> writes:
> > > > Unlike most "procsignal" handler routines, RecoveryConflictInterrupt()
> > > > doesn't just set a sig_atomic_t flag and poke the latch. Is the extra
> > > > stuff it does safe? For example, is this call stack OK (to pick one
> > > > that jumps out, but not the only one)?
> > > >
> > > > procsignal_sigusr1_handler
> > > > -> RecoveryConflictInterrupt
> > > > -> HoldingBufferPinThatDelaysRecovery
> > > > -> GetPrivateRefCount
> > > > -> GetPrivateRefCountEntry
> > > > -> hash_search(...hash table that might be in the middle of an update...)
> > >
> > > Ugh. That one was safe before somebody decided we needed a hash table
> > > for buffer refcounts, but it's surely not safe now.
> >
> > Mea culpa. This is 4b4b680c3d6d - from 2014.
>
> Whoa. There's way worse: StandbyTimeoutHandler() calls
> SendRecoveryConflictWithBufferPin(), which calls CancelDBBackends(), which
> acquires lwlocks etc.
>
> Which very plausibly is the cause for the issue I'm investigating in
> https://www.postgresql.org/message-id/20220409220054.fqn5arvbeesmxdg5%40alap3.anarazel.de

Huh. I wouldn't have started a separate thread for this if I'd
realised I was getting close to the cause of the CI failure... I thought
this was an incidental observation.

Anyway, I made a first attempt at fixing this SIGUSR1 problem (I think
Andres is looking at the SIGALRM problem in the other thread).

Instead of bothering to create N different XXXPending variables for
the different conflict "reasons", I used an array. Other than that,
it's much like existing examples.

The existing use of the global variable RecoveryConflictReason seems a
little woolly. Doesn't it get clobbered every time a signal arrives,
even if we determine that there is no conflict? Not sure why that's
OK, but anyway, this patch always sets it together with
RecoveryConflictPending = true.