On 2020/11/04 20:20, Fujii Masao wrote:
>
>
> On 2020/11/04 19:27, Heikki Linnakangas wrote:
>> On 04/11/2020 09:44, Fujii Masao wrote:
>>> Get rid of the dedicated latch for signaling the startup process.
>>>
>>> This commit gets rid of the dedicated latch for signaling the startup
>>> process in favor of using its procLatch, since that comports better
>>> with possible generic signal handlers using that latch.
>>>
>>> Commit 1e53fe0e70 changed background processes so that they use standard
>>> SIGHUP handler. Like that, this commit also makes the startup process use
>>> standard SIGHUP handler to simplify the code.
>>
>> This seems to have made buildfarm member 'elver' to segfault. I've got a hunch that setting recoveryWakeupLatch to
NULL,when WakeupRecovery() doesn't check for NULL, is not OK. It's surprising that we're only seeing this on 'elver',
though.
>
> Thanks for the report! The latch is reset to NULL after ShutdownWalRcv(). So I thought that there are no processes
settingthe latch (i.e., walreceiver has already exited) after it's reset to NULL. But this is not true. There seem a
windowbetween ShutdownWalRcv() and the actual exit of walreceiver. If a signal is sent during that window, the
segmentationfault would happen. This would be the reason why segv happened in some platforms, but not in others.
>
> I'm thinking to remove the following code to fix this issue. Thought?
>
> /*
> * We don't need the latch anymore. It's not strictly necessary to reset
> * it to NULL, but let's do it for the sake of tidiness.
> */
> if (ArchiveRecoveryRequested)
> XLogCtl->recoveryWakeupLatch = NULL;
Or ISTM that WakeupRecovery() should set the latch only when the latch
has not been reset to NULL yet.
Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION