Re: PostmasterIsAlive() in recovery (non-USE_POST_MASTER_DEATH_SIGNAL builds)
От | Thomas Munro |
---|---|
Тема | Re: PostmasterIsAlive() in recovery (non-USE_POST_MASTER_DEATH_SIGNAL builds) |
Дата | |
Msg-id | CA+hUKGJHtNs_p4X+KBy9ZxYAY_aibthT4q1vWgVqE4o9+6fPhA@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: PostmasterIsAlive() in recovery (non-USE_POST_MASTER_DEATH_SIGNAL builds) (Fujii Masao <masao.fujii@oss.nttdata.com>) |
Ответы |
Re: PostmasterIsAlive() in recovery (non-USE_POST_MASTER_DEATH_SIGNAL builds)
|
Список | pgsql-hackers |
On Sat, Sep 19, 2020 at 6:07 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote: > - pgstat_report_wait_start(WAIT_EVENT_RECOVERY_PAUSE); > - pg_usleep(1000000L); /* 1000 ms */ > - pgstat_report_wait_end(); > + WaitLatch(NULL, WL_EXIT_ON_PM_DEATH | WL_TIMEOUT, 1000, > + WAIT_EVENT_RECOVERY_PAUSE); > > This change may cause at most one second delay against the standby > promotion request during WAL replay pause? It's only one second, > but I'd like to avoid this (unnecessary) wait to shorten the failover time > as much as possible basically. So what about using WL_SET_LATCH here? Right, there is a comment saying that we could do that: * XXX Could also be done with shared latch, avoiding the pg_usleep loop. * Probably not worth the trouble though. This state shouldn't be one that * anyone cares about server power consumption in. > But when using WL_SET_LATCH, one concern is that walreceiver can > wake up the startup process too frequently even during WAL replay pause. > This is also problematic? I'm ok with this, but if not, using pg_usleep() > might be better as the original code does. You're right, at least if we used recoveryWakeupLatch. Although we'd react to pg_wal_replay_resume() faster, which would be nice, we wouldn't be saving energy, we'd be using more energy due to all the other latch wakeups that we'd be ignoring. I believe the correct solution to this problem is to add a ConditionVariable "recoveryPauseChanged" into XLogCtlData, and then broadcast on it in SetRecoveryPause(). This would be a trivial change, except for one small problem: ConditionVariableTimedSleep() contains CHECK_FOR_INTERRUPTS(), but startup.c has its own special interrupt handling rather than using ProcessInterrupts() from postgres.c. Maybe that's OK, I'm not sure, but it requires more thought, and I propose to keep the existing sloppy polling for now and leave precise wakeup improvements for a separate patch. The primary goal of this patch is to switch to the standard treatment of postmaster death in wait loops, so that we're free to reduce the sampling frequency in HandleStartupProcInterrupts(), to fix a horrible performance problem. I have at least tweaked that comment about pg_usleep(), though, because that was out of date; I also used (void) WaitLatch(...) to make it look like other places where we ignore the return value (perhaps some static analyser out there somewhere cares?) By the way, a CV could probably be used for walreceiver state changes too, to improve ShutdownWalRcv(). Although I know from CI that this builds and passes "make check" on Windows, I'm hoping to attract some review of the 0001 patch from a Windows person, and confirmation that it passes "check-world" (or at least src/test/recovery check) there, because I don't have CI scripts for that yet.
Вложения
В списке pgsql-hackers по дате отправления: