Re: Is Recovery actually paused?
От | Dilip Kumar |
---|---|
Тема | Re: Is Recovery actually paused? |
Дата | |
Msg-id | CAFiTN-v_q9nSLt+ww8-vaDpiuWr+HKc=EFkWAXHX-PS+z6XGkg@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: Is Recovery actually paused? (Dilip Kumar <dilipbalaut@gmail.com>) |
Список | pgsql-hackers |
On Sun, Jan 17, 2021 at 1:48 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Sat, Jan 16, 2021 at 8:59 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > On Wed, Jan 13, 2021 at 9:20 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > > > On Wed, Jan 13, 2021 at 3:35 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > > > > > On Wed, Jan 13, 2021 at 3:27 PM Yugo NAGATA <nagata@sraoss.co.jp> wrote: > > > > > > > > > > On Thu, 10 Dec 2020 11:25:23 +0530 > > > > > Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > > > > > > > > > > However, I wonder users don't expect pg_is_wal_replay_paused to wait. > > > > > > > > Especially, if max_standby_streaming_delay is -1, this will be blocked forever, > > > > > > > > although this setting may not be usual. In addition, some users may set > > > > > > > > recovery_min_apply_delay for a large. If such users call pg_is_wal_replay_paused, > > > > > > > > it could wait for a long time. > > > > > > > > > > > > > > > > At least, I think we need some descriptions on document to explain > > > > > > > > pg_is_wal_replay_paused could wait while a time. > > > > > > > > > > > > > > Ok > > > > > > > > > > > > Fixed this, added some comments in .sgml as well as in function header > > > > > > > > > > Thank you for fixing this. > > > > > > > > > > Also, is it better to fix the description of pg_wal_replay_pause from > > > > > "Pauses recovery." to "Request to pause recovery." in according with > > > > > pg_is_wal_replay_paused? > > > > > > > > Okay > > > > > > > > > > > > > > > > > Also, how about adding a new boolean argument to pg_is_wal_replay_paused to > > > > > > > > control whether this waits for recovery to get paused or not? By setting its > > > > > > > > default value to true or false, users can use the old format for calling this > > > > > > > > and the backward compatibility can be maintained. > > > > > > > > > > > > > > So basically, if the wait_recovery_pause flag is false then we will > > > > > > > immediately return true if the pause is requested? I agree that it is > > > > > > > good to have an API to know whether the recovery pause is requested or > > > > > > > not but I am not sure is it good idea to make this API serve both the > > > > > > > purpose? Anyone else have any thoughts on this? > > > > > > > > > > > > > > > > > I think the current pg_is_wal_replay_paused() already has another purpose; > > > > > this waits recovery to actually get paused. If we want to limit this API's > > > > > purpose only to return the pause state, it seems better to fix this to return > > > > > the actual state at the cost of lacking the backward compatibility. If we want > > > > > to know whether pause is requested, we may add a new API like > > > > > pg_is_wal_replay_paluse_requeseted(). Also, if we want to wait recovery to actually > > > > > get paused, we may add an option to pg_wal_replay_pause() for this purpose. > > > > > > > > > > However, this might be a bikeshedding. If anyone don't care that > > > > > pg_is_wal_replay_paused() can make user wait for a long time, I don't care either. > > > > > > > > I don't think that it will be blocked ever, because > > > > pg_wal_replay_pause is sending the WakeupRecovery() which means the > > > > recovery process will not be stuck on waiting for the WAL. > > > > > > > > > > > > As another comment, while pg_is_wal_replay_paused is blocking, I can not cancel > > > > > > > > the query. I think CHECK_FOR_INTERRUPTS() is necessary in the waiting loop. > > > > > > > > > > How about this fix? I think users may want to cancel pg_is_wal_replay_paused() during > > > > > this is blocking. > > > > > > > > Yeah, we can do this. I will send the updated patch after putting > > > > some more thought into these comments. Thanks again for the feedback. > > > > > > > > > > Please find the updated patch. > > > > I've looked at the patch. Here are review comments: > > > > + /* Recovery pause state */ > > + RecoveryPauseState recoveryPause; > > > > Now that the value can have tri-state, how about renaming it to > > recoveryPauseState? > > This makes sense to me. > > > --- > > bool > > RecoveryIsPaused(void) > > +{ > > + bool recoveryPause; > > + > > + SpinLockAcquire(&XLogCtl->info_lck); > > + recoveryPause = (XLogCtl->recoveryPause == RECOVERY_PAUSED) ? > > true : false; > > + SpinLockRelease(&XLogCtl->info_lck); > > + > > + return recoveryPause; > > +} > > + > > +bool > > +RecoveryPauseRequested(void) > > { > > bool recoveryPause; > > > > SpinLockAcquire(&XLogCtl->info_lck); > > - recoveryPause = XLogCtl->recoveryPause; > > + recoveryPause = (XLogCtl->recoveryPause != > > RECOVERY_IN_PROGRESS) ? true : false; > > SpinLockRelease(&XLogCtl->info_lck); > > > > return recoveryPause; > > } > > > > We can write like recoveryPause = (XLogCtl->recoveryPause == RECOVERY_PAUSED); > > In RecoveryPauseRequested, we just want to know whether the pause is > requested or not, even if the pause requested and not yet pause then > also we want to return true. So how > recoveryPause = (XLogCtl->recoveryPause == RECOVERY_PAUSED) will work? > > > Also, since these functions do the almost same thing, I think we can > > have a common function to get XLogCtl->recoveryPause, say > > GetRecoveryPauseState() or GetRecoveryPause(), and both > > RecoveryIsPaused() and RecoveryPauseRequested() use the returned > > value. What do you think? > > Yeah we can do that. > > > --- > > +static void > > +CheckAndSetRecoveryPause(void) > > > > Maybe we need to declare the prototype of this function like other > > functions in xlog.c. > > Okay > > > --- > > + /* > > + * If recovery is not in progress anymore then report an error this > > + * could happen if the standby is promoted while we were waiting for > > + * recovery to get paused. > > + */ > > + if (!RecoveryInProgress()) > > + ereport(ERROR, > > + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), > > + errmsg("recovery is not in progress"), > > + errhint("Recovery control functions can only be > > executed during recovery."))); > > > > I think we can improve the error message so that we can tell users the > > standby has been promoted during the wait. For example, > > > > errmsg("the standby was promoted during waiting for > > recovery to be paused"))); > > > > --- > > + /* test for recovery pause if user has requested the pause */ > > + if (((volatile XLogCtlData *) XLogCtl)->recoveryPause) > > + recoveryPausesHere(false); > > + > > + now = GetCurrentTimestamp(); > > + > > > > Hmm, if the recovery pauses here, the wal receiver isn't launched even > > when wal_retrieve_retry_interval has passed, which seems not good. I > > think we want the recovery to be paused but want the wal receiver to > > continue receiving WAL. > > > > And why do we need to set 'now' here? > > > > --- > > /* > > * Wait until shared recoveryPause flag is cleared. > > * > > * endOfRecovery is true if the recovery target is reached and > > * the paused state starts at the end of recovery because of > > * recovery_target_action=pause, and false otherwise. > > * > > * XXX Could also be done with shared latch, avoiding the pg_usleep loop. > > * Probably not worth the trouble though. This state shouldn't be one that > > * anyone cares about server power consumption in. > > */ > > static void > > recoveryPausesHere(bool endOfRecovery) > > > > We can improve the first sentence in the above function comment to > > "Wait until shared recoveryPause is set to RECOVERY_IN_PROGRESS" or > > something. > > > > --- > > - PG_RETURN_BOOL(RecoveryIsPaused()); > > + if (!RecoveryPauseRequested()) > > + PG_RETURN_BOOL(false); > > + > > + /* loop until the recovery is actually paused */ > > + while(!RecoveryIsPaused()) > > + { > > + pg_usleep(10000L); /* wait for 10 msec */ > > + > > + /* meanwhile if recovery is resume requested then return false */ > > + if (!RecoveryPauseRequested()) > > + PG_RETURN_BOOL(false); > > + > > + CHECK_FOR_INTERRUPTS(); > > + > > + /* > > + * If recovery is not in progress anymore then report an error this > > + * could happen if the standby is promoted while we were waiting for > > + * recovery to get paused. > > + */ > > + if (!RecoveryInProgress()) > > + ereport(ERROR, > > + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), > > + errmsg("recovery is not in progress"), > > + errhint("Recovery control functions can only be > > executed during recovery."))); > > + } > > + > > + PG_RETURN_BOOL(true); > > > > We have the same !RecoveryPauseRequested() check twice, how about the > > following arrangement? > > > > for (;;) > > { > > if (!RecoveryPauseRequested()) > > PG_RETURN_BOOL(false); > > > > if (RecoveryIsPaused()) > > break; > > > > pg_usleep(10000L); > > > > CHECK_FOR_INTERRUPTS(); > > > > if (!RecoveryInProgress()) > > ereport(...); > > } > > > > PG_RETURN_BOOL(true); > > > > Regards, > > > > Okay, we can do that. I will make these changes in the next patch. > I have fixed the above agreed comments. Please have a look. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Вложения
В списке pgsql-hackers по дате отправления:
Предыдущее
От: Masahiko SawadaДата:
Сообщение: Re: Transactions involving multiple postgres foreign servers, take 2
Следующее
От: "Hou, Zhijie"Дата:
Сообщение: RE: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit