Re: Synchronizing slots from primary to standby
| From | shveta malik |
|---|---|
| Subject | Re: Synchronizing slots from primary to standby |
| Date | |
| Msg-id | CAJpy0uBY1x_mjqUk6dyD3iGtihwboy5mnrnL4tzZxTD3vy7X4A@mail.gmail.com |
| In reply to | RE: Synchronizing slots from primary to standby ("Zhijie Hou (Fujitsu)" <houzj.fnst@fujitsu.com>) |
| Replies | Re: Synchronizing slots from primary to standby (Bertrand Drouvot <bertranddrouvot.pg@gmail.com>) |
| List | pgsql-hackers |
On Fri, Dec 22, 2023 at 3:11 PM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote:
>
> On Thursday, December 21, 2023 5:39 PM Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote:
> >
> > On Thu, Dec 21, 2023 at 02:23:12AM +0000, Zhijie Hou (Fujitsu) wrote:
> > > On Wednesday, December 20, 2023 8:42 PM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote:
> > > >
> > > > Attach the V51 patch set which addressed Kuroda-san's comments.
> > > > I also tried to improve the test in 0003 to make it stable.
> > >
> > > The patches conflict with a recent commit dc21234.
> > > Here is the rebased V51_2 version, there is no code changes in this version.
> > >
> >
> > Thanks!
> >
> > I've a few remarks regarding 0001:
>
> Thanks for the comments!
>
> >
> > 1 ===
> >
> > In the commit message what about replacing "Allow logical walsenders to wait
> > for the physical standbys" with "Force some logical walsenders to wait for the
> > physical standbys"?
>
> I feel 'Allow' is OK, as the GUC standby_slot_names is optional for user. ISTM, 'force'
> means we always wait for physical standbys regardless of the GUC.
>
> >
> > Also I think it would be better to first explain what we are trying to achieve and
> > after explain how we do it (adding a new flag in CREATE SUBSCRIPTION and so
> > on).
>
> Noted. We are about to split the patches, so will improve each commit message after that.
>
> >
> > 4 ===
> >
> > @@ -248,10 +262,13 @@ ReplicationSlotValidateName(const char *name, int elevel)
> > * during getting changes, if the two_phase option is enabled it can skip
> > * prepare because by that time start decoding point has been moved. So the
> > * user will only get commit prepared.
> > + * failover: If enabled, allows the slot to be synced to physical standbys so
> > + * that logical replication can be resumed after failover.
> >
> > s/allows/forces ?
>
> I think whether the slot is synced also depends on the
> GUC setting on standby, so I feel 'allow' is fine here.
>
> >
> > 5 ===
> >
> > + bool ok;
> >
> > parse_ok maybe?
>
> The flag is also used to store the slot type check result, so I feel 'ok' is
> better here.
>
> >
> > 6 ===
> >
> > + /* Need a modifiable copy of string. */
> > + rawname = pstrdup(*newval);
> >
> > It seems to me that the single line comments in the neighborhood functions (see
> > RestoreSlotFromDisk() for example) don't finish with ".". Worth to follow the
> > same format for all what we add in slot.c?
>
> I felt we have both styles in slot.c, but it seems Kuroda-san also
> prefer removing the ".", so will address.
>
> >
> > 7 ===
> >
> > +static void
> > +parseAlterReplSlotOptions(AlterReplicationSlotCmd *cmd, bool *failover)
> >
> > ParseAlterReplSlotOptions instead?
>
> I think it followed parseCreateReplSlotOptions, but I agree that it looks
> inconsistent with other names. Will address.
>
> > 11 ===
> >
> > + * When the wait event is WAIT_FOR_STANDBY_CONFIRMATION, wait on another
> > + * CV that is woken up by physical walsenders when the walreceiver has
> > + * confirmed the receipt of LSN.
> >
> > s/that is woken up by/that is broadcasted by/ ?
>
> Will reword the comment here.
>
> >
> > 12 ===
> >
> > We are mentioning in several places that the replication can be resumed after a
> > failover. Should we add a few words about possible lag? (see [1])
> >
> > [1]:
> > https://www.postgresql.org/message-id/CAA4eK1KihniOK21mEVYtSOHRQiGNyToUmENWp7hPbH_PMsqzkA%40mail.gmail.com
>
> It feels like the implementation detail to me, but noted. We will think more
> about the document.
>
> > The comments not mentioned above look good to me.
>
> Best Regards,
> Hou zj

PFA v53. Changes are:

patch001:
1) Addressed comments in [1] for v51-001. Thanks Hou-san for working on this.

patch002:
2) Addressed comments in [2] for v52-002.
3) Fixed a CFBot failure. The failure was caused by an assert in wait_for_primary_slot_catchup() that fired when a NULL confirmed_lsn was received.
In wait_for_primary_slot_catchup(), we had assumed that if restart_lsn is valid and 'conflicting' is false, then confirmed_lsn must be non-NULL. This is not true. confirmed_lsn and catalog_xmin can both be NULL if the slot has just been created on the primary with a valid restart_lsn and the slot-sync worker fetches it before the primary has set valid confirmed_lsn and catalog_xmin values. In pg_create_logical_replication_slot(), there is a small window between CreateInitDecodingContext() -> ReplicationSlotReserveWal(), which sets restart_lsn, and DecodingContextFindStartpoint(), which sets confirmed_lsn. If the slot-sync worker fetches the slot within this window, the confirmed_lsn it receives will be NULL. I corrected the code to remove the assert and added an additional condition that confirmed_lsn must be valid before the slot is moved to the 'r' state.

[1]: https://www.postgresql.org/message-id/ZYQHvgBpH0GgQaJK%40ip-10-97-1-34.eu-west-3.compute.internal
[2]: https://www.postgresql.org/message-id/TY3PR01MB98893274D5A4FD4F86CC04A0F595A%40TY3PR01MB9889.jpnprd01.prod.outlook.com

thanks
Shveta