RE: Synchronizing slots from primary to standby

From Zhijie Hou (Fujitsu)
Subject RE: Synchronizing slots from primary to standby
Date
Msg-id OS0PR01MB57160EB0B56BC6F328F7ED62944D2@OS0PR01MB5716.jpnprd01.prod.outlook.com
In reply to Re: Synchronizing slots from primary to standby  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: Synchronizing slots from primary to standby  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
On Thursday, February 15, 2024 10:49 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> 
> On Wed, Feb 14, 2024 at 7:26 PM Bertrand Drouvot
> <bertranddrouvot.pg@gmail.com> wrote:
> >
> > On Wed, Feb 14, 2024 at 10:40:11AM +0000, Zhijie Hou (Fujitsu) wrote:
> > > On Wednesday, February 14, 2024 6:05 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > >
> > > > To ensure that restart_lsn has been moved to a recent position, we
> > > > need to log XLOG_RUNNING_XACTS and make sure the same is processed
> > > > as well by walsender. The attached patch does the required change.
> > > >
> > > > Hou-San can reproduce this problem by adding additional
> > > > checkpoints in the test and after applying the attached it fixes
> > > > the problem. Now, this patch is mostly based on the theory we
> > > > formed based on LOGs on BF and a reproducer by Hou-San, so still,
> > > > there is some chance that this doesn't fix the BF failures, in which
> > > > case I'll again look into those.
> > >
> > > I have verified that the patch fixes the issue on my machine (after
> > > adding a few more checkpoints before the slot invalidation test). I also
> > > added one more check in the test to confirm the synced slot is not a temp slot.
> > > Here is the v2 patch.
> >
> > Thanks!
> >
> > +# To ensure that restart_lsn has moved to a recent WAL position, we need
> > +# to log XLOG_RUNNING_XACTS and make sure the same is processed as well
> > +$primary->psql('postgres', "CHECKPOINT");
> >
> > Instead of "CHECKPOINT", wouldn't a less heavy
> > "SELECT pg_log_standby_snapshot();" be enough?
> >
> 
> Yeah, that would be enough. However, the test still fails randomly due to the
> same reason. See [1]. So, as mentioned yesterday, now, I feel it is better to
> recreate the subscription/slot so that it can get the latest restart_lsn rather than
> relying on pg_log_standby_snapshot() to move it.
> 
> > Not a big deal but maybe we could do the change while modifying
> > 040_standby_failover_slots_sync.pl in the next patch "Add a new slotsync
> > worker".
> >
> 
> Right, we can do that or probably this test would have made more sense with a
> worker patch where we could wait for the slot to be synced.
> Anyway, let's try to recreate the slot/subscription idea. BTW, do you think that
> adding a LOG when we are not able to sync will help in debugging such
> problems? I think eventually we can change it to DEBUG1 but for now, it can help
> with stabilizing BF and/or some other reported issues.

Here is a patch that implements the idea of re-creating the subscription. I also
think that a LOG/DEBUG message would be useful for such analysis, so 0002 adds such a log.
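
For anyone following along, the two approaches discussed above can be sketched
in SQL roughly as follows. This is only an illustration, not the contents of the
attached patches: the subscription, publication, and connection names are
hypothetical, and the failover subscription option assumes a server version
that supports it.

```sql
-- Approach 1 (run on the primary): log XLOG_RUNNING_XACTS without the cost
-- of a full CHECKPOINT. As noted above, the test still failed randomly
-- when relying on this alone.
SELECT pg_log_standby_snapshot();

-- Approach 2 (run on the subscriber): re-create the subscription so that
-- its slot on the primary is created afresh and gets the latest restart_lsn.
-- All names below are hypothetical.
DROP SUBSCRIPTION IF EXISTS example_sub;
CREATE SUBSCRIPTION example_sub
    CONNECTION 'host=primary.example dbname=postgres'
    PUBLICATION example_pub
    WITH (failover = true);

-- On the primary: check where the slot's restart_lsn currently points.
SELECT slot_name, restart_lsn
FROM pg_replication_slots;
```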

Best Regards,
Hou zj
