Re: Excessive number of replication slots for 12->14 logical replication
From: Masahiko Sawada
Subject: Re: Excessive number of replication slots for 12->14 logical replication
Msg-id: CAD21AoAw0Oofi4kiDpJBOwpYyBBBkJj=sLUOn4Gd2GjUAKG-fw@mail.gmail.com
In reply to: Re: Excessive number of replication slots for 12->14 logical replication (Amit Kapila <amit.kapila16@gmail.com>)
Responses: RE: Excessive number of replication slots for 12->14 logical replication
           Re: Excessive number of replication slots for 12->14 logical replication
List: pgsql-bugs
On Tue, Aug 30, 2022 at 3:44 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Fri, Aug 26, 2022 at 7:04 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > Thanks for the testing. I'll push this sometime early next week (by
> > Tuesday) unless Sawada-San or someone else has any comments on it.
> >
>
> Pushed.

Tom reported buildfarm failures[1], and I've investigated the cause and concluded that this commit is relevant. In process_syncing_tables_for_sync(), we have the following code:

    UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
                               MyLogicalRepWorker->relid,
                               MyLogicalRepWorker->relstate,
                               MyLogicalRepWorker->relstate_lsn);

    ReplicationOriginNameForTablesync(MyLogicalRepWorker->subid,
                                      MyLogicalRepWorker->relid,
                                      originname,
                                      sizeof(originname));
    replorigin_session_reset();
    replorigin_session_origin = InvalidRepOriginId;
    replorigin_session_origin_lsn = InvalidXLogRecPtr;
    replorigin_session_origin_timestamp = 0;

    /*
     * We expect that origin must be present. The concurrent operations
     * that remove origin like a refresh for the subscription take an
     * access exclusive lock on pg_subscription which prevent the previous
     * operation to update the rel state to SUBREL_STATE_SYNCDONE to
     * succeed.
     */
    replorigin_drop_by_name(originname, false, false);

    /*
     * End streaming so that LogRepWorkerWalRcvConn can be used to drop
     * the slot.
     */
    walrcv_endstreaming(LogRepWorkerWalRcvConn, &tli);

    /*
     * Cleanup the tablesync slot.
     *
     * This has to be done after the data changes because otherwise if
     * there is an error while doing the database operations we won't be
     * able to rollback dropped slot.
     */
    ReplicationSlotNameForTablesync(MyLogicalRepWorker->subid,
                                    MyLogicalRepWorker->relid,
                                    syncslotname,
                                    sizeof(syncslotname));

If the tablesync worker errored out at walrcv_endstreaming(), we assumed that both dropping the replication origin and updating the relstate would be rolled back, which however was wrong. Indeed, the replication origin is not dropped, but its in-memory state is reset. Therefore, after the tablesync worker restarts, it starts logical replication from starting point 0/0. Consequently, it ends up re-applying transactions that have already been applied.

Regards,

[1] https://www.postgresql.org/message-id/115136.1662733870%40sss.pgh.pa.us

--
Masahiko Sawada