Re: Initial Schema Sync for Logical Replication

From: Amit Kapila
Subject: Re: Initial Schema Sync for Logical Replication
Message-ID: CAA4eK1Ld9-5ueomE_J5CA6LfRo=wemdTrUp5qdBhRFwGT+dOUw@mail.gmail.com
In reply to: Re: Initial Schema Sync for Logical Replication  (Masahiko Sawada <sawada.mshk@gmail.com>)
Responses: Re: Initial Schema Sync for Logical Replication  (Masahiko Sawada <sawada.mshk@gmail.com>)
RE: Initial Schema Sync for Logical Replication  ("Kumar, Sachin" <ssetiya@amazon.com>)
List: pgsql-hackers

On Mon, Mar 27, 2023 at 8:17 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Fri, Mar 24, 2023 at 11:51 PM Kumar, Sachin <ssetiya@amazon.com> wrote:
> >
> > > From: Amit Kapila <amit.kapila16@gmail.com>
> > > > I think we won't be able to use same snapshot because the transaction will
> > > > be committed.
> > > > In CreateSubscription() we can use the transaction snapshot from
> > > > walrcv_create_slot() till walrcv_disconnect() is called.(I am not sure
> > > > about this part maybe walrcv_disconnect() calls the commits internally ?).
> > > > So somehow we need to keep this snapshot alive, even after transaction
> > > > is committed(or delay committing the transaction , but we can have
> > > > CREATE SUBSCRIPTION with ENABLED=FALSE, so we can have a restart
> > > > before tableSync is able to use the same snapshot.)
> > > >
> > >
> > > Can we think of getting the table data as well along with schema via
> > > pg_dump? Won't then both schema and initial data will correspond to the
> > > same snapshot?
> >
> > Right , that will work, Thanks!
>
> While it works, we cannot get the initial data in parallel, no?
>

Another possibility is that we dump/restore the schema of each table
along with its data. One thing we can explore is whether the parallel
option of dump can be useful here. Do you have any other ideas?
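
To make the parallel-dump idea concrete, here is a minimal sketch of how
a per-table dump command might be assembled. It only builds the argument
list and does not run anything. The helper name build_pg_dump_argv and
the sample values are hypothetical; the pg_dump switches themselves
(--format=directory, --jobs, --snapshot, --schema-only, --table) are
existing options, and --jobs is only accepted with the directory format.
Whether the subscriber side would invoke pg_dump this way is an
assumption, not something decided in this thread.

```python
from typing import List, Optional


def build_pg_dump_argv(
    dbname: str,
    out_dir: str,
    tables: List[str],
    jobs: int = 1,
    snapshot: Optional[str] = None,
    schema_only: bool = False,
) -> List[str]:
    """Build a pg_dump command line (hypothetical helper, for illustration).

    The directory format is used because pg_dump only supports --jobs
    with that output format.
    """
    argv = ["pg_dump", "--dbname", dbname, "--format=directory", "--file", out_dir]
    if jobs > 1:
        # Dump several tables concurrently.
        argv.append(f"--jobs={jobs}")
    if snapshot is not None:
        # Reuse an already exported snapshot so that the schema and the
        # data correspond to the same point in time.
        argv.append(f"--snapshot={snapshot}")
    if schema_only:
        argv.append("--schema-only")
    for table in tables:
        argv.append(f"--table={table}")
    return argv
```

Passing schema_only=False dumps each table's definition together with
its data, which is the "schema of each table along with its data"
variant mentioned above.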

One related idea: currently, we fetch the list of tables for the
subscription's publications and create their entries in
pg_subscription_rel during CREATE SUBSCRIPTION. Can we think of
postponing that work till after the initial schema sync? We already
seem to store the publication list in pg_subscription, so it appears
possible if we somehow remember the value of copy_data. If this is
feasible, then I think it may give us the flexibility to have a
background worker perform the initial sync at a later point.

> >
> > > > I think we can have same issues as you mentioned New table t1 is added
> > > > to the publication , User does a refresh publication.
> > > > pg_dump / pg_restore restores the table definition. But before
> > > > tableSync can start,  steps from 2 to 5 happen on the publisher.
> > > > > 1. Create Table t1(c1, c2); --LSN: 90 2. Insert t1 (1, 1); --LSN 100
> > > > > 3. Insert t1 (2, 2); --LSN 110 4. Alter t1 Add Column c3; --LSN 120
> > > > > 5. Insert t1 (3, 3, 3); --LSN 130
> > > > And table sync errors out
> > > > There can be one more issue , since we took the pg_dump without
> > > snapshot (wrt to replication slot).
> > > >
> > >
> > > To avoid both the problems mentioned for Refresh Publication, we can do
> > > one of the following: (a) create a new slot along with a snapshot for this
> > > operation and drop it afterward; or (b) using the existing slot, establish a
> > > new snapshot using a technique proposed in email [1].
> > >
> >
> > Thanks, I think option (b) will be perfect, since we don’t have to create a new slot.
>
> Regarding (b), does it mean that apply worker stops streaming,
> requests to create a snapshot, and then resumes the streaming?
>

Shouldn't this be done by the backend performing a REFRESH PUBLICATION?
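
For reference, the t1 example quoted above can be replayed with a small
toy model. This is only a sketch under simplifying assumptions: the
publisher is modelled as a list of events keyed by the LSNs from the
example, and Event, state_at and initial_sync are hypothetical names,
not PostgreSQL functions. It shows why restoring the schema at one LSN
and copying the data at a later one fails, while taking both at the
same LSN (the same snapshot) does not.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Event:
    lsn: int
    kind: str        # "create", "add_column" or "insert"
    payload: Tuple


# The publisher-side history from the quoted example.
LOG = [
    Event(90, "create", ("c1", "c2")),
    Event(100, "insert", (1, 1)),
    Event(110, "insert", (2, 2)),
    Event(120, "add_column", ("c3",)),
    Event(130, "insert", (3, 3, 3)),
]


def state_at(log: List[Event], lsn: int):
    """Return (columns, rows) of t1 as visible at the given LSN."""
    columns: List[str] = []
    rows: List[Tuple] = []
    for ev in log:
        if ev.lsn > lsn:
            break
        if ev.kind == "create":
            columns = list(ev.payload)
        elif ev.kind == "add_column":
            columns += list(ev.payload)
            # Existing rows get NULL for the newly added column(s).
            rows = [r + (None,) * len(ev.payload) for r in rows]
        elif ev.kind == "insert":
            rows.append(ev.payload)
    return columns, rows


def initial_sync(log: List[Event], schema_lsn: int, data_lsn: int):
    """Restore the schema as of schema_lsn, then copy data as of data_lsn."""
    columns, _ = state_at(log, schema_lsn)
    _, rows = state_at(log, data_lsn)
    for row in rows:
        if len(row) != len(columns):
            raise ValueError(
                f"row {row} does not match subscriber columns {columns}"
            )
    return columns, rows
```

With the schema restored before LSN 120 and the data copied at LSN 130,
initial_sync(LOG, 95, 130) raises, which corresponds to the "table sync
errors out" case. With both taken at the same point, for example
initial_sync(LOG, 130, 130), the copy goes through.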

--
With Regards,
Amit Kapila.


