Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication
From | Dilip Kumar |
---|---|
Subject | Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication |
Date | |
Msg-id | CAFiTN-tN3ya3PEnqZVLDWN=v68bRriPks_6zkVZrC-vw8QjAcg@mail.gmail.com |
In reply to | Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication (Amit Kapila <amit.kapila16@gmail.com>) |
List | pgsql-hackers |
On Wed, Jul 6, 2022 at 2:48 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Wed, Jul 6, 2022 at 1:47 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Wed, Jul 6, 2022 at 9:06 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > How would you choose the slot name for the table sync, right now it
> > > contains the relid of the table for which it needs to perform sync?
> > > Say, if we ignore to include the appropriate identifier in the slot
> > > name, we won't be able to reuse/drop the slot after restart of table
> > > sync worker due to an error.
> >
> > I had a quick look into the patch and it seems it is using the worker
> > array index instead of relid while forming the slot name, and I think
> > that make sense, because now whichever worker is using that worker
> > index can reuse the slot created w.r.t that index.
> >
>
> I think that won't work because each time on restart the slot won't be
> fixed. Now, it is possible that we may drop the wrong slot if that
> state of copying rel is SUBREL_STATE_DATASYNC.

So it will drop the slot that the worker at that index was previously
using. It is possible that some relation on that slot was at
SUBREL_STATE_FINISHEDCOPY or so, and we would drop that slot anyway.
Because the association between relid and replication slot is no longer
1-1, it would be wrong to drop a slot based on the relstate of the
relation picked by this worker. In short, what you have pointed out
makes sense.

> Also, it is possible
> that while creating a slot, we fail because the same name slot already
> exists due to some other worker which has created that slot has been
> restarted. Also, what about origin_name, won't that have similar
> problems? Also, if the state is already SUBREL_STATE_FINISHEDCOPY, if
> the slot is not the same as we have used in the previous run of a
> particular worker, it may start WAL streaming from a different point
> based on the slot's confirmed_flush_location.

Yeah, this is also true: when a tablesync worker has to catch up after
completing the copy, it might stream from the wrong LSN.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
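
To make the wrong-slot drop discussed above concrete, here is a toy model in Python. It is only a sketch and not PostgreSQL code: the slot-name formats, the `start_tablesync` helper, and the OIDs are all hypothetical, and the only behaviour it assumes is the one described in the thread, namely that a worker starting on a relation in SUBREL_STATE_DATASYNC drops any leftover slot with its name before creating a new one.

```python
# Toy model only: names, formats, and OIDs below are hypothetical.
DATASYNC = "DATASYNC"          # stands in for SUBREL_STATE_DATASYNC
FINISHEDCOPY = "FINISHEDCOPY"  # stands in for SUBREL_STATE_FINISHEDCOPY


def name_by_relid(subid, relid, worker_index):
    # Slot name carries the relation's OID, so it is 1-1 with the relation.
    return f"pg_{subid}_sync_{relid}"


def name_by_worker_index(subid, relid, worker_index):
    # Slot name carries only the worker array index, not the relation.
    return f"pg_{subid}_sync_w{worker_index}"


def start_tablesync(slots, name, relid, relstate):
    """Start syncing relid; return the relid whose slot was dropped, if any.

    slots maps a slot name to the relation whose copy progress it holds.
    """
    dropped_owner = None
    if relstate == DATASYNC and name in slots:
        # Assumed cleanup rule: a leftover slot is dropped and recreated.
        dropped_owner = slots.pop(name)
    # In FINISHEDCOPY an existing slot would be kept and reused instead.
    slots.setdefault(name, relid)
    return dropped_owner


def simulate(name_fn):
    subid = 16400
    slots = {}
    # Run 1: worker index 0 copies relation 16384 and then errors out,
    # leaving its slot behind with that relation's progress.
    start_tablesync(slots, name_fn(subid, 16384, 0), 16384, DATASYNC)
    # After restart, index 0 is handed relation 16390, still in DATASYNC.
    return start_tablesync(slots, name_fn(subid, 16390, 0), 16390, DATASYNC)


print(simulate(name_by_relid))         # None
print(simulate(name_by_worker_index))  # 16384
```

With relid-based names the restarted worker touches only its own relation's slot. With index-based names it drops the slot that still held relation 16384's position, which is the case where the relstate of the newly picked relation says nothing about the slot being dropped.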