RE: Data is copied twice when specifying both child and parent table in publication

Поиск
Список
Период
Сортировка
От shiy.fnst@fujitsu.com
Тема RE: Data is copied twice when specifying both child and parent table in publication
Дата
Msg-id OSZPR01MB6310829049C935E1F8D5344FFD8F9@OSZPR01MB6310.jpnprd01.prod.outlook.com
обсуждение исходный текст
Ответ на Re: Data is copied twice when specifying both child and parent table in publication  (Jacob Champion <jchampion@timescale.com>)
Ответы Re: Data is copied twice when specifying both child and parent table in publication  (Jacob Champion <jchampion@timescale.com>)
Список pgsql-hackers
On Fri, Mar 31, 2023 2:16 AM Jacob Champion <jchampion@timescale.com> wrote:
> 
> On Wed, Mar 29, 2023 at 2:00 AM Amit Kapila <amit.kapila16@gmail.com>
> wrote:
> > Pushed.
> 
> While rebasing my logical-roots patch over the top of this, I ran into
> another situation where mixed viaroot settings can duplicate data. The
> key idea is to subscribe to two publications with mixed settings, as
> before, and add a partition root that's already been replicated with
> viaroot=false to the other publication with viaroot=true.
> 
>   pub=# CREATE TABLE part (a int) PARTITION BY RANGE (a);
>   pub=# CREATE PUBLICATION pub_all FOR ALL TABLES;
>   pub=# CREATE PUBLICATION pub_other FOR TABLE other WITH
> (publish_via_partition_root);
>   -- populate with data, then switch to subscription side
>   sub=# CREATE SUBSCRIPTION sub CONNECTION ... PUBLICATION pub_all,
> pub_other;
>   -- switch back to publication
>   pub=# ALTER PUBLICATION pub_other ADD TABLE part;
>   -- and back to subscription
>   sub=# ALTER SUBSCRIPTION sub REFRESH PUBLICATION;
>   -- data is now duplicated
> 
> (Standalone reproduction attached.)
> 
> This is similar to what happens if you alter the
> publish_via_partition_root setting for an existing publication, but
> I'd argue it's easier to hit by accident. Is this part of the same
> class of bugs, or is it different (or even expected) behavior?
> 

I noticed that a similar problem has been discussed in this thread, see [1] [2]
[3] [4]. It seems complicated to fix it if we want to automatically skip tables
that have been synchronized previously by code, and this may overkill in some
cases (e.g. The target table in subscriber is not a partitioned table, and the
user want to synchronize all data in the partitioned table from the publisher).
Besides, it seems not a common case. So I'm not sure we should fix it. Maybe we
can just add some documentation for it as Peter mentioned.

[1] https://www.postgresql.org/message-id/CAJcOf-eQR_%3Dq0f4ZVHd342QdLvBd_995peSr4xCU05hrS3TeTg%40mail.gmail.com
[2]
https://www.postgresql.org/message-id/OS0PR01MB5716C756312959F293A822C794869%40OS0PR01MB5716.jpnprd01.prod.outlook.com
(thesecond issue in it)
 
[3] https://www.postgresql.org/message-id/CA%2BHiwqHnDHcT4OOcga9rDFyc7TvDrpN5xFH9J2pyHQo9ptvjmQ%40mail.gmail.com
[4] https://www.postgresql.org/message-id/CAA4eK1%2BNWreG%3D2sKiMz8vFzTsFhEHCjgQMyAu6zj3sdLmcheYg%40mail.gmail.com

Regards,
Shi Yu

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Jeff Davis
Дата:
Сообщение: Re: Minimal logical decoding on standbys
Следующее
От: Jeff Davis
Дата:
Сообщение: Re: ICU locale validation / canonicalization