Обсуждение: PG Docs for ALTER SUBSCRIPTION REFRESH PUBLICATION - copy_data option
Hi. (For brevity, in this mail I refer to "ALTER SUBSCRIPTION sub REFRESH PUBLICATION" as "ASRP") -- The PG Docs for ASRP WITH (copy_data = true), says "(Previously subscribed tables are not copied.)" [1]. I thought this rule meant that only tables which got added by ALTER PUBLICATION pubname ADD TABLE ... [2] after the CREATE SUBSCRIPTION would be affected by the copy_data. But I recently learned that when there are partitions in the publication, then toggling the value of the PUBLICATION option "publish_via_partition_root" [3] can also *implicitly* change the list published tables, and therefore that too might cause any ASRP to make use of the copy_data value for those implicitly added partitions/tables. It seems a bit too subtle. I was wondering if this should be made more obvious by a note added to the PG Docs for the ASRP [1]. e.g. "Previously subscribed tables are not copied. Note: Tables may also be newly subscribed by changes to the publish_via_partition_root option [link]" Or perhaps, the "publish_via_partition_root option" Docs [3] should say something. e.g. "Note: Changing this option can affect the ASRP copy_data [link]. Thoughts? ----- [1] https://www.postgresql.org/docs/devel/sql-altersubscription.html [2] https://www.postgresql.org/docs/devel/sql-alterpublication.html [3] https://www.postgresql.org/docs/devel/sql-createpublication.html Kind Regards, Peter Smith. Fujitsu Australia
(One month has passed since my original post but there have been no replies to it). It seems like the original post maybe just got buried with too many other mails so I am "bumping" this thread to elicit some response for/against the suggestion. ------ Kind Regards, Peter Smith. Fujitsu Australia.
On Fri, Jun 25, 2021 at 9:20 AM Peter Smith <smithpb2250@gmail.com> wrote: > > But I recently learned that when there are partitions in the > publication, then toggling the value of the PUBLICATION option > "publish_via_partition_root" [3] can also *implicitly* change the list > published tables, and therefore that too might cause any ASRP to make > use of the copy_data value for those implicitly added > partitions/tables. > I have tried the below example in this context but didn't see any effect on changing via_root option. Set up on both publisher and subscriber: ================================= CREATE TABLE tab2 (a int PRIMARY KEY, b text) PARTITION BY LIST (a); CREATE TABLE tab2_1 (b text, a int NOT NULL); ALTER TABLE tab2 ATTACH PARTITION tab2_1 FOR VALUES IN (0, 1, 2, 3); CREATE TABLE tab2_2 PARTITION OF tab2 FOR VALUES IN (5, 6); Publisher: ========== CREATE PUBLICATION pub_viaroot FOR TABLE tab2_2; postgres=# INSERT INTO tab2 VALUES (1), (0), (3), (5); INSERT 0 4 postgres=# select * from tab2_1; b | a ---+--- | 1 | 0 | 3 (3 rows) postgres=# select * from tab2_2; a | b ---+--- 5 | (1 row) Subscriber: ========== CREATE SUBSCRIPTION sub_viaroot CONNECTION 'host=localhost port=5432 dbname=postgres' PUBLICATION pub_viaroot; postgres=# select * from tab2_2; a | b ---+--- 5 | (1 row) postgres=# select * from tab2_1; b | a ---+--- (0 rows) So, by this step, we can see the partition which is not subscribed is not copied. Now, let's toggle via_root option. Publisher ========= Alter Publication pub_viaroot Set (publish_via_partition_root = true); Subscriber ========== postgres=# Alter Subscription sub_viaroot Refresh Publication; ALTER SUBSCRIPTION postgres=# select * from tab2_2; a | b ---+--- 5 | (1 row) postgres=# select * from tab2_1; b | a ---+--- (0 rows) As per your explanation, one can expect the data in tab2_1 in the last step. Can you explain with example? -- With Regards, Amit Kapila.
On Tue, Sep 14, 2021 at 8:33 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Fri, Jun 25, 2021 at 9:20 AM Peter Smith <smithpb2250@gmail.com> wrote:
> >
> > But I recently learned that when there are partitions in the
> > publication, then toggling the value of the PUBLICATION option
> > "publish_via_partition_root" [3] can also *implicitly* change the list
> > published tables, and therefore that too might cause any ASRP to make
> > use of the copy_data value for those implicitly added
> > partitions/tables.
> >
>
> I have tried the below example in this context but didn't see any
> effect on changing via_root option.
Thanks for trying to reproduce. I also thought your steps were the
same as what I'd previously done but it seems like it was a bit
different. Below are my steps to observe some unexpected COPY
happening. Actually, now I am no longer sure if this is just a
documentation issue; perhaps it is a bug.
STEP 1 - create partition tables on both sides
===================================
[PUB and SUB]
postgres=# create table troot (a int) partition by range(a);
CREATE TABLE
postgres=# create table tless10 partition of troot for values from (1) to (9);
CREATE TABLE
postgres=# create table tmore10 partition of troot for values from (10) to (99);
CREATE TABLE
STEP 2 - insert some data on pub-side
==============================
[PUB]
postgres=# insert into troot values (1),(2),(3);
INSERT 0 3
postgres=# insert into troot values (11),(12),(13);
INSERT 0 3
postgres=# select * from troot;
a
----
1
2
3
11
12
13
(6 rows)
STEP 3 - create a publication on the partition root
======================================
[PUB]
postgres=# CREATE PUBLICATION pub1 FOR TABLE troot;
CREATE PUBLICATION
postgres=# \dRp+ pub1;
Publication pub1
Owner | All tables | Inserts | Updates | Deletes | Truncates | Via root
----------+------------+---------+---------+---------+-----------+----------
postgres | f | t | t | t | t | f
Tables:
"public.troot"
STEP 4 - create the subscriber
=======================
[SUB]
postgres=# CREATE SUBSCRIPTION sub1 CONNECTION 'host=127.0.0.1
port=5432 dbname=postgres' PUBLICATION pub1;
NOTICE: created replication slot "sub1" on publisher
CREATE SUBSCRIPTION
postgres=# 2021-09-15 12:45:12.224 AEST [30592] LOG: logical
replication apply worker for subscription "sub1" has started
2021-09-15 12:45:12.236 AEST [30595] LOG: logical replication table
synchronization worker for subscription "sub1", table "tless10" has
started
2021-09-15 12:45:12.247 AEST [30598] LOG: logical replication table
synchronization worker for subscription "sub1", table "tmore10" has
started
2021-09-15 12:45:12.326 AEST [30595] LOG: logical replication table
synchronization worker for subscription "sub1", table "tless10" has
finished
2021-09-15 12:45:12.332 AEST [30598] LOG: logical replication table
synchronization worker for subscription "sub1", table "tmore10" has
finished
postgres=# select * from troot;
a
----
1
2
3
11
12
13
(6 rows)
// To this point, everything looks OK...
STEP 5 - toggle the publish_via_partition_root flag
======================================
[PUB]
postgres=# alter publication pub1 set (publish_via_partition_root = true);
ALTER PUBLICATION
postgres=# \dRp+ pub1;
Publication pub1
Owner | All tables | Inserts | Updates | Deletes | Truncates | Via root
----------+------------+---------+---------+---------+-----------+----------
postgres | f | t | t | t | t | t
Tables:
"public.troot"
// And then refresh the subscriber
[SUB]
postgres=# alter subscription sub1 refresh PUBLICATION;
ALTER SUBSCRIPTION
postgres=# 2021-09-15 12:48:37.927 AEST [3861] LOG: logical
replication table synchronization worker for subscription "sub1",
table "troot" has started
2021-09-15 12:48:37.977 AEST [3861] LOG: logical replication table
synchronization worker for subscription "sub1", table "troot" has
finished
// Notice above that another tablesync worker has launched and copied
everything again - BUG??
[SUB]
postgres=# select * from troot;
a
----
1
2
3
1
2
3
11
12
13
11
12
13
(12 rows)
// At this point if I would keep toggling the
publish_via_partition_root then each time I do subscription REFRESH
PUBLICATION it will copy the data yet again. For example,
[PUB]
postgres=# alter publication pub1 set (publish_via_partition_root = false);
ALTER PUBLICATION
[SUB]
postgres=# alter subscription sub1 refresh PUBLICATION;
ALTER SUBSCRIPTION
postgres=# 2021-09-15 12:59:02.106 AEST [21709] LOG: logical
replication table synchronization worker for subscription "sub1",
table "tless10" has started
2021-09-15 12:59:02.120 AEST [21711] LOG: logical replication table
synchronization worker for subscription "sub1", table "tmore10" has
started
2021-09-15 12:59:02.189 AEST [21709] LOG: logical replication table
synchronization worker for subscription "sub1", table "tless10" has
finished
2021-09-15 12:59:02.207 AEST [21711] LOG: logical replication table
synchronization worker for subscription "sub1", table "tmore10" has
finished
By now the pub/sub data on each side is quite different
==========================================
[PUB]
postgres=# select count(*) from troot;
count
-------
6
(1 row)
[SUB]
postgres=# select count(*) from troot;
count
-------
18
(1 row)
------
Kind Regards,
Peter Smith.
Fujitsu Australia
On Wed, Sep 15, 2021 at 8:49 AM Peter Smith <smithpb2250@gmail.com> wrote: > > On Tue, Sep 14, 2021 at 8:33 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Fri, Jun 25, 2021 at 9:20 AM Peter Smith <smithpb2250@gmail.com> wrote: > > > > > > But I recently learned that when there are partitions in the > > > publication, then toggling the value of the PUBLICATION option > > > "publish_via_partition_root" [3] can also *implicitly* change the list > > > published tables, and therefore that too might cause any ASRP to make > > > use of the copy_data value for those implicitly added > > > partitions/tables. > > > > > > > I have tried the below example in this context but didn't see any > > effect on changing via_root option. > > Thanks for trying to reproduce. I also thought your steps were the > same as what I'd previously done but it seems like it was a bit > different. Below are my steps to observe some unexpected COPY > happening. Actually, now I am no longer sure if this is just a > documentation issue; perhaps it is a bug. > Yeah, this looks odd to me as well. -- With Regards, Amit Kapila.