Re: Data is copied twice when specifying both child and parent table in publication

Поиск

Список

Период

Сортировка

От	Peter Smith
Тема	Re: Data is copied twice when specifying both child and parent table in publication
Дата	22 марта 2023 г. 04:50:07
Msg-id	CAHut+Ptk6oANo2-mw5191aU0HHP9YKjYhWgTZYU0g0Y1Co_5SQ@mail.gmail.com обсуждение исходный текст
Ответ на	RE: Data is copied twice when specifying both child and parent table in publication ("wangw.fnst@fujitsu.com" <wangw.fnst@fujitsu.com>)
Ответы	RE: Data is copied twice when specifying both child and parent table in publication
Список	pgsql-hackers

Дерево обсуждения

Here are some review comments for patch code of HEAD_v19-0001

======
doc/src/sgml/ref/create_publication.sgml

1.
+         <para>
+          There can be a case where a subscription combines multiple
+          publications. If a root partitioned table is published by any
+          subscribed publications which set
+          <literal>publish_via_partition_root</literal> = true, changes on this
+          root partitioned table (or on its partitions) will be published using
+          the identity and schema of this root partitioned table rather than
+          that of the individual partitions.
+         </para>

1a.
The paragraph prior to this one just refers to "partitioned tables"
instead of "root partitioned table", so IMO we should continue with
the same terminology.

I also modified the remaining text slightly. AFAIK my suggestion
conveys exactly the same information but is shorter.

SUGGESTION
There can be a case where one subscription combines multiple
publications. If any of those publications has set
<literal>publish_via_partition_root</literal> = true, then changes in
a partitioned table (or on its partitions) will be published using the
identity and schema of the partitioned table.

~

1b.
Shouldn't that paragraph (or possibly somewhere in the CREATE
SUBSCRIPTION notes?) also explain that in this scenario the logical
replication will only publish one set of changes? After all, this is
the whole point of the patch, but I am not sure if the user will know
of this behaviour from the current documentation.

======
src/backend/catalog/pg_publication.c

2. filter_partitions

BEFORE:
static void
filter_partitions(List *table_infos)
{
ListCell   *lc;

foreach(lc, table_infos)
{
bool skip = false;
List    *ancestors = NIL;
ListCell    *lc2;
published_rel    *table_info = (published_rel *) lfirst(lc);

if (get_rel_relispartition(table_info->relid))
ancestors = get_partition_ancestors(table_info->relid);

foreach(lc2, ancestors)
{
Oid ancestor = lfirst_oid(lc2);

/* Is ancestor exists in the published table list? */
if (is_ancestor_member_tableinfos(ancestor, table_infos))
{
skip = true;
break;
}
}

if (skip)
table_infos = foreach_delete_current(table_infos, lc);
}
}

~

2a.
My previous review [1] (see #1) suggested putting most code within the
condition. AFAICT my comment still is applicable but was not yet
addressed.

2b.
IMO the comment "/* Is ancestor exists in the published table list?
*/" is unnecessary because it is already clear what is the purpose of
the function named "is_ancestor_member_tableinfos".


SUGGESTION
static void
filter_partitions(List *table_infos)
{
    ListCell   *lc;

    foreach(lc, table_infos)
    {
        if (get_rel_relispartition(table_info->relid))
        {
            bool skip = false;
            ListCell    *lc2;
            published_rel    *table_info = (published_rel *) lfirst(lc);
            List *ancestors = get_partition_ancestors(table_info->relid);

            foreach(lc2, ancestors)
            {
                Oid ancestor = lfirst_oid(lc2);

                if (is_ancestor_member_tableinfos(ancestor, table_infos))
                {
                    skip = true;
                    break;
                }
            }

            if (skip)
                table_infos = foreach_delete_current(table_infos, lc);
        }
    }
}

======
src/backend/commands/subscriptioncmds.c

3.
 fetch_table_list(WalReceiverConn *wrconn, List *publications)
 {
  WalRcvExecResult *res;
- StringInfoData cmd;
+ StringInfoData cmd,
+ pub_names;
  TupleTableSlot *slot;
  Oid tableRow[3] = {TEXTOID, TEXTOID, NAMEARRAYOID};
  List    *tablelist = NIL;
- bool check_columnlist = (walrcv_server_version(wrconn) >= 150000);
+ int server_version = walrcv_server_version(wrconn);
+ bool check_columnlist = (server_version >= 150000);
+
+ initStringInfo(&pub_names);
+ get_publications_str(publications, &pub_names, true);

  initStringInfo(&cmd);
- appendStringInfoString(&cmd, "SELECT DISTINCT t.schemaname, t.tablename \n");

- /* Get column lists for each relation if the publisher supports it */
- if (check_columnlist)
- appendStringInfoString(&cmd, ", t.attnames\n");
+ /* Get the list of tables from the publisher. */
+ if (server_version >= 160000)
+ {

~

I think the 'pub_names' is only needed within that ">= 160000" condition.

So all the below code can be moved into that scope can't it?

+ StringInfoData pub_names;
+ initStringInfo(&pub_names);
+ get_publications_str(publications, &pub_names, true);

+ pfree(pub_names.data);

------
[1] https://www.postgresql.org/message-id/CAHut%2BPuNsvO9o9XzeJuSLsAsndgCKVphDPBRqYuOTy2bR28E%2Bg%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia

В списке pgsql-hackers по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: Data is copied twice when specifying both child and parent table in publication