Обсуждение: pgsql: Raise a warning if there is a possibility of data from multiple

Поиск
Список
Период
Сортировка

pgsql: Raise a warning if there is a possibility of data from multiple

От
Amit Kapila
Дата:
Raise a warning if there is a possibility of data from multiple origins.

This commit raises a warning message for a combination of options
('copy_data = true' and 'origin = none') during CREATE/ALTER subscription
operations if the publication tables were also replicated from other
publishers.

During replication, we can skip the data from other origins as we have that
information in WAL but that is not possible during initial sync so we raise
a warning if there is such a possibility.

Author: Vignesh C
Reviewed-By: Peter Smith, Amit Kapila, Jonathan Katz, Shi yu, Wang wei
Discussion: https://www.postgresql.org/message-id/CALDaNm0gwjY_4HFxvvty01BOT01q_fJLKQ3pWP9=9orqubhjcQ@mail.gmail.com

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/875693019053b8897ec3983e292acbb439b088c3

Modified Files
--------------
doc/src/sgml/ref/alter_subscription.sgml  |   5 ++
doc/src/sgml/ref/create_subscription.sgml |  35 ++++++++
src/backend/commands/subscriptioncmds.c   | 133 ++++++++++++++++++++++++++++--
src/test/subscription/t/030_origin.pl     | 114 +++++++++++++++++++------
4 files changed, 258 insertions(+), 29 deletions(-)


Re: pgsql: Raise a warning if there is a possibility of data from multiple

От
Masahiko Sawada
Дата:
On Thu, Sep 8, 2022 at 10:39 AM Amit Kapila <akapila@postgresql.org> wrote:
>
> Raise a warning if there is a possibility of data from multiple origins.
>
> This commit raises a warning message for a combination of options
> ('copy_data = true' and 'origin = none') during CREATE/ALTER subscription
> operations if the publication tables were also replicated from other
> publishers.
>
> During replication, we can skip the data from other origins as we have that
> information in WAL but that is not possible during initial sync so we raise
> a warning if there is such a possibility.
>
> Author: Vignesh C
> Reviewed-By: Peter Smith, Amit Kapila, Jonathan Katz, Shi yu, Wang wei
> Discussion: https://www.postgresql.org/message-id/CALDaNm0gwjY_4HFxvvty01BOT01q_fJLKQ3pWP9=9orqubhjcQ@mail.gmail.com
>
> Branch
> ------
> master
>
> Details
> -------
> https://git.postgresql.org/pg/commitdiff/875693019053b8897ec3983e292acbb439b088c3
>
> Modified Files
> --------------
> doc/src/sgml/ref/alter_subscription.sgml  |   5 ++
> doc/src/sgml/ref/create_subscription.sgml |  35 ++++++++
> src/backend/commands/subscriptioncmds.c   | 133 ++++++++++++++++++++++++++++--
> src/test/subscription/t/030_origin.pl     | 114 +++++++++++++++++++------
> 4 files changed, 258 insertions(+), 29 deletions(-)

+       appendStringInfoString(&cmd,
+                                                  "SELECT DISTINCT
P.pubname AS pubname\n"
+                                                  "FROM pg_publication P,\n"
+                                                  "     LATERAL
pg_get_publication_tables(P.pubname) GPT\n"
+                                                  "     JOIN
pg_subscription_rel PS ON (GPT.relid = PS.srrelid),\n"
+                                                  "     pg_class C
JOIN pg_namespace N ON (N.oid = C.relnamespace)\n"
+                                                  "WHERE C.oid =
GPT.relid AND P.pubname IN (");

Should the tables and the function in this query be schema-qualified?
Looking at other code in subscriptioncmd.c, we use schema-qualified
names. It works even without the schema names but it may be better to
make them consistent.

FYI, looking at tablesync.c, there are both styles; it seems that
recently added codes use schema-unqualified names.

Regards,

--
Masahiko Sawada



Re: pgsql: Raise a warning if there is a possibility of data from multiple

От
Amit Kapila
Дата:
On Thu, Sep 8, 2022 at 12:22 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Thu, Sep 8, 2022 at 10:39 AM Amit Kapila <akapila@postgresql.org> wrote:
> >
> > Raise a warning if there is a possibility of data from multiple origins.
> >
> > This commit raises a warning message for a combination of options
> > ('copy_data = true' and 'origin = none') during CREATE/ALTER subscription
> > operations if the publication tables were also replicated from other
> > publishers.
> >
> > During replication, we can skip the data from other origins as we have that
> > information in WAL but that is not possible during initial sync so we raise
> > a warning if there is such a possibility.
> >
> > Author: Vignesh C
> > Reviewed-By: Peter Smith, Amit Kapila, Jonathan Katz, Shi yu, Wang wei
> > Discussion:
https://www.postgresql.org/message-id/CALDaNm0gwjY_4HFxvvty01BOT01q_fJLKQ3pWP9=9orqubhjcQ@mail.gmail.com
> >
> > Branch
> > ------
> > master
> >
> > Details
> > -------
> > https://git.postgresql.org/pg/commitdiff/875693019053b8897ec3983e292acbb439b088c3
> >
> > Modified Files
> > --------------
> > doc/src/sgml/ref/alter_subscription.sgml  |   5 ++
> > doc/src/sgml/ref/create_subscription.sgml |  35 ++++++++
> > src/backend/commands/subscriptioncmds.c   | 133 ++++++++++++++++++++++++++++--
> > src/test/subscription/t/030_origin.pl     | 114 +++++++++++++++++++------
> > 4 files changed, 258 insertions(+), 29 deletions(-)
>
> +       appendStringInfoString(&cmd,
> +                                                  "SELECT DISTINCT
> P.pubname AS pubname\n"
> +                                                  "FROM pg_publication P,\n"
> +                                                  "     LATERAL
> pg_get_publication_tables(P.pubname) GPT\n"
> +                                                  "     JOIN
> pg_subscription_rel PS ON (GPT.relid = PS.srrelid),\n"
> +                                                  "     pg_class C
> JOIN pg_namespace N ON (N.oid = C.relnamespace)\n"
> +                                                  "WHERE C.oid =
> GPT.relid AND P.pubname IN (");
>
> Should the tables and the function in this query be schema-qualified?
> Looking at other code in subscriptioncmd.c, we use schema-qualified
> names. It works even without the schema names but it may be better to
> make them consistent.
>
> FYI, looking at tablesync.c, there are both styles; it seems that
> recently added codes use schema-unqualified names.
>

Yeah, it is better to be consistent in all places and add a schema
name before accessing an object rather than for one or two places. Do
we need similar handling for pg_dump code as well, see
getPublications()?

-- 
With Regards,
Amit Kapila.



Re: pgsql: Raise a warning if there is a possibility of data from multiple

От
vignesh C
Дата:
On Thu, 8 Sept 2022 at 12:22, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Thu, Sep 8, 2022 at 10:39 AM Amit Kapila <akapila@postgresql.org> wrote:
> >
> > Raise a warning if there is a possibility of data from multiple origins.
> >
> > This commit raises a warning message for a combination of options
> > ('copy_data = true' and 'origin = none') during CREATE/ALTER subscription
> > operations if the publication tables were also replicated from other
> > publishers.
> >
> > During replication, we can skip the data from other origins as we have that
> > information in WAL but that is not possible during initial sync so we raise
> > a warning if there is such a possibility.
> >
> > Author: Vignesh C
> > Reviewed-By: Peter Smith, Amit Kapila, Jonathan Katz, Shi yu, Wang wei
> > Discussion:
https://www.postgresql.org/message-id/CALDaNm0gwjY_4HFxvvty01BOT01q_fJLKQ3pWP9=9orqubhjcQ@mail.gmail.com
> >
> > Branch
> > ------
> > master
> >
> > Details
> > -------
> > https://git.postgresql.org/pg/commitdiff/875693019053b8897ec3983e292acbb439b088c3
> >
> > Modified Files
> > --------------
> > doc/src/sgml/ref/alter_subscription.sgml  |   5 ++
> > doc/src/sgml/ref/create_subscription.sgml |  35 ++++++++
> > src/backend/commands/subscriptioncmds.c   | 133 ++++++++++++++++++++++++++++--
> > src/test/subscription/t/030_origin.pl     | 114 +++++++++++++++++++------
> > 4 files changed, 258 insertions(+), 29 deletions(-)
>
> +       appendStringInfoString(&cmd,
> +                                                  "SELECT DISTINCT
> P.pubname AS pubname\n"
> +                                                  "FROM pg_publication P,\n"
> +                                                  "     LATERAL
> pg_get_publication_tables(P.pubname) GPT\n"
> +                                                  "     JOIN
> pg_subscription_rel PS ON (GPT.relid = PS.srrelid),\n"
> +                                                  "     pg_class C
> JOIN pg_namespace N ON (N.oid = C.relnamespace)\n"
> +                                                  "WHERE C.oid =
> GPT.relid AND P.pubname IN (");
>
> Should the tables and the function in this query be schema-qualified?
> Looking at other code in subscriptioncmd.c, we use schema-qualified
> names. It works even without the schema names but it may be better to
> make them consistent.
>
> FYI, looking at tablesync.c, there are both styles; it seems that
> recently added codes use schema-unqualified names.

There will be no issues as such because we set search_path to
always-secure search path while creating connection
(libpqrcv_connect-> libpqrcv_PQexec(conn->streamConn,
ALWAYS_SECURE_SEARCH_PATH_SQL). But I agree it will be better to keep
it consistent across all places.

Regards,
Vignesh



Re: pgsql: Raise a warning if there is a possibility of data from multiple

От
Masahiko Sawada
Дата:
On Thu, Sep 8, 2022 at 4:23 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Thu, Sep 8, 2022 at 12:22 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Thu, Sep 8, 2022 at 10:39 AM Amit Kapila <akapila@postgresql.org> wrote:
> > >
> > > Raise a warning if there is a possibility of data from multiple origins.
> > >
> > > This commit raises a warning message for a combination of options
> > > ('copy_data = true' and 'origin = none') during CREATE/ALTER subscription
> > > operations if the publication tables were also replicated from other
> > > publishers.
> > >
> > > During replication, we can skip the data from other origins as we have that
> > > information in WAL but that is not possible during initial sync so we raise
> > > a warning if there is such a possibility.
> > >
> > > Author: Vignesh C
> > > Reviewed-By: Peter Smith, Amit Kapila, Jonathan Katz, Shi yu, Wang wei
> > > Discussion:
https://www.postgresql.org/message-id/CALDaNm0gwjY_4HFxvvty01BOT01q_fJLKQ3pWP9=9orqubhjcQ@mail.gmail.com
> > >
> > > Branch
> > > ------
> > > master
> > >
> > > Details
> > > -------
> > > https://git.postgresql.org/pg/commitdiff/875693019053b8897ec3983e292acbb439b088c3
> > >
> > > Modified Files
> > > --------------
> > > doc/src/sgml/ref/alter_subscription.sgml  |   5 ++
> > > doc/src/sgml/ref/create_subscription.sgml |  35 ++++++++
> > > src/backend/commands/subscriptioncmds.c   | 133 ++++++++++++++++++++++++++++--
> > > src/test/subscription/t/030_origin.pl     | 114 +++++++++++++++++++------
> > > 4 files changed, 258 insertions(+), 29 deletions(-)
> >
> > +       appendStringInfoString(&cmd,
> > +                                                  "SELECT DISTINCT
> > P.pubname AS pubname\n"
> > +                                                  "FROM pg_publication P,\n"
> > +                                                  "     LATERAL
> > pg_get_publication_tables(P.pubname) GPT\n"
> > +                                                  "     JOIN
> > pg_subscription_rel PS ON (GPT.relid = PS.srrelid),\n"
> > +                                                  "     pg_class C
> > JOIN pg_namespace N ON (N.oid = C.relnamespace)\n"
> > +                                                  "WHERE C.oid =
> > GPT.relid AND P.pubname IN (");
> >
> > Should the tables and the function in this query be schema-qualified?
> > Looking at other code in subscriptioncmd.c, we use schema-qualified
> > names. It works even without the schema names but it may be better to
> > make them consistent.
> >
> > FYI, looking at tablesync.c, there are both styles; it seems that
> > recently added codes use schema-unqualified names.
> >
>
> Yeah, it is better to be consistent in all places and add a schema
> name before accessing an object rather than for one or two places. Do
> we need similar handling for pg_dump code as well, see
> getPublications()?

We can fix at least getPublications() and getSubscriptions() as well.
I'll make a patch for that. Or do you want to fix all of them,
including non-logical-replication-related ones?

Regards,

-- 
Masahiko Sawada



Re: pgsql: Raise a warning if there is a possibility of data from multiple

От
Amit Kapila
Дата:
On Thu, Sep 8, 2022 at 2:38 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Thu, Sep 8, 2022 at 4:23 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Thu, Sep 8, 2022 at 12:22 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > >
> > >
> > > Should the tables and the function in this query be schema-qualified?
> > > Looking at other code in subscriptioncmd.c, we use schema-qualified
> > > names. It works even without the schema names but it may be better to
> > > make them consistent.
> > >
> > > FYI, looking at tablesync.c, there are both styles; it seems that
> > > recently added codes use schema-unqualified names.
> > >
> >
> > Yeah, it is better to be consistent in all places and add a schema
> > name before accessing an object rather than for one or two places. Do
> > we need similar handling for pg_dump code as well, see
> > getPublications()?
>
> We can fix at least getPublications() and getSubscriptions() as well.
> I'll make a patch for that.

Thanks.

> Or do you want to fix all of them,
> including non-logical-replication-related ones?
>

I feel it is better to be consistent across the entire code base
unless there is a reason for doing it differently. Does anyone else
have any thoughts on this matter?


-- 
With Regards,
Amit Kapila.



Re: pgsql: Raise a warning if there is a possibility of data from multiple

От
Tom Lane
Дата:
Amit Kapila <amit.kapila16@gmail.com> writes:
> I feel it is better to be consistent across the entire code base
> unless there is a reason for doing it differently. Does anyone else
> have any thoughts on this matter?

That sounds like it would be a pretty massive and unnecessary patch.

            regards, tom lane



Re: pgsql: Raise a warning if there is a possibility of data from multiple

От
Amit Kapila
Дата:
On Thu, Sep 8, 2022 at 7:28 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Amit Kapila <amit.kapila16@gmail.com> writes:
> > I feel it is better to be consistent across the entire code base
> > unless there is a reason for doing it differently. Does anyone else
> > have any thoughts on this matter?
>
> That sounds like it would be a pretty massive and unnecessary patch.
>

Fair enough. Do you mind being consistent in this regard for logical
replication-related code? BTW, is there a reason we prefer to write in
one or another way (with or without appending schema_name)?

-- 
With Regards,
Amit Kapila.



Re: pgsql: Raise a warning if there is a possibility of data from multiple

От
Tom Lane
Дата:
Amit Kapila <amit.kapila16@gmail.com> writes:
> Fair enough. Do you mind being consistent in this regard for logical
> replication-related code?

As long as not too many individual changes are involved, sure.
But consistency of this sort doesn't seem worth creating a lot
of back-patching land mines, IMO.

> BTW, is there a reason we prefer to write in
> one or another way (with or without appending schema_name)?

In places where we can trust the search_path to be just pg_catalog,
there's no real strong reason to write a schema name, I think.

            regards, tom lane