Обсуждение: [BUG] Crash of logical replica with trigger.

Поиск
Список
Период
Сортировка

[BUG] Crash of logical replica with trigger.

От
"Anton A. Melnikov"
Дата:
Hello!

There is a crash during logical replication.

Reproduction on current master:
A) on master (wal_level = logical): create table rul_rule_set that will be replicated:
  CREATE TABLE public.rul_rule_set (id smallint NOT NULL, description text, stage_id integer,
                                    condition text, condition_compiled text);
  ALTER TABLE ONLY public.rul_rule_set ADD CONSTRAINT rul_rule_set_pkey PRIMARY KEY (id);
  CREATE PUBLICATION test_pub FOR TABLE rul_rule_set;

B) on replica
  1. Create tables doc_attribute and rul_rule_set  for incoming changes:
  CREATE TABLE public.doc_attribute (id integer NOT NULL, attr_name text, attr_desc text, attr_type text,
                                     attr_order integer, is_system boolean, used_by_slice boolean);
  INSERT INTO doc_attribute VALUES ('1','name','Имя','text','1','t','t');
  CREATE TABLE public.rul_rule_set (id smallint NOT NULL, description text, stage_id integer,
                                    condition text, condition_compiled text);
  ALTER TABLE ONLY public.doc_attribute ADD CONSTRAINT doc_attribute_pkey PRIMARY KEY (id);
  ALTER TABLE ONLY public.rul_rule_set ADD CONSTRAINT rul_rule_set_pkey PRIMARY KEY (id);

  2.Create procedure and trigger that will be fired on lines inserting into the rul_rule_set table:
  Execute trigger.sql attached.

  3. Create suscription:
  CREATE SUBSCRIPTION test_sub CONNECTION 'port=5116 user=postgres dbname=postgres' PUBLICATION test_pub;

C) on master: insert row in table rul_rule_set:
INSERT INTO rul_rule_set VALUES ('1', 'name','1','age','true');

The replica will crash with:
TRAP: FailedAssertion("ActivePortal && ActivePortal->status == PORTAL_ACTIVE", File: "pg_proc.c", Line: 1038, PID:
310502)

as ActivePortal is NULL because logical replication worker was forked before any active command.
  
The backtrace is attached.

I thought to pass the text of the request as a parameter, but most likely, it will be wrong.
Please suggest the best way to solve this problem, would be very grateful.

With best regards,
-- 
Anton A. Melnikov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Вложения

Re: [BUG] Crash of logical replica with trigger.

От
Masahiko Sawada
Дата:
Hi,

On Mon, Aug 15, 2022 at 4:48 AM Anton A. Melnikov <aamelnikov@inbox.ru> wrote:
>
> Hello!
>
> There is a crash during logical replication.
>
> Reproduction on current master:
> A) on master (wal_level = logical): create table rul_rule_set that will be replicated:
>   CREATE TABLE public.rul_rule_set (id smallint NOT NULL, description text, stage_id integer,
>                                     condition text, condition_compiled text);
>   ALTER TABLE ONLY public.rul_rule_set ADD CONSTRAINT rul_rule_set_pkey PRIMARY KEY (id);
>   CREATE PUBLICATION test_pub FOR TABLE rul_rule_set;
>
> B) on replica
>   1. Create tables doc_attribute and rul_rule_set  for incoming changes:
>   CREATE TABLE public.doc_attribute (id integer NOT NULL, attr_name text, attr_desc text, attr_type text,
>                                      attr_order integer, is_system boolean, used_by_slice boolean);
>   INSERT INTO doc_attribute VALUES ('1','name','Имя','text','1','t','t');
>   CREATE TABLE public.rul_rule_set (id smallint NOT NULL, description text, stage_id integer,
>                                     condition text, condition_compiled text);
>   ALTER TABLE ONLY public.doc_attribute ADD CONSTRAINT doc_attribute_pkey PRIMARY KEY (id);
>   ALTER TABLE ONLY public.rul_rule_set ADD CONSTRAINT rul_rule_set_pkey PRIMARY KEY (id);
>
>   2.Create procedure and trigger that will be fired on lines inserting into the rul_rule_set table:
>   Execute trigger.sql attached.
>
>   3. Create suscription:
>   CREATE SUBSCRIPTION test_sub CONNECTION 'port=5116 user=postgres dbname=postgres' PUBLICATION test_pub;
>
> C) on master: insert row in table rul_rule_set:
> INSERT INTO rul_rule_set VALUES ('1', 'name','1','age','true');
>
> The replica will crash with:
> TRAP: FailedAssertion("ActivePortal && ActivePortal->status == PORTAL_ACTIVE", File: "pg_proc.c", Line: 1038, PID:
310502)
>
> as ActivePortal is NULL because logical replication worker was forked before any active command.

Right. This is because the logical replication workers (apply worker
and tablesync worker) don't have a Portal.

>
> The backtrace is attached.
>
> I thought to pass the text of the request as a parameter, but most likely, it will be wrong.
> Please suggest the best way to solve this problem, would be very grateful.

In function_parse_error_transpose(), I think that we don't need the
query text in logical replication worker cases since they don't
execute SQL query.

I've attached the patch to fix it. The patch also includes a minimal
TAP test for this case.

Regards,

--
Masahiko Sawada
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Вложения

Re: [BUG] Crash of logical replica with trigger.

От
Masahiko Sawada
Дата:
On Wed, Sep 28, 2022 at 10:33 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> Hi,
>
> On Mon, Aug 15, 2022 at 4:48 AM Anton A. Melnikov <aamelnikov@inbox.ru> wrote:
> >
> > Hello!
> >
> > There is a crash during logical replication.
> >
> > Reproduction on current master:
> > A) on master (wal_level = logical): create table rul_rule_set that will be replicated:
> >   CREATE TABLE public.rul_rule_set (id smallint NOT NULL, description text, stage_id integer,
> >                                     condition text, condition_compiled text);
> >   ALTER TABLE ONLY public.rul_rule_set ADD CONSTRAINT rul_rule_set_pkey PRIMARY KEY (id);
> >   CREATE PUBLICATION test_pub FOR TABLE rul_rule_set;
> >
> > B) on replica
> >   1. Create tables doc_attribute and rul_rule_set  for incoming changes:
> >   CREATE TABLE public.doc_attribute (id integer NOT NULL, attr_name text, attr_desc text, attr_type text,
> >                                      attr_order integer, is_system boolean, used_by_slice boolean);
> >   INSERT INTO doc_attribute VALUES ('1','name','Имя','text','1','t','t');
> >   CREATE TABLE public.rul_rule_set (id smallint NOT NULL, description text, stage_id integer,
> >                                     condition text, condition_compiled text);
> >   ALTER TABLE ONLY public.doc_attribute ADD CONSTRAINT doc_attribute_pkey PRIMARY KEY (id);
> >   ALTER TABLE ONLY public.rul_rule_set ADD CONSTRAINT rul_rule_set_pkey PRIMARY KEY (id);
> >
> >   2.Create procedure and trigger that will be fired on lines inserting into the rul_rule_set table:
> >   Execute trigger.sql attached.
> >
> >   3. Create suscription:
> >   CREATE SUBSCRIPTION test_sub CONNECTION 'port=5116 user=postgres dbname=postgres' PUBLICATION test_pub;
> >
> > C) on master: insert row in table rul_rule_set:
> > INSERT INTO rul_rule_set VALUES ('1', 'name','1','age','true');
> >
> > The replica will crash with:
> > TRAP: FailedAssertion("ActivePortal && ActivePortal->status == PORTAL_ACTIVE", File: "pg_proc.c", Line: 1038, PID:
310502)
> >
> > as ActivePortal is NULL because logical replication worker was forked before any active command.
>
> Right. This is because the logical replication workers (apply worker
> and tablesync worker) don't have a Portal.
>
> >
> > The backtrace is attached.
> >
> > I thought to pass the text of the request as a parameter, but most likely, it will be wrong.
> > Please suggest the best way to solve this problem, would be very grateful.
>
> In function_parse_error_transpose(), I think that we don't need the
> query text in logical replication worker cases since they don't
> execute SQL query.
>
> I've attached the patch to fix it. The patch also includes a minimal
> TAP test for this case.

I've registered this issue to the next CF so as not to forget.

Regards,

--
Masahiko Sawada
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: [BUG] Crash of logical replica with trigger.

От
Michael Paquier
Дата:
On Thu, Sep 29, 2022 at 03:21:31PM +0900, Masahiko Sawada wrote:
> I've registered this issue to the next CF so as not to forget.

Thanks for the patch, looking..
--
Michael

Вложения

Re: [BUG] Crash of logical replica with trigger.

От
Masahiko Sawada
Дата:
On Fri, Oct 7, 2022 at 4:24 PM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Thu, Sep 29, 2022 at 03:21:31PM +0900, Masahiko Sawada wrote:
> > I've registered this issue to the next CF so as not to forget.
>
> Thanks for the patch, looking..

Thanks!

I realized that this problem is also reported in -hackers and there is
some discussion there[1]. The patch I proposed seems to be the same as
the patch Anton A. Melnikov proposed, but Tom does not like it.

Regards,

[1] https://www.postgresql.org/message-id/b570c367-ba38-95f3-f62d-5f59b9808226%40inbox.ru

-- 
Masahiko Sawada
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: [BUG] Crash of logical replica with trigger.

От
Michael Paquier
Дата:
On Fri, Oct 07, 2022 at 05:19:46PM +0900, Masahiko Sawada wrote:
> I realized that this problem is also reported in -hackers and there is
> some discussion there[1]. The patch I proposed seems to be the same as
> the patch Anton A. Melnikov proposed, but Tom does not like it.

This patch was listed in the CF which is why I have begun poking at
it.

Anyway, I have the same opinion, and the first thing I was looking at
are other code paths similar to the one patched here where we expect a
portal to be always active :)

Your patch switches a procedure path where we have assumed for years
that a portal would be active in its error handling.  This could hide
problems in the future if for some reason this API begins to get used
in the context where a portal is not active, so logical workers are
IMO doing the wrong thing for the commands they are executing.

Let's move the discussion to the other thread, thanks for the
pointer.
--
Michael

Вложения