Thread: [HACKERS] Re: [BUGS] BUG #14758: Segfault with logical replication on a function index

[HACKERS] Re: [BUGS] BUG #14758: Segfault with logical replication on a function index

From
Masahiko Sawada
Date:
Moved to -hackers.

On Sat, Jul 29, 2017 at 4:35 AM, Scott Milliken <scott@deltaex.com> wrote:
> Thank you Masahiko! I've tested and confirmed that this patch fixes the
> problem.
>

Thank you for testing. This issue should be added to the open items list,
since it causes a server crash. I'll add it.

> On Fri, Jul 28, 2017 at 3:07 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>
>> On Mon, Jul 24, 2017 at 4:22 PM,  <scott@deltaex.com> wrote:
>> > The following bug has been logged on the website:
>> >
>> > Bug reference:      14758
>> > Logged by:          Scott Milliken
>> > Email address:      scott@deltaex.com
>> > PostgreSQL version: 10beta2
>> > Operating system:   Linux 4.10.0-27-generic #30~16.04.2-Ubuntu S
>> > Description:
>> >
>> > I'm testing logical replication on 10beta2, and found a segfault that I can
>> > reliably reproduce with an index on a not-actually-immutable function.
>> >
>> > Here's the function in question:
>> >
>> > ```
>> > CREATE OR REPLACE FUNCTION public.immutable_random(integer)
>> >  RETURNS double precision
>> >  LANGUAGE sql
>> >  IMMUTABLE
>> > AS $function$SELECT random();
>> > $function$;
>> > ```
>> >
>> > It's not actually immutable since it's calling random (a hack to get an
>> > efficient random sort on a table).
>> >
>> > (Aside: I'd understand if it errored on creation of the index, but would
>> > really prefer to keep using this instead of tablesample because it's fast,
>> > deterministic, and doesn't have sampling biases like the SYSTEM sampling.)
>> >
>> >
>> > Here are the full reproduction instructions:
>> >
>> >
>> > Primary:
>> > ```
>> > mkdir -p /tmp/test-seg0
>> > PGPORT=5301 initdb -D /tmp/test-seg0
>> > echo "wal_level = logical" >> /tmp/test-seg0/postgresql.conf
>> > PGPORT=5301 pg_ctl -D /tmp/test-seg0 start
>> > for (( ; ; )); do if pg_isready -d postgres -p 5301; then break; fi; sleep 1; done
>> > psql -p 5301 postgres -c "CREATE USER test WITH PASSWORD 'test' SUPERUSER CREATEDB CREATEROLE LOGIN REPLICATION BYPASSRLS;"
>> > createdb -p 5301 -E utf8 test
>> >
>> > psql -p 5301 -U test test -c "CREATE TABLE testtbl (id int, name text);"
>> > psql -p 5301 -U test test -c "ALTER TABLE testtbl ADD CONSTRAINT testtbl_pkey PRIMARY KEY (id);"
>> > psql -p 5301 -U test test -c "CREATE PUBLICATION testpub FOR TABLE testtbl;"
>> > psql -p 5301 -U test test -c "INSERT INTO testtbl (id, name) VALUES (1, 'a');"
>> > ```
>> >
>> > Secondary:
>> > ```
>> > mkdir -p /tmp/test-seg1
>> > PGPORT=5302 initdb -D /tmp/test-seg1
>> > PGPORT=5302 pg_ctl -D /tmp/test-seg1 start
>> > for (( ; ; )); do if pg_isready -d postgres -p 5302; then break; fi; sleep 1; done
>> > psql -p 5302 postgres -c "CREATE USER test WITH PASSWORD 'test' SUPERUSER CREATEDB CREATEROLE LOGIN REPLICATION BYPASSRLS;"
>> > createdb -p 5302 -E utf8 test
>> >
>> > psql -p 5302 -U test test -c "CREATE TABLE testtbl (id int, name text);"
>> > psql -p 5302 -U test test -c "ALTER TABLE testtbl ADD CONSTRAINT testtbl_pkey PRIMARY KEY (id);"
>> > psql -p 5302 -U test test -c 'CREATE FUNCTION public.immutable_random(integer) RETURNS double precision LANGUAGE sql IMMUTABLE AS $function$ SELECT random(); $function$'
>> > psql -p 5302 -U test test -c "CREATE INDEX ix_testtbl_random ON testtbl USING btree (immutable_random(id));"
>> > psql -p 5302 -U test test -c "CREATE SUBSCRIPTION test0_testpub CONNECTION 'port=5301 user=test dbname=test' PUBLICATION testpub;"
>> > ```
>> >
>> > The secondary crashes with a segfault:
>> >
>> > ```
>> > 2017-07-23 23:55:37.961 PDT [4823] LOG:  logical replication table synchronization worker for subscription "test0_testpub", table "testtbl" has started
>> > 2017-07-23 23:55:38.244 PDT [4758] LOG:  worker process: logical replication worker for subscription 16396 sync 16386 (PID 4823) was terminated by signal 11: Segmentation fault
>> > 2017-07-23 23:55:38.244 PDT [4758] LOG:  terminating any other active server processes
>> > 2017-07-23 23:55:38.245 PDT [4763] WARNING:  terminating connection because of crash of another server process
>> > 2017-07-23 23:55:38.245 PDT [4763] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
>> > 2017-07-23 23:55:38.245 PDT [4763] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
>> > 2017-07-23 23:55:38.247 PDT [4758] LOG:  all server processes terminated; reinitializing
>> > 2017-07-23 23:55:38.256 PDT [4826] LOG:  database system was interrupted; last known up at 2017-07-23 23:55:36 PDT
>> > 2017-07-23 23:55:38.809 PDT [4826] LOG:  database system was not properly shut down; automatic recovery in progress
>> > 2017-07-23 23:55:38.812 PDT [4826] LOG:  redo starts at 0/173AEA0
>> > 2017-07-23 23:55:38.815 PDT [4826] LOG:  invalid record length at 0/17B50B0: wanted 24, got 0
>> > 2017-07-23 23:55:38.815 PDT [4826] LOG:  redo done at 0/17B5070
>> > 2017-07-23 23:55:38.815 PDT [4826] LOG:  last completed transaction was at log time 2017-07-23 23:55:37.962957-07
>> > ```
>> >
>>
>> Thank you for the report and the precise reproduction steps!
>> I could reproduce this issue, and it seems to me that the cause is
>> that the table sync worker didn't get a snapshot before starting the
>> table copy. The attached patch fixes this problem.
>>
>> Regards,
>>
>> --
>> Masahiko Sawada
>> NIPPON TELEGRAPH AND TELEPHONE CORPORATION
>> NTT Open Source Software Center
>
>

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
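
For anyone re-running the reproduction steps quoted in the report, here is a minimal shell sketch for checking whether a server log contains the crash signature. It is only an illustration: the helper name `crashed_workers` is hypothetical, the pattern is derived from the "was terminated by signal" message in the report's log excerpt, and it assumes each log message sits on a single line, as it would in an unwrapped server log.

```shell
#!/bin/sh
# Hypothetical helper (not part of PostgreSQL): print "<pid> <signal>" for
# every worker the postmaster reported as terminated by a signal.
# Reads the named log files, or standard input when no file is given.
# Assumption: one log message per line.
crashed_workers() {
    sed -n 's/.*(PID \([0-9][0-9]*\)) was terminated by signal \([0-9][0-9]*\).*/\1 \2/p' "$@"
}

# Usage with the line from the report; prints: 4823 11
printf '%s\n' '2017-07-23 23:55:38.244 PDT [4758] LOG:  worker process: logical replication worker for subscription 16396 sync 16386 (PID 4823) was terminated by signal 11: Segmentation fault' | crashed_workers
```

An empty result after applying the patch would be consistent with the fix, though it only shows that no worker was killed by a signal in the scanned log.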



[HACKERS] Re: [BUGS] BUG #14758: Segfault with logical replication on a function index

From
Andres Freund
Date:
On 2017-07-31 09:40:34 +0900, Masahiko Sawada wrote:
> Moved to -hackers.
> 
> On Sat, Jul 29, 2017 at 4:35 AM, Scott Milliken <scott@deltaex.com> wrote:
> > Thank you Masahiko! I've tested and confirmed that this patch fixes the
> > problem.
> >
> 
> Thank you for testing. This issue should be added to the open items list,
> since it causes a server crash. I'll add it.

Adding Petr to CC list.

- Andres



[HACKERS] Re: [BUGS] BUG #14758: Segfault with logical replication on a function index

From
Noah Misch
Date:
On Mon, Jul 31, 2017 at 09:40:34AM +0900, Masahiko Sawada wrote:
> On Sat, Jul 29, 2017 at 4:35 AM, Scott Milliken <scott@deltaex.com> wrote:
> > Thank you Masahiko! I've tested and confirmed that this patch fixes the
> > problem.
> >
> 
> Thank you for testing. This issue should be added to the open items list,
> since it causes a server crash. I'll add it.

[Action required within three days.  This is a generic notification.]

The above-described topic is currently a PostgreSQL 10 open item.  Peter,
since you committed the patch believed to have created it, you own this open
item.  If some other commit is more relevant or if this does not belong as a
v10 open item, please let us know.  Otherwise, please observe the policy on
open item ownership[1] and send a status update within three calendar days of
this message.  Include a date for your subsequent status update.  Testers may
discover new open items at any time, and I want to plan to get them all fixed
well in advance of shipping v10.  Consequently, I will appreciate your efforts
toward speedy resolution.  Thanks.

[1] https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com



Re: [HACKERS] Re: [BUGS] BUG #14758: Segfault with logical replication on a function index

From
Peter Eisentraut
Date:
On 8/1/17 00:21, Noah Misch wrote:
> On Mon, Jul 31, 2017 at 09:40:34AM +0900, Masahiko Sawada wrote:
>> On Sat, Jul 29, 2017 at 4:35 AM, Scott Milliken <scott@deltaex.com> wrote:
>>> Thank you Masahiko! I've tested and confirmed that this patch fixes the
>>> problem.
>>>
>>
>> Thank you for testing. This issue should be added to the open items list,
>> since it causes a server crash. I'll add it.
> 
> [Action required within three days.  This is a generic notification.]

I'm looking into this now and will report back on Thursday.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Re: [BUGS] BUG #14758: Segfault with logical replication on a function index

From
Peter Eisentraut
Date:
On 8/1/17 16:29, Peter Eisentraut wrote:
> On 8/1/17 00:21, Noah Misch wrote:
>> On Mon, Jul 31, 2017 at 09:40:34AM +0900, Masahiko Sawada wrote:
>>> On Sat, Jul 29, 2017 at 4:35 AM, Scott Milliken <scott@deltaex.com> wrote:
>>>> Thank you Masahiko! I've tested and confirmed that this patch fixes the
>>>> problem.
>>>>
>>>
>>> Thank you for testing. This issue should be added to the open items list,
>>> since it causes a server crash. I'll add it.
>>
>> [Action required within three days.  This is a generic notification.]
> 
> I'm looking into this now and will report back on Thursday.

This item has been closed.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services