BUG #18236: Backend processing a parallel query terminates badly when postmaster killed with SIGKILL

From: PG Bug reporting form
Subject: BUG #18236: Backend processing a parallel query terminates badly when postmaster killed with SIGKILL
Msg-id: 18236-db547494f5bb70c4@postgresql.org
List: pgsql-bugs
The following bug has been logged on the website:

Bug reference:      18236
Logged by:          Alexander Lakhin
Email address:      exclusion@gmail.com
PostgreSQL version: 16.1
Operating system:   Ubuntu 22.04
Description:

The following script, which starts a backend with parallel workers and then
kills the postmaster:
cat << 'EOF' | psql &
CREATE TABLE t (a int) WITH (parallel_workers = 2);
INSERT INTO t SELECT g FROM generate_series(1, 10000) g;
CREATE FUNCTION f(i int) RETURNS int PARALLEL SAFE LANGUAGE plpgsql AS
$$ BEGIN PERFORM pg_sleep(0.001); RETURN i; END; $$;

SET parallel_setup_cost = 0;
SET parallel_tuple_cost = 0;

SELECT avg(f(a)) FROM t;
EOF

sleep 1
kill -9 $(head -1 "$PGDATA/postmaster.pid")

causes an assertion failure:
TRAP: failed Assert("!IsTransactionOrTransactionBlock()"), File: "pgstat.c", Line: 591, PID: 2893946

(discovered while testing [1])

with the following call stack:
...
#5  0x00005593007db894 in ExceptionalCondition (conditionName=0x5593009cdbe0 "!IsTransactionOrTransactionBlock()", fileName=0x5593009cda87 "pgstat.c", lineNumber=591) at assert.c:66
#6  0x000055930061b581 in pgstat_report_stat (force=true) at pgstat.c:591
#7  0x000055930061b499 in pgstat_shutdown_hook (code=1, arg=0) at pgstat.c:520
#8  0x00005593005b6fdf in shmem_exit (code=1) at ipc.c:243
#9  0x00005593005b6e83 in proc_exit_prepare (code=1) at ipc.c:198
#10 0x00005593005b6dc7 in proc_exit (code=1) at ipc.c:111
#11 0x00005593007dc8e2 in errfinish (filename=0x559300882a7e "parallel.c", lineno=908, funcname=0x559300882e70 <__func__.8> "WaitForParallelWorkersToExit") at elog.c:591
#12 0x000055930012fb03 in WaitForParallelWorkersToExit (pcxt=0x55930229ca28) at parallel.c:908
#13 0x000055930012fccb in DestroyParallelContext (pcxt=0x55930229ca28) at parallel.c:981
#14 0x00005593001304cf in AtEOXact_Parallel (isCommit=false) at parallel.c:1254
#15 0x000055930013ee49 in AbortTransaction () at xact.c:2792
#16 0x00005593001419a8 in AbortOutOfAnyTransaction () at xact.c:4755
#17 0x00005593007f6109 in ShutdownPostgres (code=1, arg=0) at postinit.c:1349
#18 0x00005593005b6fdf in shmem_exit (code=1) at ipc.c:243
#19 0x00005593005b6e83 in proc_exit_prepare (code=1) at ipc.c:198
#20 0x00005593005b6dc7 in proc_exit (code=1) at ipc.c:111
#21 0x00005593005b92b5 in WaitEventSetWaitBlock (set=0x559302263d08, cur_timeout=2, occurred_events=0x7ffdf264c7d0, nevents=1) at latch.c:1600
#22 0x00005593005b9025 in WaitEventSetWait (set=0x559302263d08, timeout=2, occurred_events=0x7ffdf264c7d0, nevents=1, wait_event_info=150994946) at latch.c:1475
#23 0x00005593005b82c2 in WaitLatch (latch=0x7f94b3362184, wakeEvents=41, timeout=2, wait_event_info=150994946) at latch.c:513
#24 0x00005593006d8505 in pg_sleep (fcinfo=0x559302399b30) at misc.c:406
#25 0x000055930032f3ee in ExecInterpExpr (state=0x559302399a58, econtext=0x559302399780, isnull=0x7ffdf264caff) at execExprInterp.c:758
...

With that Assert in pgstat.c removed, another failure can be seen:
WARNING:  buffer refcount leak: [1810] (rel=base/16384/16385, blockNum=6, flags=0x93800000, refcount=1 2)

accompanied by an assertion failure:
...
#5  0x00005619b49fb86d in ExceptionalCondition (conditionName=0x5619b4bda3be "RefCountErrors == 0", fileName=0x5619b4bd9ba8 "bufmgr.c", lineNumber=3224) at assert.c:66
#6  0x00005619b47c1156 in CheckForBufferLeaks () at bufmgr.c:3224
#7  0x00005619b47c1075 in AtProcExit_Buffers (code=1, arg=0) at bufmgr.c:3178
#8  0x00005619b47d7097 in shmem_exit (code=1) at ipc.c:276
#9  0x00005619b47d6e83 in proc_exit_prepare (code=1) at ipc.c:198
#10 0x00005619b47d6dc7 in proc_exit (code=1) at ipc.c:111
#11 0x00005619b49fc8bb in errfinish (filename=0x5619b4aa2a7e "parallel.c", lineno=908, funcname=0x5619b4aa2e70 <__func__.8> "WaitForParallelWorkersToExit") at elog.c:591
#12 0x00005619b434fb03 in WaitForParallelWorkersToExit (pcxt=0x5619b570fa28) at parallel.c:908
#13 0x00005619b434fccb in DestroyParallelContext (pcxt=0x5619b570fa28) at parallel.c:981
#14 0x00005619b43504cf in AtEOXact_Parallel (isCommit=false) at parallel.c:1254
#15 0x00005619b435ee49 in AbortTransaction () at xact.c:2792
#16 0x00005619b43619a8 in AbortOutOfAnyTransaction () at xact.c:4755
#17 0x00005619b4a160e2 in ShutdownPostgres (code=1, arg=0) at postinit.c:1349
#18 0x00005619b47d6fdf in shmem_exit (code=1) at ipc.c:243
#19 0x00005619b47d6e83 in proc_exit_prepare (code=1) at ipc.c:198
#20 0x00005619b47d6dc7 in proc_exit (code=1) at ipc.c:111
#21 0x00005619b47d92b5 in WaitEventSetWaitBlock (set=0x5619b56d6d08, cur_timeout=2, occurred_events=0x7ffda437c590, nevents=1) at latch.c:1600
#22 0x00005619b47d9025 in WaitEventSetWait (set=0x5619b56d6d08, timeout=2, occurred_events=0x7ffda437c590, nevents=1, wait_event_info=150994946) at latch.c:1475
#23 0x00005619b47d82c2 in WaitLatch (latch=0x7f2f0fc49184, wakeEvents=41, timeout=2, wait_event_info=150994946) at latch.c:513
#24 0x00005619b48f84de in pg_sleep (fcinfo=0x5619b580cb30) at misc.c:406
#25 0x00005619b454f3ee in ExecInterpExpr (state=0x5619b580ca58, econtext=0x5619b580c780, isnull=0x7ffda437c8bf) at execExprInterp.c:758
...

Or, with the last query executed inside a transaction:
BEGIN;
INSERT INTO t VALUES(0);
SELECT avg(f(a)) FROM t;
END;
...
#5  0x0000562f2850886d in ExceptionalCondition (conditionName=0x562f286eb178 "!TransactionIdIsValid(ProcGlobal->xids[myoff])", fileName=0x562f286eb030 "procarray.c", lineNumber=606) at assert.c:66
#6  0x0000562f282e7ec9 in ProcArrayRemove (proc=0x7f0bcad62160, latestXid=0) at procarray.c:606
#7  0x0000562f283127ab in RemoveProcFromArray (code=1, arg=0) at proc.c:794
#8  0x0000562f282e4097 in shmem_exit (code=1) at ipc.c:276
#9  0x0000562f282e3e83 in proc_exit_prepare (code=1) at ipc.c:198
#10 0x0000562f282e3dc7 in proc_exit (code=1) at ipc.c:111
#11 0x0000562f285098bb in errfinish (filename=0x562f285afa7e "parallel.c", lineno=908, funcname=0x562f285afe70 <__func__.8> "WaitForParallelWorkersToExit") at elog.c:591
#12 0x0000562f27e5cb03 in WaitForParallelWorkersToExit (pcxt=0x562f29861b08) at parallel.c:908
#13 0x0000562f27e5cccb in DestroyParallelContext (pcxt=0x562f29861b08) at parallel.c:981
#14 0x0000562f27e5d4cf in AtEOXact_Parallel (isCommit=false) at parallel.c:1254
#15 0x0000562f27e6be49 in AbortTransaction () at xact.c:2792
#16 0x0000562f27e6e9a8 in AbortOutOfAnyTransaction () at xact.c:4755
#17 0x0000562f285230e2 in ShutdownPostgres (code=1, arg=0) at postinit.c:1349
#18 0x0000562f282e3fdf in shmem_exit (code=1) at ipc.c:243
#19 0x0000562f282e3e83 in proc_exit_prepare (code=1) at ipc.c:198
#20 0x0000562f282e3dc7 in proc_exit (code=1) at ipc.c:111
#21 0x0000562f282e62b5 in WaitEventSetWaitBlock (set=0x562f29828d08, cur_timeout=2, occurred_events=0x7ffe231d0af0, nevents=1) at latch.c:1600
#22 0x0000562f282e6025 in WaitEventSetWait (set=0x562f29828d08, timeout=2, occurred_events=0x7ffe231d0af0, nevents=1, wait_event_info=150994946) at latch.c:1475
#23 0x0000562f282e52c2 in WaitLatch (latch=0x7f0bcad62184, wakeEvents=41, timeout=2, wait_event_info=150994946) at latch.c:513
#24 0x0000562f284054de in pg_sleep (fcinfo=0x562f2995cb20) at misc.c:406
#25 0x0000562f2805c3ee in ExecInterpExpr (state=0x562f2995ca48, econtext=0x562f2995c770, isnull=0x7ffe231d0e1f) at execExprInterp.c:758
...

The backend terminates cleanly if WARNING is used instead of FATAL here:
        if (status == BGWH_POSTMASTER_DIED)
            ereport(FATAL,
                    (errcode(ERRCODE_ADMIN_SHUTDOWN),
                     errmsg("postmaster exited during a parallel transaction")));
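
All three traces share the same shape: proc_exit runs the exit callbacks, ShutdownPostgres starts aborting the transaction, the FATAL raised in WaitForParallelWorkersToExit calls proc_exit a second time, and the remaining callbacks then run while the abort is still unfinished. The following is a minimal stand-alone C model of that re-entrancy, not PostgreSQL code. The names model_proc_exit, model_fatal, stats_hook, abort_hook and simulate_exit are hypothetical, the two callback lists are collapsed into one, and the assumption that a callback's slot is released before it is called is inferred from the traces (the nested shmem_exit moves on to a later callback instead of re-running ShutdownPostgres).

```c
#include <assert.h>
#include <setjmp.h>
#include <stdbool.h>
#include <stdio.h>

#define MAX_CALLBACKS 8

typedef void (*exit_callback) (void);

static exit_callback callbacks[MAX_CALLBACKS];
static int  pending;                /* callbacks not yet started */
static bool in_transaction;         /* true until the abort completes */
static bool stats_saw_open_xact;    /* set where the real code would Assert */
static bool fatal_during_abort;     /* scenario switch */
static jmp_buf process_end;         /* stands in for the process exiting */

static void
register_callback(exit_callback cb)
{
    assert(pending < MAX_CALLBACKS);
    callbacks[pending++] = cb;
}

/* Run the pending callbacks in LIFO order, then "end the process". */
static void
model_proc_exit(void)
{
    /*
     * The slot is released before the call (assumption), so a nested
     * exit skips the callback that is still on the stack.
     */
    while (--pending >= 0)
        callbacks[pending]();
    longjmp(process_end, 1);        /* stand-in for exit() */
}

/* Stand-in for ereport(FATAL, ...): report, then exit immediately. */
static void
model_fatal(const char *msg)
{
    printf("FATAL:  %s\n", msg);
    model_proc_exit();
}

/* Stand-in for pgstat_shutdown_hook: expects no open transaction. */
static void
stats_hook(void)
{
    if (in_transaction)
        stats_saw_open_xact = true;
}

/* Stand-in for ShutdownPostgres: aborts the open transaction. */
static void
abort_hook(void)
{
    if (fatal_during_abort)
        model_fatal("postmaster exited during a parallel transaction");
    in_transaction = false;         /* reached only if the abort completes */
}

/* Returns true if a later callback ran with the transaction still open. */
bool
simulate_exit(bool fatal)
{
    pending = 0;
    in_transaction = true;
    stats_saw_open_xact = false;
    fatal_during_abort = fatal;

    register_callback(stats_hook);  /* registered first, runs last */
    register_callback(abort_hook);  /* registered last, runs first */

    if (setjmp(process_end) == 0)
        model_proc_exit();
    return stats_saw_open_xact;
}
```

In this model simulate_exit(true) returns true, which corresponds to the Assert in pgstat_report_stat firing in the first trace, while simulate_exit(false) returns false, which corresponds to the abort being allowed to finish when the report is not FATAL.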

[1] https://www.postgresql.org/message-id/5e976369-2925-e0cc-b5a1-e9e356264596%40gmail.com

