Обсуждение: BUG #15821: Parallel Workers with functions and auto_explain: ERROR: could not find key 3 in shm TOC
BUG #15821: Parallel Workers with functions and auto_explain: ERROR: could not find key 3 in shm TOC
От
PG Bug reporting form
Дата:
The following bug has been logged on the website:
Bug reference: 15821
Logged by: Christian Hofstaedtler
Email address: ch+pg@zeha.at
PostgreSQL version: 11.3
Operating system: Debian stretch amd64
Description:
We have enabled auto_explain and see errors on PostgreSQL 11.3 when
SELECTing from a user defined function. No such crashes have been
observed on 10.7.
Maybe relevant config settings:
log_min_duration_statement = 0
auto_explain.log_min_duration = 150ms
auto_explain.log_analyze = on
shared_preload_libraries = 'pg_stat_statements'
pg_stat_statements.track = all
pg_stat_statements.max = 10000
I can trigger the error on our database using:
CREATE TABLE reprotable(id serial, val int);
INSERT INTO reprotable(val) SELECT * FROM generate_series(1, 10000000);
CREATE OR REPLACE FUNCTION public.crashrepro4() RETURNS TABLE(foo integer,
foo2 integer) LANGUAGE sql AS $function$
SELECT c.id, v.val FROM reprotable c JOIN reprotable v ON v.id = c.id
$function$;
LOAD 'auto_explain';
SET max_parallel_workers_per_gather TO 8;
SELECT * FROM crashrepro4();
Sometimes this works on the first few tries; running
EXPLAIN ANALYZE SELECT * FROM crashrepro4();
appears to help with the reproduction.
Output:
ERROR: could not find key 3 in shm TOC at 0x7f45a0334000
CONTEXT: parallel worker
SQL function "crashrepro4" statement 1
Backtrace:
Program received signal SIGUSR1, User defined signal 1.
0x00007f459dc94bc6 in posix_fallocate64 () from
/lib/x86_64-linux-gnu/libc.so.6
(gdb) bt full
#0 0x00007f459dc94bc6 in posix_fallocate64 () from
/lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.
#1 0x00005636d9cc675b in dsm_impl_posix_resize (size=134483968, fd=7) at
./build/../src/backend/storage/ipc/dsm_impl.c:441
rc = <optimized out>
#2 dsm_impl_posix (impl_private=0x8041000, elevel=20,
mapped_size=0x5636dc084348, mapped_address=0x5636dc084340,
request_size=134483968, handle=<optimized out>, op=DSM_OP_CREATE) at
./build/../src/backend/storage/ipc/dsm_impl.c:326
flags = <optimized out>
fd = 7
name =
"/PostgreSQL.1895625775\000\000X\274ʝE\177\000\000@\354\322\357\377\177\000\000\020\337\003\334\066V\000\000@\000\000\000\000\000\000\000\000\200\000\000\000\000\000"
address = <optimized out>
#3 dsm_impl_op (op=op@entry=DSM_OP_CREATE, handle=<optimized out>,
request_size=request_size@entry=134483968,
impl_private=impl_private@entry=0x5636dc084338,
mapped_address=mapped_address@entry=0x5636dc084340,
mapped_size=mapped_size@entry=0x5636dc084348, elevel=20) at
./build/../src/backend/storage/ipc/dsm_impl.c:177
__func__ = "dsm_impl_op"
#4 0x00005636d9cc7877 in dsm_create (size=size@entry=134483968,
flags=flags@entry=0) at ./build/../src/backend/storage/ipc/dsm.c:474
seg = 0x5636dc084318
i = <optimized out>
nitems = <optimized out>
__func__ = "dsm_create"
#5 0x00005636d9e2b317 in make_new_segment (area=area@entry=0x5636dc17fd08,
requested_pages=requested_pages@entry=32768) at
./build/../src/backend/utils/mmgr/dsa.c:2155
new_index = 2
metadata_bytes = 266240
total_size = 134483968
total_pages = <optimized out>
usable_pages = 32768
segment_map = <optimized out>
segment = <optimized out>
#6 0x00005636d9e2cbce in dsa_allocate_extended (area=0x5636dc17fd08,
size=size@entry=134217728, flags=flags@entry=0) at
./build/../src/backend/utils/mmgr/dsa.c:712
npages = 32768
first_page = 139937017269144
pool = 0x7f45a02a25f0
size_class = <optimized out>
start_pointer = <optimized out>
segment_map = <optimized out>
__func__ = "dsa_allocate_extended"
#7 0x00005636d9bb3e95 in ExecParallelHashTableAlloc
(hashtable=hashtable@entry=0x5636dc173a70, batchno=batchno@entry=0) at
./build/../src/backend/executor/nodeHash.c:3047
batch = 0x7f417ccc0000
buckets = <optimized out>
nbuckets = 16777216
i = <optimized out>
#8 0x00005636d9bb42e3 in ExecHashTableCreate
(state=state@entry=0x5636dc146280, hashOperators=<optimized out>,
keepNulls=<optimized out>) at
./build/../src/backend/executor/nodeHash.c:615
pstate = <optimized out>
build_barrier = 0x7f45a02a1450
node = 0x5636dc140568
hashtable = 0x5636dc173a70
outerNode = <optimized out>
space_allowed = 64424509440
nbuckets = 16777216
nbatch = 1
rows = <optimized out>
num_skew_mcvs = 10391049
log2_nbuckets = <optimized out>
i = 0
ho = <optimized out>
__func__ = "ExecHashTableCreate"
#9 0x00005636d9bb7085 in ExecHashJoinImpl (parallel=true, pstate=<optimized
out>) at ./build/../src/backend/executor/nodeHashjoin.c:279
outerNode = <optimized out>
hashNode = <optimized out>
econtext = <optimized out>
node = <optimized out>
joinqual = <optimized out>
otherqual = <optimized out>
hashtable = 0x0
hashvalue = 32767
batchno = -641936805
parallel_state = <optimized out>
#10 ExecParallelHashJoin (pstate=<optimized out>) at
./build/../src/backend/executor/nodeHashjoin.c:581
No locals.
#11 0x00005636d9bb2194 in ExecProcNode (node=0x5636dc145b60) at
./build/../src/include/executor/executor.h:247
No locals.
#12 gather_getnext (gatherstate=0x5636dc145970) at
./build/../src/backend/executor/nodeGather.c:276
estate = 0x5636dc145730
outerPlan = 0x5636dc145b60
fslot = 0x5636dc0743a8
#13 ExecGather (pstate=0x5636dc145970) at
./build/../src/backend/executor/nodeGather.c:207
node = 0x5636dc145970
econtext = 0x5636dc145a80
#14 0x00005636d9b9b4c3 in ExecProcNode (node=0x5636dc145970) at
./build/../src/include/executor/executor.h:247
No locals.
#15 ExecutePlan (execute_once=<optimized out>, dest=0x5636dc141648,
direction=<optimized out>, numberTuples=0, sendTuples=<optimized out>,
operation=CMD_SELECT, use_parallel_mode=<optimized out>,
planstate=0x5636dc145970,
estate=0x5636dc145730) at
./build/../src/backend/executor/execMain.c:1723
slot = <optimized out>
current_tuple_count = 0
#16 standard_ExecutorRun (queryDesc=0x5636dc141698, direction=<optimized
out>, count=0, execute_once=<optimized out>) at
./build/../src/backend/executor/execMain.c:364
estate = 0x5636dc145730
operation = CMD_SELECT
dest = 0x5636dc141648
sendTuples = <optimized out>
__func__ = "standard_ExecutorRun"
#17 0x00007f4597d08e15 in pgss_ExecutorRun (queryDesc=0x5636dc141698,
direction=ForwardScanDirection, count=0, execute_once=<optimized out>) at
./build/../contrib/pg_stat_statements/pg_stat_statements.c:892
save_exception_stack = 0x7fffefd2f0a0
save_context_stack = 0x7fffefd2f1d0
local_sigjmp_buf = {{__jmpbuf = {94793620402592, -81779228135919821,
1, 0, 94793620523992, 94793620526040, -81779228572127437,
-24340724960596173}, __mask_was_saved = 0, __saved_mask = {__val =
{94793619686312,
140737216966656, 94793581381244, 94793619686312,
94793620543280, 140737216966688, 94793581092645, 94793620543856,
94793620543280, 140737216966736, 94793581143132, 94793620522984, 16, 0,
94793620543280,
140737216966784}}}}
#18 0x00007f417cdbf4dd in explain_ExecutorRun (queryDesc=0x5636dc141698,
direction=ForwardScanDirection, count=0, execute_once=<optimized out>) at
./build/../contrib/auto_explain/auto_explain.c:268
save_exception_stack = 0x7fffefd2f8a0
save_context_stack = 0x7fffefd2f1d0
local_sigjmp_buf = {{__jmpbuf = {94793620402592, -81779228104462541,
1, 0, 94793620523992, 94793620526040, -81779228138016973,
-26933449381777613}, __mask_was_saved = 0, __saved_mask = {__val =
{94793581107904, 94793620526744,
94793588700640, 140737216966960, 139936876507429,
140737216967008, 94793620526744, 1, 0, 140737216967024, 139919244391853, 1,
0, 94793620523992, 94793620402592, 1}}}}
#19 0x00005636d9ba8f6a in postquel_getnext (es=0x5636dc140bd8,
es=0x5636dc140bd8, fcache=0x5636dc1231a0, fcache=0x5636dc1231a0) at
./build/../src/backend/executor/functions.c:867
count = 0
#20 fmgr_sql (fcinfo=0x7fffefd2f340) at
./build/../src/backend/executor/functions.c:1164
fcache = 0x5636dc1231a0
sqlerrcontext = {previous = 0x0, callback = 0x5636d9ba77a0
<sql_exec_error_callback>, arg = 0x5636dc06a230}
randomAccess = false
lazyEvalOK = <optimized out>
is_first = <optimized out>
pushed_snapshot = true
es = 0x5636dc140bd8
slot = <optimized out>
result = <optimized out>
eslist = <optimized out>
eslc = 0x5636dc1413d8
__func__ = "fmgr_sql"
#21 0x00005636d9ba4bac in ExecMakeTableFunctionResult
(setexpr=0x5636dc06a210, econtext=0x5636dc06a0e0, argContext=<optimized
out>, expectedDesc=0x5636dc06ac68, randomAccess=false) at
./build/../src/backend/executor/execSRF.c:231
result = 94793580413993
tupstore = 0x0
tupdesc = 0x0
funcrettype = 2249
returnsTuple = <optimized out>
returnsSet = true
fcinfo = {flinfo = 0x5636dc06a230, context = 0x0, resultinfo =
0x7fffefd2f300, fncollation = 0, isnull = false, nargs = 0, arg =
{94793619488832, 140733193388046, 0, 94793580418128, 140737216967520, 6,
3432, 140733193388035,
94793585267779, 1, 59588104878912, 94793585267784, 32,
139936976061625, 94793584837154, 1, 140737216968608, 3, 140737216967832,
206158430224, 140737216969520, 94793620609964, 94793620609964,
139936977225618,
139936979540000, 8, 0, 139936976486110, 3688503315210960912,
8187, 0, 94793620609936, 94793620618144, 139936979544832, 0, 160, 142,
139936976269588, 140737216967904, 94793583475689, 140737216968048,
94793620609952,
94793620609936, 252098579, 140737216968464, 139937017242608,
139937017242584, 139937017242608, 139937017242608, 94793619743656,
140737216967952, 94793583769478, 139937017242608, 7074228167464498944,
94793620438296,
94793581565312, 140737216968032, 94793581372751,
140737216968032, 94793620438296, 140737216968464, 140737216968464,
94793584972684, 0, 140737216968144, 94793581565681, 140737216968080,
7074228167464498944, 94793620438216,
94793581565312, 140737216968144, 94793581372751, 1,
7074228167464498944, 140737216968464, 94793620438216, 140737216968464,
140737216968464, 140737216968256, 94793581565681, 4294967295,
139937017247848, 139937017247824,
139937017247848, 139937017247848, 94793619743656,
140737216968240, 7074228167464498944, 139937017247848, 94793620438376,
94793581565312, 140737216968464, 140737216968320, 94793581372827,
140737216968304, 94793620438408,
140737216968464, 94793620436824, 0, 0}, argnull = {240, 246,
210, 239, 255, 127, false, false, 241, 150, 193, 217, 54, 86, false, false,
16, 247, 210, 239, 255, 127, false, false, 169, 156, 193, 217, 54, 86,
false, false,
240, 246, 210, 239, 255, 127, false, false, 233, 187, 222, 217,
54, 86, false, false, 24, false, false, false, false, false, false, false,
175, 189, 160, 217, 54, 86, false, false, 16, 247, 210, 239, 255, 127,
false,
false, 233, 187, 222, 217, 54, 86, false, false, 23, false,
false, false, false, false, false, false, 88, 171, 6, 220, 54, 86, false,
false, 80, 56, 52, 160}}
fcusage = {fs = 0x5636dc1278c0, save_f_total_time = {tv_sec = 0,
tv_nsec = 0}, save_total = {tv_sec = 0, tv_nsec = 0}, f_start = {tv_sec =
469579, tv_nsec = 359852825}}
rsinfo = {type = T_ReturnSetInfo, econtext = 0x5636dc06a0e0,
expectedDesc = 0x5636dc06ac68, allowedModes = 11, returnMode =
SFRM_ValuePerCall, isDone = ExprSingleResult, setResult = 0x0, setDesc =
0x0}
tmptup = {t_len = 0, t_self = {ip_blkid = {bi_hi = 0, bi_lo = 0},
ip_posid = 2}, t_tableOid = 0, t_data = 0x2}
callerContext = 0x5636dc069cb0
first_time = true
__func__ = "ExecMakeTableFunctionResult"
#22 0x00005636d9bb1760 in FunctionNext (node=node@entry=0x5636dc069fd0) at
./build/../src/backend/executor/nodeFunctionscan.c:94
tstore = 0x0
estate = <optimized out>
direction = ForwardScanDirection
scanslot = 0x5636dc06ae88
alldone = <optimized out>
oldpos = <optimized out>
funcno = <optimized out>
att = <optimized out>
#23 0x00005636d9ba40f9 in ExecScanFetch (recheckMtd=0x5636d9bb14a0
<FunctionRecheck>, accessMtd=0x5636d9bb14d0 <FunctionNext>,
node=0x5636dc069fd0) at ./build/../src/backend/executor/execScan.c:95
estate = 0x5636dc069dc0
#24 ExecScan (node=0x5636dc069fd0, accessMtd=0x5636d9bb14d0 <FunctionNext>,
recheckMtd=0x5636d9bb14a0 <FunctionRecheck>) at
./build/../src/backend/executor/execScan.c:145
econtext = <optimized out>
qual = 0x0
projInfo = 0x0
#25 0x00005636d9ba25c9 in ExecProcNodeInstr (node=0x5636dc069fd0) at
./build/../src/backend/executor/execProcnode.c:461
result = <optimized out>
#26 0x00005636d9b9b4c3 in ExecProcNode (node=0x5636dc069fd0) at
./build/../src/include/executor/executor.h:247
No locals.
#27 ExecutePlan (execute_once=<optimized out>, dest=0x5636dc12ccb8,
direction=<optimized out>, numberTuples=0, sendTuples=<optimized out>,
operation=CMD_SELECT, use_parallel_mode=<optimized out>,
planstate=0x5636dc069fd0,
estate=0x5636dc069dc0) at
./build/../src/backend/executor/execMain.c:1723
slot = <optimized out>
current_tuple_count = 0
#28 standard_ExecutorRun (queryDesc=0x5636dc068590, direction=<optimized
out>, count=0, execute_once=<optimized out>) at
./build/../src/backend/executor/execMain.c:364
estate = 0x5636dc069dc0
operation = CMD_SELECT
dest = 0x5636dc12ccb8
sendTuples = <optimized out>
__func__ = "standard_ExecutorRun"
#29 0x00007f4597d08e15 in pgss_ExecutorRun (queryDesc=0x5636dc068590,
direction=ForwardScanDirection, count=0, execute_once=<optimized out>) at
./build/../contrib/pg_stat_statements/pg_stat_statements.c:892
save_exception_stack = 0x7fffefd2f9b0
save_context_stack = 0x0
local_sigjmp_buf = {{__jmpbuf = {94793620008672, -81779227836027085,
0, 94793619637648, 0, 1, -81779227869581517, -24340724960596173},
__mask_was_saved = 0, __saved_mask = {__val = {94793583733915,
140737216969008,
7074228167464498944, 140737216969008, 94793619643840,
94793620438296, 94793619644864, 2, 94793619644368, 140737216969168,
94793581140788, 7987184768, 94793620441608, 94793619643840, 336, 1}}}}
#30 0x00007f417cdbf4dd in explain_ExecutorRun (queryDesc=0x5636dc068590,
direction=ForwardScanDirection, count=0, execute_once=<optimized out>) at
./build/../contrib/auto_explain/auto_explain.c:268
save_exception_stack = 0x7fffefd2fb50
save_context_stack = 0x0
local_sigjmp_buf = {{__jmpbuf = {94793620008672, -81779227930398925,
0, 94793619637648, 0, 1, -81779227829735629, -26933449381777613},
__mask_was_saved = 0, __saved_mask = {__val = {94793620441784,
140737216969344, 336, 1,
94793620155408, 24, 94793619637648, 0, 1, 140737216969312,
94793583755684, 94793620008672, 94793620155728, 140737216969344,
94793583827107, 94793620008672}}}}
#31 0x00005636d9cee59b in PortalRunSelect
(portal=portal@entry=0x5636dc0c2ee0, forward=forward@entry=true, count=0,
count@entry=9223372036854775807, dest=dest@entry=0x5636dc12ccb8) at
./build/../src/backend/tcop/pquery.c:932
queryDesc = 0x5636dc068590
direction = <optimized out>
nprocessed = <optimized out>
__func__ = "PortalRunSelect"
#32 0x00005636d9cefb20 in PortalRun (portal=portal@entry=0x5636dc0c2ee0,
count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=true,
run_once=run_once@entry=true, dest=dest@entry=0x5636dc12ccb8,
altdest=altdest@entry=0x5636dc12ccb8, completionTag=0x7fffefd2fca0 "")
at ./build/../src/backend/tcop/pquery.c:773
save_exception_stack = 0x7fffefd2fec0
save_context_stack = 0x0
local_sigjmp_buf = {{__jmpbuf = {1, -81779228012187853,
94793619488704, 94793619488752, 94793620008672, 2, -81779227917816013,
-5958515187191648461}, __mask_was_saved = 0, __saved_mask = {__val =
{94793619463952, 0,
2817148525, 94793620020232, 94793619637376, 4, 112,
94793619488704, 94793619484688, 94793584837140, 2, 140737216969760,
94793583757029, 140737216969792, 2, 94793619488704}}}}
result = <optimized out>
nprocessed = <optimized out>
saveTopTransactionResourceOwner = 0x5636dc0823a8
saveTopTransactionContext = 0x5636dc0e6c10
saveActivePortal = 0x0
saveResourceOwner = 0x5636dc0823a8
savePortalContext = 0x0
saveMemoryContext = 0x5636dc0e6c10
__func__ = "PortalRun"
#33 0x00005636d9ceb7b9 in exec_simple_query (query_string=0x5636dc043120
"SELECT * FROM crashrepro4();") at
./build/../src/backend/tcop/postgres.c:1145
parsetree = 0x5636dc043fc0
portal = 0x5636dc0c2ee0
snapshot_set = <optimized out>
commandTag = <optimized out>
completionTag =
"\000\000\000\000\002\000\000\000\340\374\322\357\377\177\000\000-\r\324\357\377\177\000\000\000\000\000\000\000\000\000\000\360\374\322\357\002\000\000\000Q\000\000\000\000\000\000\000
1\004\334\066V\000\000`\376\322\357\377\177\000"
querytree_list = <optimized out>
plantree_list = <optimized out>
receiver = 0x5636dc12ccb8
format = 0
dest = DestRemote
parsetree_list = 0x5636dc044010
parsetree_item = 0x5636dc043ff0
save_log_statement_stats = false
was_logged = false
use_implicit_block = false
msec_str =
"\000\000\000\000\002\000\000\000\340\374\322\357\377\177\000\000-\r\324\357\377\177\000\000\000\000\000\000\000\000\000"
__func__ = "exec_simple_query"
#34 0x00005636d9ced623 in PostgresMain (argc=<optimized out>,
argv=argv@entry=0x5636dc070128, dbname=<optimized out>, username=<optimized
out>) at ./build/../src/backend/tcop/postgres.c:4182
query_string = 0x5636dc043120 "SELECT * FROM crashrepro4();"
input_message = {data = 0x5636dc043120 "SELECT * FROM
crashrepro4();", len = 29, maxlen = 1024, cursor = 29}
local_sigjmp_buf = {{__jmpbuf = {140737216970368,
-81779225453662413, 1, 94793619668968, 94793619668968, 94793619632848,
-81779227978633421, -5958515189572833485}, __mask_was_saved = 1,
__saved_mask = {__val = {0,
140737216970544, 140737216970540, 140737216970640,
8589934592, 94793588696280, 94793585032905, 140737216970864,
140737216972320, 140737216971248, 94793619668968, 94793619632848,
139936976780376, 5, 206158430256,
140737216970848}}}}
send_ready_for_query = false
disable_idle_in_transaction_timeout = false
__func__ = "PostgresMain"
#35 0x00005636d99fb097 in BackendRun (port=0x5636dc0672d0) at
./build/../src/backend/postmaster/postmaster.c:4358
ac = 1
secs = 612288553
usecs = 224860
i = 1
av = 0x5636dc070128
maxac = <optimized out>
#36 BackendStartup (port=0x5636dc0672d0) at
./build/../src/backend/postmaster/postmaster.c:4030
bn = <optimized out>
pid = <optimized out>
#37 ServerLoop () at ./build/../src/backend/postmaster/postmaster.c:1707
rmask = {fds_bits = {128, 0 <repeats 15 times>}}
selres = <optimized out>
now = <optimized out>
readmask = {fds_bits = {200, 0 <repeats 15 times>}}
last_lockfile_recheck_time = 1558973336
last_touch_time = 1558973276
__func__ = "ServerLoop"
#38 0x00005636d9c78221 in PostmasterMain (argc=7, argv=0x5636dc03de10) at
./build/../src/backend/postmaster/postmaster.c:1380
opt = <optimized out>
status = <optimized out>
userDoption = <optimized out>
listen_addr_saved = true
i = <optimized out>
output_config_variable = <optimized out>
__func__ = "PostmasterMain"
#39 0x00005636d99fc594 in main (argc=7, argv=0x5636dc03de10) at
./build/../src/backend/main/main.c:228
No locals.
(gdb)
PG Bug reporting form <noreply@postgresql.org> writes:
> We have enabled auto_explain and see errors on PostgreSQL 11.3 when
> SELECTing from a user defined function. No such crashes have been
> observed on 10.7.
I think that you didn't give a complete dump of relevant settings,
but after some fooling around I was able to reproduce this error,
and the cause is this: auto_explain hasn't a single clue about
parallel query.
1. In the parent process, we have a parallelizable hash join being
executed in a statement inside a function. Since
auto_explain.log_nested_statements is not enabled, auto_explain
does not deem that it should trace the statement, so the query
starts up with estate->es_instrument = 0, and therefore
ExecHashInitializeDSM chooses not to create any shared
SharedHashInfo area.
2. In the worker processes, auto_explain manages to grab execution
control when ParallelQueryMain calls ExecutorStart, thanks to being
in ExecutorStart_hook. Having no clue what's going on, it decides
that this is a new top-level query that it should trace, and it
sets some bits in queryDesc->instrument_options.
3. When the workers get to ExecHashInitializeWorker, they see that
instrumentation is active so they try to look up the SharedHashInfo.
Kaboom.
I'm inclined to think that explain_ExecutorStart should simply
keep its hands off of everything when in a parallel worker;
if instrumentation is required, that'll be indicated by options
passed down from the parent process. It looks like this could
conveniently be merged with the rate-sampling logic by forcing
current_query_sampled to false when IsParallelWorker().
Likely this should be back-patched all the way to 9.6. I'm
not sure how we managed to avoid noticing it before now,
but there are probably ways to cause visible trouble in
any release that has any parallel query support.
regards, tom lane
PG Bug reporting form <noreply@postgresql.org> writes:
> We have enabled auto_explain and see errors on PostgreSQL 11.3 when
> SELECTing from a user defined function. No such crashes have been
> observed on 10.7.
I think that you didn't give a complete dump of relevant settings,
but after some fooling around I was able to reproduce this error,
and the cause is this: auto_explain hasn't a single clue about
parallel query.
1. In the parent process, we have a parallelizable hash join being
executed in a statement inside a function. Since
auto_explain.log_nested_statements is not enabled, auto_explain
does not deem that it should trace the statement, so the query
starts up with estate->es_instrument = 0, and therefore
ExecHashInitializeDSM chooses not to create any shared
SharedHashInfo area.
2. In the worker processes, auto_explain manages to grab execution
control when ParallelQueryMain calls ExecutorStart, thanks to being
in ExecutorStart_hook. Having no clue what's going on, it decides
that this is a new top-level query that it should trace, and it
sets some bits in queryDesc->instrument_options.
3. When the workers get to ExecHashInitializeWorker, they see that
instrumentation is active so they try to look up the SharedHashInfo.
Kaboom.
I'm inclined to think that explain_ExecutorStart should simply
keep its hands off of everything when in a parallel worker;
if instrumentation is required, that'll be indicated by options
passed down from the parent process. It looks like this could
conveniently be merged with the rate-sampling logic by forcing
current_query_sampled to false when IsParallelWorker().
Likely this should be back-patched all the way to 9.6. I'm
not sure how we managed to avoid noticing it before now,
but there are probably ways to cause visible trouble in
any release that has any parallel query support.
regards, tom lane