Обсуждение: pg11.1 jit segv

Поиск
Список
Период
Сортировка

pg11.1 jit segv

От
Justin Pryzby
Дата:
Crash is reproducible but only when JIT=on.

postgresql11-llvmjit-11.1-1PGDG.rhel7.x86_64

[2769871.453033] postmaster[8582]: segfault at 7f083bddb780 ip 00007f08127e814e sp 00007ffe463506e0 error 4
[2770774.470600] postmaster[29410]: segfault at 7f0812eeb6c8 ip 00007f08127eb4f0 sp 00007ffe463506e0 error 4

Core was generated by `postgres: telsasoft ts 192.168.122.11(41908) SELECT    '.
Program terminated with signal 11, Segmentation fault.

(gdb) bt
#0  0x00007f08127e814e in ?? ()
#1  0x0000000000000000 in ?? ()

[pryzbyj@database ~]$ sudo -u postgres valgrind /usr/pgsql-11/bin/postgres --single -D /var/lib/pgsql/11/data ts
<tmp/sql-jit-crash-2018-11-15
 

==26448== Conditional jump or move depends on uninitialised value(s)
==26448==    at 0x1B510F09: getAdjustedPtr(llvm::IRBuilder<llvm::ConstantFolder, (anonymous
namespace)::IRBuilderPrefixedInserter>&,llvm::DataLayout const&, llvm::Value*, llvm::APInt, llvm::Type*, llvm::Twine)
(SROA.cpp:1531)
==26448==    by 0x1B511C52: llvm::sroa::AllocaSliceRewriter::getNewAllocaSlicePtr(llvm::IRBuilder<llvm::ConstantFolder,
(anonymousnamespace)::IRBuilderPrefixedInserter>&, llvm::Type*) (SROA.cpp:2313)
 
==26448==    by 0x1B516BA0: llvm::sroa::AllocaSliceRewriter::visitIntrinsicInst(llvm::IntrinsicInst&) (SROA.cpp:2921)
==26448==    by 0x1B5190CC: visitCall (Instruction.def:190)
==26448==    by 0x1B5190CC: llvm::InstVisitor<llvm::sroa::AllocaSliceRewriter, bool>::visit(llvm::Instruction&)
(Instruction.def:190)
==26448==    by 0x1B51947A: visit (InstVisitor.h:114)
==26448==    by 0x1B51947A: llvm::sroa::AllocaSliceRewriter::visit((anonymous namespace)::Slice const*)
(SROA.cpp:2262)
==26448==    by 0x1B51D426: llvm::SROA::rewritePartition(llvm::AllocaInst&, llvm::sroa::AllocaSlices&,
llvm::sroa::Partition&)(SROA.cpp:3921)
 
==26448==    by 0x1B51E630: llvm::SROA::splitAlloca(llvm::AllocaInst&, llvm::sroa::AllocaSlices&) (SROA.cpp:4029)
==26448==    by 0x1B51F25D: llvm::SROA::runOnAlloca(llvm::AllocaInst&) (SROA.cpp:4156)
==26448==    by 0x1B52048A: llvm::SROA::runImpl(llvm::Function&, llvm::DominatorTree&, llvm::AssumptionCache&)
(SROA.cpp:4243)
==26448==    by 0x1B520E40: llvm::sroa::SROALegacyPass::runOnFunction(llvm::Function&) (SROA.cpp:4296)
==26448==    by 0x1AC49C31: llvm::FPPassManager::runOnFunction(llvm::Function&) (LegacyPassManager.cpp:1514)
==26448==    by 0x1B6A805E: RunPassOnSCC (Timer.h:149)
==26448==    by 0x1B6A805E: RunAllPassesOnSCC (CallGraphSCCPass.cpp:419)
==26448==    by 0x1B6A805E: (anonymous namespace)::CGPassManager::runOnModule(llvm::Module&)
(CallGraphSCCPass.cpp:474)
==26448==
==26448== Conditional jump or move depends on uninitialised value(s)
==26448==    at 0x1B510F09: getAdjustedPtr(llvm::IRBuilder<llvm::ConstantFolder, (anonymous
namespace)::IRBuilderPrefixedInserter>&,llvm::DataLayout const&, llvm::Value*, llvm::APInt, llvm::Type*, llvm::Twine)
(SROA.cpp:1531)
==26448==    by 0x1B511C52: llvm::sroa::AllocaSliceRewriter::getNewAllocaSlicePtr(llvm::IRBuilder<llvm::ConstantFolder,
(anonymousnamespace)::IRBuilderPrefixedInserter>&, llvm::Type*) (SROA.cpp:2313)
 
==26448==    by 0x1B515EF0: llvm::sroa::AllocaSliceRewriter::visitMemSetInst(llvm::MemSetInst&) (SROA.cpp:2656)
==26448==    by 0x1B5190CC: visitCall (Instruction.def:190)
==26448==    by 0x1B5190CC: llvm::InstVisitor<llvm::sroa::AllocaSliceRewriter, bool>::visit(llvm::Instruction&)
(Instruction.def:190)
==26448==    by 0x1B51947A: visit (InstVisitor.h:114)
==26448==    by 0x1B51947A: llvm::sroa::AllocaSliceRewriter::visit((anonymous namespace)::Slice const*)
(SROA.cpp:2262)
==26448==    by 0x1B51D426: llvm::SROA::rewritePartition(llvm::AllocaInst&, llvm::sroa::AllocaSlices&,
llvm::sroa::Partition&)(SROA.cpp:3921)
 
==26448==    by 0x1B51E630: llvm::SROA::splitAlloca(llvm::AllocaInst&, llvm::sroa::AllocaSlices&) (SROA.cpp:4029)
==26448==    by 0x1B51F25D: llvm::SROA::runOnAlloca(llvm::AllocaInst&) (SROA.cpp:4156)
==26448==    by 0x1B52048A: llvm::SROA::runImpl(llvm::Function&, llvm::DominatorTree&, llvm::AssumptionCache&)
(SROA.cpp:4243)
==26448==    by 0x1B520E40: llvm::sroa::SROALegacyPass::runOnFunction(llvm::Function&) (SROA.cpp:4296)
==26448==    by 0x1AC49C31: llvm::FPPassManager::runOnFunction(llvm::Function&) (LegacyPassManager.cpp:1514)
==26448==    by 0x1B6A805E: RunPassOnSCC (Timer.h:149)
==26448==    by 0x1B6A805E: RunAllPassesOnSCC (CallGraphSCCPass.cpp:419)
==26448==    by 0x1B6A805E: (anonymous namespace)::CGPassManager::runOnModule(llvm::Module&)
(CallGraphSCCPass.cpp:474)
==26448==

==26448== Invalid write of size 8
==26448==    at 0x4C2E0C3: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:1022)
==26448==    by 0x824075: UnknownInlinedFun (string3.h:51)
==26448==    by 0x824075: varstrfastcmp_locale (varlena.c:2135)
==26448==    by 0x87C6F3: ApplySortComparator (sortsupport.h:224)
==26448==    by 0x87C6F3: comparetup_heap (tuplesort.c:3560)
==26448==    by 0x8793F4: qsort_tuple (qsort_tuple.c:112)
==26448==    by 0x87EE53: tuplesort_performsort (tuplesort.c:1811)
==26448==    by 0x628CE2: ExecSort (nodeSort.c:118)
==26448==    by 0x609067: ExecProcNodeInstr (execProcnode.c:461)
==26448==    by 0x610D48: UnknownInlinedFun (executor.h:237)
==26448==    by 0x610D48: fetch_input_tuple (nodeAgg.c:406)
==26448==    by 0x61277F: agg_retrieve_direct (nodeAgg.c:1736)
==26448==    by 0x61277F: ExecAgg (nodeAgg.c:1551)
==26448==    by 0x609067: ExecProcNodeInstr (execProcnode.c:461)
==26448==    by 0x60A764: ExecScanFetch (execScan.c:95)
==26448==    by 0x60A764: ExecScan (execScan.c:162)
==26448==    by 0x609067: ExecProcNodeInstr (execProcnode.c:461)
==26448==  Address 0x1f77c1c0 is 0 bytes after a block of size 8,192 alloc'd
==26448==    at 0x4C29C23: malloc (vg_replace_malloc.c:299)
==26448==    by 0x86D5AD: AllocSetContextCreateExtended (aset.c:477)
==26448==    by 0x87AA3D: tuplesort_begin_common (tuplesort.c:697)
==26448==    by 0x87DAA9: tuplesort_begin_heap (tuplesort.c:812)
==26448==    by 0x628C93: ExecSort (nodeSort.c:89)
==26448==    by 0x609067: ExecProcNodeInstr (execProcnode.c:461)
==26448==    by 0x610D48: UnknownInlinedFun (executor.h:237)
==26448==    by 0x610D48: fetch_input_tuple (nodeAgg.c:406)
==26448==    by 0x61277F: agg_retrieve_direct (nodeAgg.c:1736)
==26448==    by 0x61277F: ExecAgg (nodeAgg.c:1551)
==26448==    by 0x609067: ExecProcNodeInstr (execProcnode.c:461)
==26448==    by 0x60A764: ExecScanFetch (execScan.c:95)
==26448==    by 0x60A764: ExecScan (execScan.c:162)
==26448==    by 0x609067: ExecProcNodeInstr (execProcnode.c:461)
==26448==    by 0x61B364: UnknownInlinedFun (executor.h:237)
==26448==    by 0x61B364: MultiExecPrivateHash (nodeHash.c:164)
==26448==    by 0x61B364: MultiExecHash (nodeHash.c:114)
==26448==
==26448== Invalid read of size 8
==26448==    at 0x4C2E0CE: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:1022)
==26448==    by 0x824075: UnknownInlinedFun (string3.h:51)
==26448==    by 0x824075: varstrfastcmp_locale (varlena.c:2135)
==26448==    by 0x87C6F3: ApplySortComparator (sortsupport.h:224)
==26448==    by 0x87C6F3: comparetup_heap (tuplesort.c:3560)
==26448==    by 0x8793F4: qsort_tuple (qsort_tuple.c:112)
==26448==    by 0x87EE53: tuplesort_performsort (tuplesort.c:1811)
==26448==    by 0x628CE2: ExecSort (nodeSort.c:118)
==26448==    by 0x609067: ExecProcNodeInstr (execProcnode.c:461)
==26448==    by 0x610D48: UnknownInlinedFun (executor.h:237)
==26448==    by 0x610D48: fetch_input_tuple (nodeAgg.c:406)
==26448==    by 0x61277F: agg_retrieve_direct (nodeAgg.c:1736)
==26448==    by 0x61277F: ExecAgg (nodeAgg.c:1551)
==26448==    by 0x609067: ExecProcNodeInstr (execProcnode.c:461)
==26448==    by 0x60A764: ExecScanFetch (execScan.c:95)
==26448==    by 0x60A764: ExecScan (execScan.c:162)
==26448==    by 0x609067: ExecProcNodeInstr (execProcnode.c:461)
==26448==  Address 0x1f77e200 is 0 bytes after a block of size 8,192 alloc'd
==26448==    at 0x4C29C23: malloc (vg_replace_malloc.c:299)
==26448==    by 0x86D5AD: AllocSetContextCreateExtended (aset.c:477)
==26448==    by 0x87AA5A: tuplesort_begin_common (tuplesort.c:710)
==26448==    by 0x87DAA9: tuplesort_begin_heap (tuplesort.c:812)
==26448==    by 0x628C93: ExecSort (nodeSort.c:89)
==26448==    by 0x609067: ExecProcNodeInstr (execProcnode.c:461)
==26448==    by 0x610D48: UnknownInlinedFun (executor.h:237)
==26448==    by 0x610D48: fetch_input_tuple (nodeAgg.c:406)
==26448==    by 0x61277F: agg_retrieve_direct (nodeAgg.c:1736)
==26448==    by 0x61277F: ExecAgg (nodeAgg.c:1551)
==26448==    by 0x609067: ExecProcNodeInstr (execProcnode.c:461)
==26448==    by 0x60A764: ExecScanFetch (execScan.c:95)
==26448==    by 0x60A764: ExecScan (execScan.c:162)
==26448==    by 0x609067: ExecProcNodeInstr (execProcnode.c:461)
==26448==    by 0x61B364: UnknownInlinedFun (executor.h:237)
==26448==    by 0x61B364: MultiExecPrivateHash (nodeHash.c:164)
==26448==    by 0x61B364: MultiExecHash (nodeHash.c:114)
==26448==
==26448== Invalid read of size 8
==26448==    at 0x4C2E0C0: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:1022)
==26448==    by 0x824075: UnknownInlinedFun (string3.h:51)
==26448==    by 0x824075: varstrfastcmp_locale (varlena.c:2135)
==26448==    by 0x87C6F3: ApplySortComparator (sortsupport.h:224)
==26448==    by 0x87C6F3: comparetup_heap (tuplesort.c:3560)
==26448==    by 0x8793F4: qsort_tuple (qsort_tuple.c:112)
==26448==    by 0x87EE53: tuplesort_performsort (tuplesort.c:1811)
==26448==    by 0x628CE2: ExecSort (nodeSort.c:118)
==26448==    by 0x609067: ExecProcNodeInstr (execProcnode.c:461)
==26448==    by 0x610D48: UnknownInlinedFun (executor.h:237)
==26448==    by 0x610D48: fetch_input_tuple (nodeAgg.c:406)
==26448==    by 0x61277F: agg_retrieve_direct (nodeAgg.c:1736)
==26448==    by 0x61277F: ExecAgg (nodeAgg.c:1551)
==26448==    by 0x609067: ExecProcNodeInstr (execProcnode.c:461)
==26448==    by 0x60A764: ExecScanFetch (execScan.c:95)
==26448==    by 0x60A764: ExecScan (execScan.c:162)
==26448==    by 0x609067: ExecProcNodeInstr (execProcnode.c:461)
==26448==  Address 0x1f77e210 is 16 bytes after a block of size 8,192 alloc'd
==26448==    at 0x4C29C23: malloc (vg_replace_malloc.c:299)
==26448==    by 0x86D5AD: AllocSetContextCreateExtended (aset.c:477)
==26448==    by 0x87AA5A: tuplesort_begin_common (tuplesort.c:710)
==26448==    by 0x87DAA9: tuplesort_begin_heap (tuplesort.c:812)
==26448==    by 0x628C93: ExecSort (nodeSort.c:89)
==26448==    by 0x609067: ExecProcNodeInstr (execProcnode.c:461)
==26448==    by 0x610D48: UnknownInlinedFun (executor.h:237)
==26448==    by 0x610D48: fetch_input_tuple (nodeAgg.c:406)
==26448==    by 0x61277F: agg_retrieve_direct (nodeAgg.c:1736)
==26448==    by 0x61277F: ExecAgg (nodeAgg.c:1551)
==26448==    by 0x609067: ExecProcNodeInstr (execProcnode.c:461)
==26448==    by 0x60A764: ExecScanFetch (execScan.c:95)
==26448==    by 0x60A764: ExecScan (execScan.c:162)
==26448==    by 0x609067: ExecProcNodeInstr (execProcnode.c:461)
==26448==    by 0x61B364: UnknownInlinedFun (executor.h:237)
==26448==    by 0x61B364: MultiExecPrivateHash (nodeHash.c:164)
==26448==    by 0x61B364: MultiExecHash (nodeHash.c:114)
==26448==
==26448==
==26448== More than 10000000 total errors detected.  I'm not reporting any more.
==26448== Final error counts will be inaccurate.  Go fix your program!
==26448== Rerun with --error-limit=no to disable this cutoff.  Note
==26448== that errors may occur in your program without prior warning from
==26448== Valgrind, because errors are no longer being displayed.
==26448==
==26448==
==26448== Process terminating with default action of signal 11 (SIGSEGV)
==26448==  Access not within mapped region at address 0x23337000
==26448==    at 0x4C2E0CE: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:1022)
==26448==    by 0x824075: UnknownInlinedFun (string3.h:51)
==26448==    by 0x824075: varstrfastcmp_locale (varlena.c:2135)
==26448==    by 0x87C6F3: ApplySortComparator (sortsupport.h:224)
==26448==    by 0x87C6F3: comparetup_heap (tuplesort.c:3560)
==26448==    by 0x8793F4: qsort_tuple (qsort_tuple.c:112)
==26448==    by 0x87EE53: tuplesort_performsort (tuplesort.c:1811)
==26448==    by 0x628CE2: ExecSort (nodeSort.c:118)
==26448==    by 0x609067: ExecProcNodeInstr (execProcnode.c:461)
==26448==    by 0x610D48: UnknownInlinedFun (executor.h:237)
==26448==    by 0x610D48: fetch_input_tuple (nodeAgg.c:406)
==26448==    by 0x61277F: agg_retrieve_direct (nodeAgg.c:1736)
==26448==    by 0x61277F: ExecAgg (nodeAgg.c:1551)
==26448==    by 0x609067: ExecProcNodeInstr (execProcnode.c:461)
==26448==    by 0x60A764: ExecScanFetch (execScan.c:95)
==26448==    by 0x60A764: ExecScan (execScan.c:162)
==26448==    by 0x609067: ExecProcNodeInstr (execProcnode.c:461)
==26448==  If you believe this happened as a result of a stack
==26448==  overflow in your program's main thread (unlikely but
==26448==  possible), you can try to increase the size of the
==26448==  main thread stack using the --main-stacksize= flag.
==26448==  The main thread stack size used in this run was 8388608.
=
Let me know if there's anything else I can provide.

Justin


Re: pg11.1 jit segv

От
Andres Freund
Дата:
Hi,

On 2018-11-15 16:39:59 -0600, Justin Pryzby wrote:
> Crash is reproducible but only when JIT=on.
> 
> postgresql11-llvmjit-11.1-1PGDG.rhel7.x86_64
> 
> [2769871.453033] postmaster[8582]: segfault at 7f083bddb780 ip 00007f08127e814e sp 00007ffe463506e0 error 4
> [2770774.470600] postmaster[29410]: segfault at 7f0812eeb6c8 ip 00007f08127eb4f0 sp 00007ffe463506e0 error 4
> 
> Core was generated by `postgres: telsasoft ts 192.168.122.11(41908) SELECT    '.
> Program terminated with signal 11, Segmentation fault.

Thanks for reporting!


> (gdb) bt
> #0  0x00007f08127e814e in ?? ()
> #1  0x0000000000000000 in ?? ()

Could you enable jit_debugging_support and reproduce?  That should give
a more useful backtrace.

It'd also be interesting if toggling jit_inline_above_cost=-1,
jit_optimize_above_cost=-1 and jit_tuple_deforming=false, each set
individually, make a difference.

> Let me know if there's anything else I can provide.

Any way for me to reproduce this locally? I.e. can you narrow this down
to a reproducible test case?

Greetings,

Andres Freund


Re: pg11.1 jit segv

От
Justin Pryzby
Дата:
On Thu, Nov 15, 2018 at 02:47:55PM -0800, Andres Freund wrote:
> > (gdb) bt
> > #0  0x00007f08127e814e in ?? ()
> > #1  0x0000000000000000 in ?? ()
> 
> Could you enable jit_debugging_support and reproduce?  That should give
> a more useful backtrace.

Core was generated by `postgres: pryzbyj ts [local] EXPLAIN                   '.
Program terminated with signal 11, Segmentation fault.
#0  0x00007f819e227cb9 in __memcpy_ssse3_back () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install audit-libs-2.8.1-3.el7_5.1.x86_64 bzip2-libs-1.0.6-13.el7.x86_64
cyrus-sasl-lib-2.1.26-23.el7.x86_64elfutils-libelf-0.170-4.el7.x86_64 elfutils-libs-0.170-4.el7.x86_64 glib4
 
(gdb) bt
#0  0x00007f819e227cb9 in __memcpy_ssse3_back () from /lib64/libc.so.6
#1  0x0000000000824076 in memcpy (__len=18446744073709551612, __src=0x6c35818, __dest=<optimized out>) at
/usr/include/bits/string3.h:51
#2  varstrfastcmp_locale (x=113465364, y=113465636, ssup=<optimized out>) at varlena.c:2135
#3  0x000000000087c6f4 in ApplySortComparator (ssup=0x6c33c40, isNull2=<optimized out>, datum2=<optimized out>,
isNull1=false,datum1=<optimized out>) at ../../../../src/include/utils/sortsupport.h:224
 
#4  comparetup_heap (a=<optimized out>, b=<optimized out>, state=0x6c337f0) at tuplesort.c:3560
#5  0x00000000008793f5 in qsort_tuple (a=0x8311958, n=5272, cmp_tuple=0x87c2d0 <comparetup_heap>,
state=state@entry=0x6c337f0)at qsort_tuple.c:112
 
#6  0x000000000087d2db in tuplesort_sort_memtuples (state=state@entry=0x6c337f0) at tuplesort.c:3320
#7  0x000000000087ee54 in tuplesort_performsort (state=state@entry=0x6c337f0) at tuplesort.c:1811
#8  0x0000000000628ce3 in ExecSort (pstate=0x5e50ed8) at nodeSort.c:118
#9  0x0000000000609068 in ExecProcNodeInstr (node=0x5e50ed8) at execProcnode.c:461
#10 0x0000000000610d49 in ExecProcNode (node=0x5e50ed8) at ../../../src/include/executor/executor.h:237
#11 fetch_input_tuple (aggstate=aggstate@entry=0x5e50aa0) at nodeAgg.c:406
#12 0x0000000000612780 in agg_retrieve_direct (aggstate=0x5e50aa0) at nodeAgg.c:1736
#13 ExecAgg (pstate=0x5e50aa0) at nodeAgg.c:1551
#14 0x0000000000609068 in ExecProcNodeInstr (node=0x5e50aa0) at execProcnode.c:461
#15 0x000000000060a765 in ExecScanFetch (recheckMtd=0x62be50 <SubqueryRecheck>, accessMtd=0x62be70 <SubqueryNext>,
node=0x5e508e0)at execScan.c:95
 
#16 ExecScan (node=0x5e508e0, accessMtd=0x62be70 <SubqueryNext>, recheckMtd=0x62be50 <SubqueryRecheck>) at
execScan.c:162
#17 0x0000000000609068 in ExecProcNodeInstr (node=0x5e508e0) at execProcnode.c:461
#18 0x000000000061b365 in ExecProcNode (node=0x5e508e0) at ../../../src/include/executor/executor.h:237
#19 MultiExecPrivateHash (node=0x5e50720) at nodeHash.c:164
#20 MultiExecHash (node=node@entry=0x5e50720) at nodeHash.c:114
#21 0x0000000000609610 in MultiExecProcNode (node=node@entry=0x5e50720) at execProcnode.c:501
#22 0x000000000061bd98 in ExecHashJoinImpl (parallel=false, pstate=0x5e4f620) at nodeHashjoin.c:290
#23 ExecHashJoin (pstate=0x5e4f620) at nodeHashjoin.c:565
#24 0x0000000000609068 in ExecProcNodeInstr (node=0x5e4f620) at execProcnode.c:461
#25 0x000000000061b365 in ExecProcNode (node=0x5e4f620) at ../../../src/include/executor/executor.h:237
#26 MultiExecPrivateHash (node=0x5e4f460) at nodeHash.c:164
#27 MultiExecHash (node=node@entry=0x5e4f460) at nodeHash.c:114
#28 0x0000000000609610 in MultiExecProcNode (node=node@entry=0x5e4f460) at execProcnode.c:501
#29 0x000000000061bd98 in ExecHashJoinImpl (parallel=false, pstate=0x38344f8) at nodeHashjoin.c:290
#30 ExecHashJoin (pstate=0x38344f8) at nodeHashjoin.c:565
#31 0x0000000000609068 in ExecProcNodeInstr (node=0x38344f8) at execProcnode.c:461
#32 0x000000000061b365 in ExecProcNode (node=0x38344f8) at ../../../src/include/executor/executor.h:237
#33 MultiExecPrivateHash (node=0x38343c8) at nodeHash.c:164
#34 MultiExecHash (node=node@entry=0x38343c8) at nodeHash.c:114
#35 0x0000000000609610 in MultiExecProcNode (node=node@entry=0x38343c8) at execProcnode.c:501
#36 0x000000000061bd98 in ExecHashJoinImpl (parallel=false, pstate=0x3833148) at nodeHashjoin.c:290
#37 ExecHashJoin (pstate=0x3833148) at nodeHashjoin.c:565
#38 0x0000000000609068 in ExecProcNodeInstr (node=0x3833148) at execProcnode.c:461
#39 0x0000000000628cd6 in ExecProcNode (node=0x3833148) at ../../../src/include/executor/executor.h:237
#40 ExecSort (pstate=0x3832bd0) at nodeSort.c:107
#41 0x0000000000609068 in ExecProcNodeInstr (node=0x3832bd0) at execProcnode.c:461
#42 0x0000000000610d49 in ExecProcNode (node=0x3832bd0) at ../../../src/include/executor/executor.h:237
#43 fetch_input_tuple (aggstate=aggstate@entry=0x3832e30) at nodeAgg.c:406
#44 0x0000000000612780 in agg_retrieve_direct (aggstate=0x3832e30) at nodeAgg.c:1736
#45 ExecAgg (pstate=0x3832e30) at nodeAgg.c:1551
#46 0x0000000000609068 in ExecProcNodeInstr (node=0x3832e30) at execProcnode.c:461
#47 0x00000000006025ba in ExecProcNode (node=0x3832e30) at ../../../src/include/executor/executor.h:237
#48 ExecutePlan (execute_once=<optimized out>, dest=0xced3a0 <donothingDR>, direction=<optimized out>, numberTuples=0,
sendTuples=true,operation=CMD_SELECT, use_parallel_mode=<optimized out>, planstate=0x3832e30,
 
    estate=0x3832970) at execMain.c:1723
#49 standard_ExecutorRun (queryDesc=0x5c3ec68, direction=<optimized out>, count=0, execute_once=<optimized out>) at
execMain.c:364
#50 0x00007f8191ade08d in pgss_ExecutorRun (queryDesc=0x5c3ec68, direction=ForwardScanDirection, count=0,
execute_once=<optimizedout>) at pg_stat_statements.c:892
 
#51 0x00007f80eb8bf5e1 in explain_ExecutorRun (queryDesc=0x5c3ec68, direction=ForwardScanDirection, count=0,
execute_once=<optimizedout>) at auto_explain.c:268
 
#52 0x00000000005a497c in ExplainOnePlan (plannedstmt=plannedstmt@entry=0x5c32cd8, into=into@entry=0x0,
es=es@entry=0x207aaf8,
    queryString=queryString@entry=0x1ec5d58 "explain(analyze)\nSELECT site_office as site_gran, site_location AS bs,
sect_mscid|| ' (' || gsm_site_name_to_sect_name (site_name, sect_num, sect_name) || ')' AS sitename, data_u5
 
#53 0x00000000005a4c87 in ExplainOneQuery (query=<optimized out>, cursorOptions=<optimized out>, into=0x0,
es=0x207aaf8,
    queryString=0x1ec5d58 "explain(analyze)\nSELECT site_office as site_gran, site_location AS bs, sect_mscid || ' ('
||gsm_site_name_to_sect_name (site_name, sect_num, sect_name) || ')' AS sitename, data_utrancell.period as1
 
#54 0x00000000005a5174 in ExplainQuery (pstate=pstate@entry=0x2078dc8, stmt=stmt@entry=0x1fe1df8,
    queryString=queryString@entry=0x1ec5d58 "explain(analyze)\nSELECT site_office as site_gran, site_location AS bs,
sect_mscid|| ' (' || gsm_site_name_to_sect_name (site_name, sect_num, sect_name) || ')' AS sitename, data_u4
 
#55 0x00000000007469dd in standard_ProcessUtility (pstmt=pstmt@entry=0x1fe1ea8,
    queryString=queryString@entry=0x1ec5d58 "explain(analyze)\nSELECT site_office as site_gran, site_location AS bs,
sect_mscid|| ' (' || gsm_site_name_to_sect_name (site_name, sect_num, sect_name) || ')' AS sitename, data_u5
 
#56 0x00007f8191ae0025 in pgss_ProcessUtility (pstmt=0x1fe1ea8,
    queryString=0x1ec5d58 "explain(analyze)\nSELECT site_office as site_gran, site_location AS bs, sect_mscid || ' ('
||gsm_site_name_to_sect_name (site_name, sect_num, sect_name) || ')' AS sitename, data_utrancell.period as5
 
#57 0x0000000000743e86 in PortalRunUtility (portal=0x1f12be0, pstmt=0x1fe1ea8, isTopLevel=<optimized out>,
setHoldSnapshot=<optimizedout>, dest=0x2078d38, completionTag=0x7ffc304d0e20 "") at pquery.c:1178
 
#58 0x0000000000744bc2 in FillPortalStore (portal=portal@entry=0x1f12be0, isTopLevel=isTopLevel@entry=true) at
pquery.c:1038
#59 0x000000000074564f in PortalRun (portal=<optimized out>, count=9223372036854775807, isTopLevel=<optimized out>,
run_once=<optimizedout>, dest=0x7f81a09e2dc8, altdest=0x7f81a09e2dc8, completionTag=0x7ffc304d1040 "")
 
    at pquery.c:768
#60 0x0000000000741577 in exec_simple_query (query_string=<optimized out>) at postgres.c:1145
#61 0x0000000000742842 in PostgresMain (argc=<optimized out>, argv=<optimized out>, dbname=<optimized out>,
username=<optimizedout>) at postgres.c:4182 
#62 0x00000000004801dd in BackendRun (port=0x1ec3f70) at postmaster.c:4361
#63 BackendStartup (port=0x1ec3f70) at postmaster.c:4033
#64 ServerLoop () at postmaster.c:1706
#65 0x00000000006d5d49 in PostmasterMain (argc=argc@entry=3, argv=argv@entry=0x1e98280) at postmaster.c:1379
#66 0x000000000048103f in main (argc=3, argv=0x1e98280) at main.c:228
(

(gdb) bt f
#0  0x00007f819e227cb9 in __memcpy_ssse3_back () from /lib64/libc.so.6
No symbol table info available.
#1  0x0000000000824076 in memcpy (__len=18446744073709551612, __src=0x6c35818, __dest=<optimized out>) at
/usr/include/bits/string3.h:51
No locals.
#2  varstrfastcmp_locale (x=113465364, y=113465636, ssup=<optimized out>) at varlena.c:2135
        arg1 = 0x6c35814
        arg2 = 0x6c35924
        arg1_match = false
        sss = 0x6c33d60
        a1p = 0x6c35818 ""
        a2p = 0x6c35928 ""
        len1 = -4
        len2 = -4
        result = <optimized out>
#3  0x000000000087c6f4 in ApplySortComparator (ssup=0x6c33c40, isNull2=<optimized out>, datum2=<optimized out>,
isNull1=false,datum1=<optimized out>) at ../../../../src/include/utils/sortsupport.h:224
 
        compare = <optimized out>
        isNull2 = false
        isNull1 = false
        ssup = 0x6c33c40
        datum2 = <optimized out>
        compare = <optimized out>
#4  comparetup_heap (a=<optimized out>, b=<optimized out>, state=0x6c337f0) at tuplesort.c:3560
        sortKey = 0x6c33c40
        ltup = {t_len = 248, t_self = {ip_blkid = {bi_hi = 0, bi_lo = 0}, ip_posid = 42884}, t_tableOid = 0, t_data =
0x6c357f8}
        rtup = {t_len = 248, t_self = {ip_blkid = {bi_hi = 0, bi_lo = 0}, ip_posid = 1}, t_tableOid = 0, t_data =
0x6c35908}
        tupDesc = 0x5ede208
        nkey = 1
        attno = 2
        datum1 = <optimized out>
        datum2 = <optimized out>
        isnull1 = false
        isnull2 = false
#5  0x00000000008793f5 in qsort_tuple (a=0x8311958, n=5272, cmp_tuple=0x87c2d0 <comparetup_heap>,
state=state@entry=0x6c337f0)at qsort_tuple.c:112
 
        pa = <optimized out>
        pb = <optimized out>
        pc = <optimized out>
        pd = <optimized out>
        pl = <optimized out>
        pm = 0x8311970
        pn = <optimized out>
        d1 = <optimized out>
        d2 = <optimized out>
        r = <optimized out>
        presorted = 1
#6  0x000000000087d2db in tuplesort_sort_memtuples (state=state@entry=0x6c337f0) at tuplesort.c:3320
No locals.
#7  0x000000000087ee54 in tuplesort_performsort (state=state@entry=0x6c337f0) at tuplesort.c:1811
        oldcontext = 0x3832860
        __func__ = "tuplesort_performsort"
#8  0x0000000000628ce3 in ExecSort (pstate=0x5e50ed8) at nodeSort.c:118
        plannode = <optimized out>
        outerNode = 0x5e50fe8
        tupDesc = <optimized out>
        node = 0x5e50ed8
        estate = 0x3832970
        dir = ForwardScanDirection
        tuplesortstate = 0x6c337f0
        slot = <optimized out>
#9  0x0000000000609068 in ExecProcNodeInstr (node=0x5e50ed8) at execProcnode.c:461
        result = <optimized out>
#10 0x0000000000610d49 in ExecProcNode (node=0x5e50ed8) at ../../../src/include/executor/executor.h:237
No locals.
#11 fetch_input_tuple (aggstate=aggstate@entry=0x5e50aa0) at nodeAgg.c:406
        slot = <optimized out>
#12 0x0000000000612780 in agg_retrieve_direct (aggstate=0x5e50aa0) at nodeAgg.c:1736
        econtext = 0x5e50e28
        firstSlot = 0x674fa50
        numGroupingSets = 1
        node = 0x5c03788
        tmpcontext = 0x5e50cc8
        peragg = 0x6754918
        outerslot = <optimized out>
        nextSetSize = <optimized out>
        pergroups = 0x6755928
        result = <optimized out>
        hasGroupingSets = false
        currentSet = <optimized out>
        numReset = 1
        i = <optimized out>
#13 ExecAgg (pstate=0x5e50aa0) at nodeAgg.c:1551
        node = 0x5e50aa0
        result = 0x0
#14 0x0000000000609068 in ExecProcNodeInstr (node=0x5e50aa0) at execProcnode.c:461
        result = <optimized out>
#15 0x000000000060a765 in ExecScanFetch (recheckMtd=0x62be50 <SubqueryRecheck>, accessMtd=0x62be70 <SubqueryNext>,
node=0x5e508e0)at execScan.c:95
 
        estate = 0x3832970
#16 ExecScan (node=0x5e508e0, accessMtd=0x62be70 <SubqueryNext>, recheckMtd=0x62be50 <SubqueryRecheck>) at
execScan.c:162
        econtext = 0x5e509f0
        qual = 0x0
        projInfo = 0x675dbb8
#17 0x0000000000609068 in ExecProcNodeInstr (node=0x5e508e0) at execProcnode.c:461
        result = <optimized out>
#18 0x000000000061b365 in ExecProcNode (node=0x5e508e0) at ../../../src/include/executor/executor.h:237
No locals.
#19 MultiExecPrivateHash (node=0x5e50720) at nodeHash.c:164
        outerNode = <optimized out>
        hashkeys = 0x67662f0
        econtext = 0x5e50830
        hashvalue = 0
        hashtable = 0x6a7d468
#20 MultiExecHash (node=node@entry=0x5e50720) at nodeHash.c:114
No locals.
#21 0x0000000000609610 in MultiExecProcNode (node=node@entry=0x5e50720) at execProcnode.c:501
        __func__ = "MultiExecProcNode"
#22 0x000000000061bd98 in ExecHashJoinImpl (parallel=false, pstate=0x5e4f620) at nodeHashjoin.c:290
        outerNode = 0x5e4f8e0
        hashNode = <optimized out>
        econtext = <optimized out>
        node = 0x5e4f620
        joinqual = <optimized out>
        otherqual = <optimized out>
        hashtable = 0x6a7d468
        hashvalue = 0
        batchno = 48
        parallel_state = <optimized out>
#23 ExecHashJoin (pstate=0x5e4f620) at nodeHashjoin.c:565
No locals.
#24 0x0000000000609068 in ExecProcNodeInstr (node=0x5e4f620) at execProcnode.c:461
        result = <optimized out>
#25 0x000000000061b365 in ExecProcNode (node=0x5e4f620) at ../../../src/include/executor/executor.h:237
No locals.
#26 MultiExecPrivateHash (node=0x5e4f460) at nodeHash.c:164
        outerNode = <optimized out>
        hashkeys = 0x676ef80
        econtext = 0x5e4f570
        hashvalue = 0
        hashtable = 0x6a7cf48
#27 MultiExecHash (node=node@entry=0x5e4f460) at nodeHash.c:114
No locals.
#28 0x0000000000609610 in MultiExecProcNode (node=node@entry=0x5e4f460) at execProcnode.c:501
        __func__ = "MultiExecProcNode"
#29 0x000000000061bd98 in ExecHashJoinImpl (parallel=false, pstate=0x38344f8) at nodeHashjoin.c:290
        outerNode = 0x5c4b4b8
        hashNode = <optimized out>
        econtext = <optimized out>
        node = 0x38344f8
        joinqual = <optimized out>
        otherqual = <optimized out>
        hashtable = 0x6a7cf48
        hashvalue = 0
        batchno = 0
        parallel_state = <optimized out>
#30 ExecHashJoin (pstate=0x38344f8) at nodeHashjoin.c:565
No locals.
#31 0x0000000000609068 in ExecProcNodeInstr (node=0x38344f8) at execProcnode.c:461
        result = <optimized out>
#32 0x000000000061b365 in ExecProcNode (node=0x38344f8) at ../../../src/include/executor/executor.h:237
No locals.
#33 MultiExecPrivateHash (node=0x38343c8) at nodeHash.c:164
        outerNode = <optimized out>
        hashkeys = 0x6778c40
        econtext = 0x3834048
        hashvalue = 0
        hashtable = 0x6a7ce38
#34 MultiExecHash (node=node@entry=0x38343c8) at nodeHash.c:114
No locals.
#35 0x0000000000609610 in MultiExecProcNode (node=node@entry=0x38343c8) at execProcnode.c:501
        __func__ = "MultiExecProcNode"
#36 0x000000000061bd98 in ExecHashJoinImpl (parallel=false, pstate=0x3833148) at nodeHashjoin.c:290
        outerNode = 0x3833408
        hashNode = <optimized out>
        econtext = <optimized out>
        node = 0x3833148
        joinqual = <optimized out>
        otherqual = <optimized out>
        hashtable = 0x6a7ce38
        hashvalue = 0
        batchno = 96457208
        parallel_state = <optimized out>
#37 ExecHashJoin (pstate=0x3833148) at nodeHashjoin.c:565
No locals.
#38 0x0000000000609068 in ExecProcNodeInstr (node=0x3833148) at execProcnode.c:461
        result = <optimized out>
#39 0x0000000000628cd6 in ExecProcNode (node=0x3833148) at ../../../src/include/executor/executor.h:237
No locals.
#40 ExecSort (pstate=0x3832bd0) at nodeSort.c:107
        plannode = <optimized out>
        outerNode = 0x3833148
        tupDesc = <optimized out>
        node = 0x3832bd0
        estate = 0x3832970
        dir = ForwardScanDirection
        tuplesortstate = 0x6c0f6f0
        slot = <optimized out>
#41 0x0000000000609068 in ExecProcNodeInstr (node=0x3832bd0) at execProcnode.c:461
        result = <optimized out>
#42 0x0000000000610d49 in ExecProcNode (node=0x3832bd0) at ../../../src/include/executor/executor.h:237
No locals.
#43 fetch_input_tuple (aggstate=aggstate@entry=0x3832e30) at nodeAgg.c:406
        slot = <optimized out>
#44 0x0000000000612780 in agg_retrieve_direct (aggstate=0x3832e30) at nodeAgg.c:1736
        econtext = 0x3833098
        firstSlot = 0x677a550
        numGroupingSets = 1
        node = 0x5c06950
        tmpcontext = 0x3832ce0
        peragg = 0x67873d0
        outerslot = <optimized out>
        nextSetSize = <optimized out>
        pergroups = 0x6a752e8
        result = <optimized out>
        hasGroupingSets = false
        currentSet = <optimized out>
        numReset = 1
        i = <optimized out>
#45 ExecAgg (pstate=0x3832e30) at nodeAgg.c:1551
        node = 0x3832e30
        result = 0x0
#46 0x0000000000609068 in ExecProcNodeInstr (node=0x3832e30) at execProcnode.c:461
        result = <optimized out>
#47 0x00000000006025ba in ExecProcNode (node=0x3832e30) at ../../../src/include/executor/executor.h:237
No locals.
#48 ExecutePlan (execute_once=<optimized out>, dest=0xced3a0 <donothingDR>, direction=<optimized out>, numberTuples=0,
sendTuples=true,operation=CMD_SELECT, use_parallel_mode=<optimized out>, planstate=0x3832e30,
 
    estate=0x3832970) at execMain.c:1723
        slot = <optimized out>
        current_tuple_count = 0
#49 standard_ExecutorRun (queryDesc=0x5c3ec68, direction=<optimized out>, count=0, execute_once=<optimized out>) at
execMain.c:364
        estate = 0x3832970
        operation = CMD_SELECT
        dest = 0xced3a0 <donothingDR>
        sendTuples = <optimized out>
        oldcontext = 0x2078b00
        __func__ = "standard_ExecutorRun"
#50 0x00007f8191ade08d in pgss_ExecutorRun (queryDesc=0x5c3ec68, direction=ForwardScanDirection, count=0,
execute_once=<optimizedout>) at pg_stat_statements.c:892
 
        save_exception_stack = 0x7ffc304d0890
        save_context_stack = 0x0
        local_sigjmp_buf = {{__jmpbuf = {34056952, -4317634825662262871, 140721118841376, 96726120, 0, 0,
4318053775260225961,4258026755233780137}, __mask_was_saved = 0, __saved_mask = {__val = {33258448, 2, 8603255,
 
                33258448, 5392503, 16, 0, 72057594037927936, 336, 1, 58927200, 58927472, 8861374, 7, 1, 58928688}}}}
#51 0x00007f80eb8bf5e1 in explain_ExecutorRun (queryDesc=0x5c3ec68, direction=ForwardScanDirection, count=0,
execute_once=<optimizedout>) at auto_explain.c:268
 
        save_exception_stack = 0x7ffc304d0cf0
        save_context_stack = 0x0
        local_sigjmp_buf = {{__jmpbuf = {34056952, -4317634825662262871, 140721118841376, 96726120, 0, 0,
4318053775153271209,4257441557765001641}, __mask_was_saved = 0, __saved_mask = {__val = {336, 1, 58927200, 96726120,
 
                8861374, 2147483647, 1, 34048768, 6356030, 96726120, 96726120, 13907488, 140191684360137, 13554592,
810355232,34056952}}}}
 
#52 0x00000000005a497c in ExplainOnePlan (plannedstmt=plannedstmt@entry=0x5c32cd8, into=into@entry=0x0,
es=es@entry=0x207aaf8,
    queryString=queryString@entry=0x1ec5d58 "explain(analyze)\nSELECT site_office as site_gran, site_location AS bs,
sect_mscid|| ' (' || gsm_site_name_to_sect_name (site_name, sect_num, sect_name) || ')' AS sitename, data_u5
 
        dir = <optimized out>
        dest = <optimized out>
        queryDesc = 0x5c3ec68
        starttime = {tv_sec = 2774013, tv_nsec = 519809227}
        totaltime = 0
        eflags = <optimized out>
        instrument_option = <optimized out>
#53 0x00000000005a4c87 in ExplainOneQuery (query=<optimized out>, cursorOptions=<optimized out>, into=0x0,
es=0x207aaf8,
    queryString=0x1ec5d58 "explain(analyze)\nSELECT site_office as site_gran, site_location AS bs, sect_mscid || ' ('
||gsm_site_name_to_sect_name (site_name, sect_num, sect_name) || ')' AS sitename, data_utrancell.period as1
 
        plan = 0x5c32cd8
        planstart = {tv_sec = 2774013, tv_nsec = 111256886}
        planduration = {tv_sec = 0, tv_nsec = 408551920}
        params = 0x0
        queryString = 0x1ec5d58 "explain(analyze)\nSELECT site_office as site_gran, site_location AS bs, sect_mscid ||
'(' || gsm_site_name_to_sect_name (site_name, sect_num, sect_name) || ')' AS sitename, data_utrancell.per.
 
        es = 0x207aaf8
        into = 0x0
        queryEnv = 0x0
        cursorOptions = <optimized out>
        query = <optimized out>
#54 0x00000000005a5174 in ExplainQuery (pstate=pstate@entry=0x2078dc8, stmt=stmt@entry=0x1fe1df8,
    queryString=queryString@entry=0x1ec5d58 "explain(analyze)\nSELECT site_office as site_gran, site_location AS bs,
sect_mscid|| ' (' || gsm_site_name_to_sect_name (site_name, sect_num, sect_name) || ')' AS sitename, data_u4
 
        l = 0x20e15a8
        es = 0x207aaf8
        tstate = <optimized out>
        rewritten = <optimized out>
        lc = <optimized out>
        timing_set = <optimized out>
        summary_set = <optimized out>
        __func__ = "ExplainQuery"
#55 0x00000000007469dd in standard_ProcessUtility (pstmt=pstmt@entry=0x1fe1ea8,
    queryString=queryString@entry=0x1ec5d58 "explain(analyze)\nSELECT site_office as site_gran, site_location AS bs,
sect_mscid|| ' (' || gsm_site_name_to_sect_name (site_name, sect_num, sect_name) || ')' AS sitename, data_u5
 
        parsetree = 0x1fe1df8
        isTopLevel = true
        isAtomicContext = false
        pstate = 0x2078dc8
        __func__ = "standard_ProcessUtility"
#56 0x00007f8191ae0025 in pgss_ProcessUtility (pstmt=0x1fe1ea8,
    queryString=0x1ec5d58 "explain(analyze)\nSELECT site_office as site_gran, site_location AS bs, sect_mscid || ' ('
||gsm_site_name_to_sect_name (site_name, sect_num, sect_name) || ')' AS sitename, data_utrancell.period as5
 
        save_exception_stack = 0x7ffc304d0f10
        save_context_stack = 0x0
        local_sigjmp_buf = {{__jmpbuf = {32582624, -4317636606361366103, 34049336, 32848032, 32582624, 2,
4318053775111328169,4258026755403256233}, __mask_was_saved = 0, __saved_mask = {__val = {0, 0, 8860127,
 
                140721118842400, 16, 32088528, 32847712, 24, 34049336, 140721118842400, 8860127, 32848032, 32582624,
32848032,8926208, 2}}}}
 
        duration = {tv_sec = 33430760, tv_nsec = 33985024}
        bufusage = {shared_blks_hit = -140721118842111, shared_blks_read = 1028, shared_blks_dirtied = 257,
shared_blks_written= 32, local_blks_hit = 528280977713, local_blks_read = 140721118842112, local_blks_dirtied = 4,
 
          local_blks_written = 0, temp_blks_read = 511101108338, temp_blks_written = 0, blk_read_time = {tv_sec =
140721118842111,tv_nsec = 140721118842160}, blk_write_time = {tv_sec = 2, tv_nsec = 0}}
 
        start = {tv_sec = 2774013, tv_nsec = 108551544}
        rows = <optimized out>
        bufusage_start = {shared_blks_hit = 900, shared_blks_read = 0, shared_blks_dirtied = 0, shared_blks_written =
0,local_blks_hit = <optimized out>, local_blks_read = <optimized out>,
 
          local_blks_dirtied = <optimized out>, local_blks_written = <optimized out>, temp_blks_read = <optimized out>,
temp_blks_written= <optimized out>, blk_read_time = {tv_sec = <optimized out>,
 
            tv_nsec = <optimized out>}, blk_write_time = {tv_sec = <optimized out>, tv_nsec = <optimized out>}}
        parsetree = <optimized out>
#57 0x0000000000743e86 in PortalRunUtility (portal=0x1f12be0, pstmt=0x1fe1ea8, isTopLevel=<optimized out>,
setHoldSnapshot=<optimizedout>, dest=0x2078d38, completionTag=0x7ffc304d0e20 "") at pquery.c:1178
 
        utilityStmt = <optimized out>
        snapshot = 0x1f538a0
#58 0x0000000000744bc2 in FillPortalStore (portal=portal@entry=0x1f12be0, isTopLevel=isTopLevel@entry=true) at
pquery.c:1038
        treceiver = 0x2078d38
        completionTag =
"\000\377\377\377\000\000\000\000\260z\372\001\000\000\000\000\360\240\351\001\000\000\000\000\245c\207\000\000\000\000\000\020\214\a\002\000\000\000\000\260z\372\001\000\000\000\000\a\000\000\000\000\"
        __func__ = "FillPortalStore"
#59 0x000000000074564f in PortalRun (portal=<optimized out>, count=9223372036854775807, isTopLevel=<optimized out>,
run_once=<optimizedout>, dest=0x7f81a09e2dc8, altdest=0x7f81a09e2dc8, completionTag=0x7ffc304d1040 "")
 
    at pquery.c:768
        save_exception_stack = 0x7ffc304d1220
        save_context_stack = 0x0
        local_sigjmp_buf = {{__jmpbuf = {0, -4317636606344588887, 33431128, 140721118842944, 32582624, 2,
4318053774943556009,-4317633223726624343}, __mask_was_saved = 0, __saved_mask = {__val = {480, 10132060, 8746404, 1,
 
                32590832, 10476062, 34048768, 2, 1, 2, 112, 33431080, 32101840, 10233096, 8861374, 2}}}}
        result = <optimized out>
        nprocessed = <optimized out>
        saveTopTransactionResourceOwner = 0x1ed5a08
        saveTopTransactionContext = 0x1f53760
        saveActivePortal = 0x0
        saveResourceOwner = 0x1ed5a08
        savePortalContext = 0x0
        saveMemoryContext = 0x1f53760
        __func__ = "PortalRun"
#60 0x0000000000741577 in exec_simple_query (query_string=<optimized out>) at postgres.c:1145
        dest = <optimized out>
        oldcontext = <optimized out>
        parsetree_list = <optimized out>
        parsetree_item = <optimized out>
        save_log_statement_stats = <optimized out>
        was_logged = <optimized out>
        use_implicit_block = <optimized out>
        msec_str = "\000
\000\000\000\000\000\000?\252c\000\000\000\000\000\247\235\354\001\000\000\000\000p?\354\001\002\000\000"
        __func__ = "exec_simple_query"
#61 0x0000000000742842 in PostgresMain (argc=<optimized out>, argv=<optimized out>, dbname=<optimized out>,
username=<optimizedout>) at postgres.c:4182
 
        firstchar = 32267608
        input_message = {
          data = 0x1ec5d58 "explain(analyze)\nSELECT site_office as site_gran, site_location AS bs, sect_mscid || ' ('
||gsm_site_name_to_sect_name (site_name, sect_num, sect_name) || ')' AS sitename, data_utrancell.period a}
 
        local_sigjmp_buf = {{__jmpbuf = {32292536, -4317633050175762007, 32259952, 0, 0, 32260608, 4318053775948091817,
-4317633221742456407},__mask_was_saved = 1, __saved_mask = {__val = {0, 32260608, 140194680307352, 0,
 
                206158430256, 140721118843760, 140721118843552, 32102304, 32080672, 16, 66, 32102304, 32292536, 0,
32292616,32292784}}}}
 
        send_ready_for_query = false
        disable_idle_in_transaction_timeout = <optimized out>
        __func__ = "PostgresMain"
#62 0x00000000004801dd in BackendRun (port=0x1ec3f70) at postmaster.c:4361
        ac = 1
        secs = 595637552
        usecs = 441639
        i = 1
        av = 0x1ecbfe8
        maxac = <optimized out>
#63 BackendStartup (port=0x1ec3f70) at postmaster.c:4033
        bn = 0x1ec4200
        pid = 0
#64 ServerLoop () at postmaster.c:1706
        rmask = {fds_bits = {32, 0 <repeats 15 times>}}
        selres = <optimized out>
        now = <optimized out>
        readmask = {fds_bits = {120, 0 <repeats 15 times>}}
        last_lockfile_recheck_time = <optimized out>
        last_touch_time = 1542322298
        __func__ = "ServerLoop"
#65 0x00000000006d5d49 in PostmasterMain (argc=argc@entry=3, argv=argv@entry=0x1e98280) at postmaster.c:1379
        opt = <optimized out>
        status = <optimized out>
        userDoption = <optimized out>
        listen_addr_saved = true
        i = <optimized out>
        output_config_variable = <optimized out>
        __func__ = "PostmasterMain"
#66 0x000000000048103f in main (argc=3, argv=0x1e98280) at main.c:228
No locals.
(gdb)

> It'd also be interesting if toggling jit_inline_above_cost=-1,
> jit_optimize_above_cost=-1 and jit_tuple_deforming=false, each set
> individually, make a difference.

Crashes with jit_tuple_deforming=true (others off).

Doesn't crash with other combinations:
   Options: Inlining false, Optimization true, Expressions true, Deforming false
   Options: Inlining true, Optimization false, Expressions true, Deforming false

> Any way for me to reproduce this locally? I.e. can you narrow this down
> to a reproducible test case?

Will work on it.

Justin


Re: pg11.1 jit segv

От
Andres Freund
Дата:
Hi,

On 2018-11-15 17:03:35 -0600, Justin Pryzby wrote:
> On Thu, Nov 15, 2018 at 02:47:55PM -0800, Andres Freund wrote:
> > > (gdb) bt
> > > #0  0x00007f08127e814e in ?? ()
> > > #1  0x0000000000000000 in ?? ()
> > 
> > Could you enable jit_debugging_support and reproduce?  That should give
> > a more useful backtrace.

> Core was generated by `postgres: pryzbyj ts [local] EXPLAIN                   '.
> Program terminated with signal 11, Segmentation fault.
> #0  0x00007f819e227cb9 in __memcpy_ssse3_back () from /lib64/libc.so.6
> Missing separate debuginfos, use: debuginfo-install audit-libs-2.8.1-3.el7_5.1.x86_64 bzip2-libs-1.0.6-13.el7.x86_64
cyrus-sasl-lib-2.1.26-23.el7.x86_64elfutils-libelf-0.170-4.el7.x86_64 elfutils-libs-0.170-4.el7.x86_64 glib4
 
> (gdb) bt
> #0  0x00007f819e227cb9 in __memcpy_ssse3_back () from /lib64/libc.so.6
> ...

Huh, that's the same crash? Because I don't see any evalexpr functions
in the stack, and without those the above bt should have worked...


> > It'd also be interesting if toggling jit_inline_above_cost=-1,
> > jit_optimize_above_cost=-1 and jit_tuple_deforming=false, each set
> > individually, make a difference.
> 
> Crashes with jit_tuple_deforming=true (others off).

> Doesn't crash with other combinations:
>    Options: Inlining false, Optimization true, Expressions true, Deforming false
>    Options: Inlining true, Optimization false, Expressions true, Deforming false

Oh, interesting. That helps.  Could you perhaps show an EXPLAIN VERBOSE
of the query plan?


> > Any way for me to reproduce this locally? I.e. can you narrow this down
> > to a reproducible test case?
> 
> Will work on it.

Thanks!

Greetings,

Andres Freund


Re: pg11.1 jit segv

От
Justin Pryzby
Дата:
On Thu, Nov 15, 2018 at 03:14:01PM -0800, Andres Freund wrote:
> Huh, that's the same crash? Because I don't see any evalexpr functions
> in the stack, and without those the above bt should have worked...

TTBOMK it's the same ..

Is it odd if i'm seeing this: (odd because of "ANOTHER") ?
I guess that's maybe because it's running parallel query ?
psql:tmp/sql-jit-crash-2018-11-15.jtp:16: WARNING:  terminating connection because of crash of another server process

I'm started trying to minimize the query.  Here's what's left:

SELECT * FROM
daily_eric_umts_rnc_utrancell_view t2
JOIN (SELECT * FROM eric_umts_rbs_sector_carrier_view t1 WHERE t1.start_time >= '2018-07-01 00:00:00' AND t1.start_time
<'2018-07-09 00:00:00') AS t1 
USING(start_time)
WHERE (t2.start_time >= '2018-07-01 00:00:00' AND t2.start_time < '2018-07-09 00:00:00')

Verbose plan, munged for brevity/sanity due to joining wide tables, and
redacted since the view probably has to be considered proprietary.  Hopefully
the remaining bits are still useful.  I replaced column names with x.

 Gather  (cost=481961.60..482102.41 rows=1180 width=8068)
   Output: [ there are 1700+ columns here... ]
   Workers Planned: 3
   ->  Merge Join  (cost=480961.60..480984.41 rows=381 width=8068)
         Output:  ARRAY[x, x, ..., COALESCE(x, 0), COALESCE(x, 0)], ARRAY[..., COALESCE(x, 0), COALESCE(x, 0),
COALESCE(x,0), COALESCE(x, 0), COALESCE(x, 0), COALESCE(x, 0), COALESCE(x, 0), COALESCE(x, 0), COALESCE(x, 0), COA 
LESCE(x, 0), COALESCE(x, 0), COALESCE(x, 0), COALESCE(x, 0), COALESCE(x, 0), COALESCE(x, 0), COALESCE(x, 0),
COALESCE(x,0), COALESCE(x, 0), COALESCE(x, 0), COALESCE(x, 0), COALESCE(x, 0), COALESCE(x, 0), COALESCE(x, 0), COALE 
SCE(x, 0), COALESCE(x, 0), COALESCE(x, 0), COALESCE(x, 0), COALESCE(x, 0), COALESCE(x, 0), COALESCE(x, 0), COALESCE(x,
0),COALESCE(x, 0), COALESCE(x, 0), COALESCE(x, 0), COALESCE(x, 0), COALESCE(x, 0), COALESCE(x, 0), COALESC 
E(x, 0), COALESCE(x, 0), COALESCE(x, 0), COALESCE(x, 0), COALESCE(x, 0), COALESCE(x, 0), COALESCE(x, 0), COALESCE(x,
0),COALESCE(x, 0), COALESCE(x, 0), COALESCE(x, 0), COALESCE(x, 0), COALESCE(x, 0)], ((x)::numeric / 10.0), CASE WHEN (x
ISNULL) THEN ARRAY[..., eric_umts_rbs_sector_carrier_201807.pmaveragerssi_064] ELSE ARRAY[...] END, CASE WHEN (x IS
NULL)THEN
'{-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-60.0}'::numeric[]
ELSE
'{-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-39.5}'::numeric[]
END,ARRAY[x], x, x 
         Merge Cond: (eric_umts_rbs_sector_carrier_201807.start_time = t2.start_time)
         ->  Sort  (cost=480421.29..480428.89 rows=3039 width=1652)
               Output: [...]
               Sort Key: eric_umts_rbs_sector_carrier_201807.start_time
               ->  Parallel Append  (cost=0.00..480245.49 rows=3039 width=1652)
                     ->  Parallel Seq Scan on child.eric_umts_rbs_sector_carrier_201807  (cost=0.00..480230.30
rows=3039width=1652) 
                           Output: [...]
                           Filter: ((eric_umts_rbs_sector_carrier_201807.start_time >= '2018-07-01
00:00:00-07'::timestampwith time zone) AND (eric_umts_rbs_sector_carrier_201807.start_time < '2018-07-09
00:00:00-07'::timesta
mp with time zone))
         ->  Sort  (cost=540.31..540.65 rows=139 width=7760)
               Output: [...]
               Sort Key: t2.start_time
               ->  Append  (cost=0.00..535.36 rows=139 width=7760)
                     ->  Seq Scan on public.daily_eric_umts_rnc_utrancell_view t2  (cost=0.00..0.00 rows=1 width=7760)
                           Output: [...]
                           Filter: ((t2.start_time >= '2018-07-01 00:00:00'::timestamp without time zone) AND
(t2.start_time< '2018-07-09 00:00:00'::timestamp without time zone)) 
                     ->  Bitmap Heap Scan on child.daily_eric_umts_rnc_utrancell_view_201807 t2_1  (cost=9.83..534.66
rows=138width=7760) 
                           Output: [...]
                           Recheck Cond: ((t2_1.start_time >= '2018-07-01 00:00:00'::timestamp without time zone) AND
(t2_1.start_time< '2018-07-09 00:00:00'::timestamp without time zone))  
                           ->  Bitmap Index Scan on daily_eric_umts_rnc_utrancell_view_201807_unique_idx
(cost=0.00..9.79rows=138 width=0) 
                                 Index Cond: ((t2_1.start_time >= '2018-07-01 00:00:00'::timestamp without time zone)
AND(t2_1.start_time < '2018-07-09 00:00:00'::timestamp without time zone)) 
 JIT:
   Functions: 19
   Options: Inlining false, Optimization false, Expressions true, Deforming true
(28 rows)

Justin


Re: pg11.1 jit segv

От
Justin Pryzby
Дата:
On Thu, Nov 15, 2018 at 06:03:34PM -0600, Justin Pryzby wrote:
> Verbose plan, munged for brevity/sanity due to joining wide tables, and
> redacted since the view probably has to be considered proprietary.  Hopefully
> the remaining bits are still useful.  I replaced column names with x.

Actually the view isn't as intricate as I thought, but I'd like to avoid
publishing it for sake of simplicity.  I replaced the view with its underlying
table and now I get:

[pryzbyj@database ~]$ time psql ts -f tmp/sql-jit-crash-2018-11-15.jtp 
psql:tmp/sql-jit-crash-2018-11-15.jtp:12: ERROR:  invalid memory alloc request size 2447889552

It's unclear if that's a useful hint, a separate problem, or a nonissue..

Justin


Re: pg11.1 jit segv

От
Andres Freund
Дата:
Hi,

On 2018-11-15 18:11:05 -0600, Justin Pryzby wrote:
> On Thu, Nov 15, 2018 at 06:03:34PM -0600, Justin Pryzby wrote:
> > Verbose plan, munged for brevity/sanity due to joining wide tables, and
> > redacted since the view probably has to be considered proprietary.  Hopefully
> > the remaining bits are still useful.  I replaced column names with x.
> 
> Actually the view isn't as intricate as I thought, but I'd like to avoid
> publishing it for sake of simplicity.  I replaced the view with its underlying
> table and now I get:
> 
> [pryzbyj@database ~]$ time psql ts -f tmp/sql-jit-crash-2018-11-15.jtp 
> psql:tmp/sql-jit-crash-2018-11-15.jtp:12: ERROR:  invalid memory alloc request size 2447889552
> 
> It's unclear if that's a useful hint, a separate problem, or a nonissue..

It's probably the same / closely related issue. ISTM that JITed
deforming either doesn't deform correctly, or skips deforming
erroneously.

I'm about to commit some changes to 12/master that'd possibly make it
easier to find issues like this. Is there any chance that it's easier to
repro this on master than making a reproducible test case?

Greetings,

Andres Freund


Re: pg11.1 jit segv

От
Justin Pryzby
Дата:
On Thu, Nov 15, 2018 at 04:17:51PM -0800, Andres Freund wrote:
> I'm about to commit some changes to 12/master that'd possibly make it
> easier to find issues like this. Is there any chance that it's easier to
> repro this on master than making a reproducible test case?

Yes, very possibly - this is on a customer's server, so their data, plus views,
etc are potentially involved until excluded.

Justin


Re: pg11.1 jit segv

От
Justin Pryzby
Дата:
On Thu, Nov 15, 2018 at 04:17:51PM -0800, Andres Freund wrote:
> I'm about to commit some changes to 12/master that'd possibly make it
> easier to find issues like this.

Are you referring to this  a future commit ?
commit 763f2edd92095b1ca2f4476da073a28505c13820
Rejigger materializing and fetching a HeapTuple from a slot.

I was able to reproduce under HEAD with pg_restored data.

I guess you're right that the "memory alloc failure" is related/same thing,
I've seen it intermittently with queries which also sometimes crash (and also
sometimes don't).

Note that when it crashes, it seems to take a longer time to do so than the
query would normally take.  Like we're walking off the end of an array, say.

I've been able to reproduce the crash with a self join of a table (no view, no
expressions, no parallel, directly querying a relkind='r' child).  In that
case, enable_bitmapscan=on and jit_tuple_deforming=on are both needed to crash,
and jit_debugging_support=on does not yield a useful bt.

The table is not too special, but was probably ALTERed to add columns a good
number of times by one of our processes.  It has ~1100 columns, including
arrays, and some with null_frac=1.  I'm trying to come up with a test case
involving column types and order.

(gdb) bt
#0  0x00007f81a08b8b98 in ?? ()
#1  0x0000000000000000 in ?? ()

ts=# SET jit=on;SET jit_above_cost=0;explain(analyze off,verbose off) SELECT a.* FROM
child.daily_eric_umts_rnc_utrancell_view_201804a JOIN child.daily_eric_umts_rnc_utrancell_view_201804 b
USING(start_time,sect_id)WHERE a.start_time BETWEEN '2018-04-30' AND '2018-05-04' AND b.start_time BETWEEN '2018-04-30'
AND'2018-05-04';
 
SET
SET
                                                                                 QUERY PLAN
                                                    
 

-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=527.36..1038.17 rows=1 width=7760)
   Hash Cond: ((a.start_time = b.start_time) AND (a.sect_id = b.sect_id))
   ->  Bitmap Heap Scan on daily_eric_umts_rnc_utrancell_view_201804 a  (cost=9.78..515.59 rows=133 width=7760)
         Recheck Cond: ((start_time >= '2018-04-30 00:00:00'::timestamp without time zone) AND (start_time <=
'2018-05-0400:00:00'::timestamp without time zone))
 
         ->  Bitmap Index Scan on daily_eric_umts_rnc_utrancell_view_201804_unique_idx  (cost=0.00..9.74 rows=133
width=0)
               Index Cond: ((start_time >= '2018-04-30 00:00:00'::timestamp without time zone) AND (start_time <=
'2018-05-0400:00:00'::timestamp without time zone))
 
   ->  Hash  (cost=515.59..515.59 rows=133 width=12)
         ->  Bitmap Heap Scan on daily_eric_umts_rnc_utrancell_view_201804 b  (cost=9.78..515.59 rows=133 width=12)
               Recheck Cond: ((start_time >= '2018-04-30 00:00:00'::timestamp without time zone) AND (start_time <=
'2018-05-0400:00:00'::timestamp without time zone))
 
               ->  Bitmap Index Scan on daily_eric_umts_rnc_utrancell_view_201804_unique_idx  (cost=0.00..9.74 rows=133
width=0)
                     Index Cond: ((start_time >= '2018-04-30 00:00:00'::timestamp without time zone) AND (start_time <=
'2018-05-0400:00:00'::timestamp without time zone))
 
 JIT:
   Functions: 19
   Options: Inlining false, Optimization false, Expressions true, Deforming true

BTW find attached patch which I believe corrects some comments.

Justin

Вложения

Re: pg11.1 jit segv

От
Justin Pryzby
Дата:
On Fri, Nov 16, 2018 at 08:38:26AM -0600, Justin Pryzby wrote:
> Are you referring to this  a future commit ?
this OR a future commit

> The table is not too special, but was probably ALTERed to add columns a good
> number of times by one of our processes.  It has ~1100 columns, including
> arrays, and some with null_frac=1.  I'm trying to come up with a test case
> involving column types and order.

I don't have a failing test case yet but here's what the columns look like:

ts=# SELECT attnum, null_frac, atttypid::regtype, attnotnull, attname, attalign , attstorage, attbyval, attlen,
attislocalFROM pg_stats s JOIN pg_attribute a USING(attname) JOIN pg_class c ON s.tablename=c.relname WHERE
c.oid=a.attrelidAND tablename='daily_eric_umts_rnc_utrancell_view_201804' AND (attnotnull OR null_frac>0.9 OR
atttypid::regtype::textLIKE '%[]' OR NOT atttypid::regtype::text~'int$|integer|double|numeric' OR attlen=-1 OR NOT
attbyvalOR atthasmissing OR attisdropped OR attnum BETWEEN 80 AND 99) ORDER BY 1;
 

 attnum | null_frac |          atttypid           | attnotnull |                     attname                     |
attalign| attstorage | attbyval | attlen | attislocal
 

--------+-----------+-----------------------------+------------+-------------------------------------------------+----------+------------+----------+--------+------------
      1 |         0 | timestamp without time zone | t          | start_time                                      | d
   | p          | t        |      8 | t
 
      2 |         0 | integer                     | t          | site_id                                         | i
   | p          | t        |      4 | t
 
      3 |         0 | integer                     | t          | sect_id                                         | i
   | p          | t        |      4 | t
 
      4 |         0 | integer                     | t          | rnc_id                                          | i
   | p          | t        |      4 | t
 
      5 |         0 | text                        | t          | utrancell                                       | i
   | x          | f        |     -1 | t
 
     30 |         1 | bigint                      | f          | dl_alt_chcode_alloc                             | d
   | p          | t        |      8 | t
 
     31 |         1 | integer                     | f          | dl_alt_chcode_alloc_min                         | i
   | p          | t        |      4 | t
 
     32 |         1 | integer                     | f          | dl_alt_chcode_alloc_max                         | i
   | p          | t        |      4 | t
 
     45 |         0 | integer[]                   | f          | dch_ul_rlc_user_tput_samples                    | i
   | x          | f        |     -1 | t
 
     46 |         0 | integer[]                   | f          | dch_ul_rlc_user_tput_samples_min                | i
   | x          | f        |     -1 | t
 
     47 |         0 | integer[]                   | f          | dch_ul_rlc_user_tput_samples_max                | i
   | x          | f        |     -1 | t
 
     51 |         0 | numeric                     | f          | ps_int_sum_latency_2                            | i
   | m          | f        |     -1 | t
 
     69 |         0 | numeric                     | f          | mbytes_ul_srb_only_eul                          | i
   | m          | f        |     -1 | t
 
[...]
     87 |         0 | numeric                     | f          | mbytes_dl_active_cs57                           | i
   | m          | f        |     -1 | t
 
     88 |         0 | numeric                     | f          | mbytes_dl_active_cs57_min                       | i
   | m          | f        |     -1 | t
 
     89 |         0 | numeric                     | f          | mbytes_dl_active_cs57_max                       | i
   | m          | f        |     -1 | t
 

If I query for cs57, it doesen't crash (in 500ms), but if I query for the next
column, cs57_min, it does (in 18000ms).

Here's a new error message instead of a crash this time:
ts=# SET jit=on;SET jit_above_cost=0;explain(analyze on,verbose off) SELECT b.mbytes_dl_active_cs57_min FROM
child.daily_eric_umts_rnc_utrancell_view_201804a JOIN child.daily_eric_umts_rnc_utrancell_view_201804 b
USING(start_time,sect_id)WHERE a.start_time BETWEEN '2018-04-30' AND '2018-05-04' AND b.start_time BETWEEN '2018-04-30'
AND'2018-05-04';
 
SET
SET
ERROR:  out of memory            
DETAIL:  Failed on request of size 425170160 in memory context "HashBatchContext".

Here's verbose output you requested, sans expressions:

ts=# SET jit=on;SET jit_above_cost=0;explain(analyze off,verbose) SELECT b.mbytes_dl_active_cs57_min FROM
child.daily_eric_umts_rnc_utrancell_view_201804a JOIN child.daily_eric_umts_rnc_utrancell_view_201804 b
USING(start_time,sect_id)WHERE a.start_time BETWEEN '2018-04-30' AND '2018-05-04' AND b.start_time BETWEEN '2018-04-30'
AND'2018-05-04';
 
SET
SET
                                                                                   QUERY PLAN
                                                       
 

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=2224.45..4491.19 rows=22 width=3)
   Output: b.mbytes_dl_active_cs57_min
   Hash Cond: ((a.start_time = b.start_time) AND (a.sect_id = b.sect_id))
   ->  Index Only Scan using daily_eric_umts_rnc_utrancell_view_201804_unique_idx on
child.daily_eric_umts_rnc_utrancell_view_201804a  (cost=0.29..2214.33 rows=656 width=12)
 
         Output: a.start_time, a.site_id, a.rnc_id, a.sect_id, a.utrancell
         Index Cond: ((a.start_time >= '2018-04-30 00:00:00'::timestamp without time zone) AND (a.start_time <=
'2018-05-0400:00:00'::timestamp without time zone))
 
   ->  Hash  (cost=2214.33..2214.33 rows=656 width=15)
         Output: b.mbytes_dl_active_cs57_min, b.start_time, b.sect_id
         ->  Index Scan using daily_eric_umts_rnc_utrancell_view_201804_unique_idx on
child.daily_eric_umts_rnc_utrancell_view_201804b  (cost=0.29..2214.33 rows=656 width=15)
 
               Output: b.mbytes_dl_active_cs57_min, b.start_time, b.sect_id
               Index Cond: ((b.start_time >= '2018-04-30 00:00:00'::timestamp without time zone) AND (b.start_time <=
'2018-05-0400:00:00'::timestamp without time zone))
 
 JIT:
   Functions: 19
   Options: Inlining false, Optimization false, Expressions true, Deforming true

If I query instead for a.mbytes_dl_active_cs57_min, I get no crash (yet).

Justin


Re: pg11.1 jit segv

От
Andres Freund
Дата:
Hi,

On 2018-11-16 08:38:26 -0600, Justin Pryzby wrote:
> On Thu, Nov 15, 2018 at 04:17:51PM -0800, Andres Freund wrote:
> > I'm about to commit some changes to 12/master that'd possibly make it
> > easier to find issues like this.
> 
> Are you referring to this  a future commit ?
> commit 763f2edd92095b1ca2f4476da073a28505c13820
> Rejigger materializing and fetching a HeapTuple from a slot.

I was thinking of

commit 15d8f83128e15de97de61430d0b9569f5ebecc26
Author: Andres Freund <andres@anarazel.de>
Date:   2018-11-15 22:00:30 -0800

    Verify that expected slot types match returned slot types.

(and then the followup fix in a387a3df)

(will try to catch up later with the rest of what you wrote, after I had
some coffee).

- Andres


Re: pg11.1 jit segv

От
Justin Pryzby
Дата:
On Fri, Nov 16, 2018 at 08:38:26AM -0600, Justin Pryzby wrote:
> BTW find attached patch which I believe corrects some comments.

Updated.  Some of the changes may be gratuitous, but changed while I was
already looking.

Also note that I had to remove -flto=thin to compile under RH7.

Justin

Вложения

Re: pg11.1 jit segv

От
Justin Pryzby
Дата:
On Fri, Nov 16, 2018 at 08:29:27AM -0800, Andres Freund wrote:
> > On Thu, Nov 15, 2018 at 04:17:51PM -0800, Andres Freund wrote:
> > > I'm about to commit some changes to 12/master that'd possibly make it
> commit 15d8f83128e15de97de61430d0b9569f5ebecc26

I don't think it had to do with your commit, but I recompiled HEAD with debug +
casserts and have this to show.

TRAP: unrecognized TOAST vartag("1", File: "heaptuple.c", Line: 1705)

#0  0x00007faf5fac9277 in raise () from /lib64/libc.so.6
#1  0x00007faf5faca968 in abort () from /lib64/libc.so.6
#2  0x000000000088e347 in ExceptionalCondition (conditionName=conditionName@entry=0xa8a69e "1",
errorType=errorType@entry=0x8deda8"unrecognized TOAST vartag", fileName=fileName@entry=0x8dff82 "heaptuple.c", 
 
    lineNumber=lineNumber@entry=1705) at assert.c:54
#3  0x0000000000489830 in varsize_any (p=<optimized out>) at heaptuple.c:1705
#4  0x00007faf60c98560 in ?? ()
#5  0x00000000ffffffff in ?? ()
#6  0x000000000008c21c in ?? ()
#7  0x00000000004897d0 in ?? () at heaptuple.c:1690
#8  0x000000000008c21c in ?? ()
#9  0x000000000008c21c in ?? ()
#10 0x000000000000021c in ?? ()
#11 0x00000000004897d0 in ?? () at heaptuple.c:1690
[...]


Re: pg11.1 jit segv

От
Andres Freund
Дата:
Hi,

On 2018-11-16 19:23:44 -0600, Justin Pryzby wrote:
> On Fri, Nov 16, 2018 at 08:29:27AM -0800, Andres Freund wrote:
> > > On Thu, Nov 15, 2018 at 04:17:51PM -0800, Andres Freund wrote:
> > > > I'm about to commit some changes to 12/master that'd possibly make it
> > commit 15d8f83128e15de97de61430d0b9569f5ebecc26
> 
> I don't think it had to do with your commit, but I recompiled HEAD with debug +
> casserts and have this to show.

That's probably just the same issue as before, namely random data
somehow being produced as the result of tuple deforming.

Greetings,

Andres Freund


Re: pg11.1 jit segv

От
Justin Pryzby
Дата:
On Fri, Nov 16, 2018 at 05:47:24PM -0800, Andres Freund wrote:
> That's probably just the same issue as before, namely random data
> somehow being produced as the result of tuple deforming.

Does this help at all?

ts=# SELECT utrancell FROM child.daily_eric_umts_rnc_utrancell_view_201807 LIMIT 9;
ERROR:  invalid memory alloc request size 18446744073709551613

Note that's a table named after a view, and not itself a view..

(gdb) b errfinish
Breakpoint 3 at 0x891a00: file elog.c, line 411.
(gdb) c
Continuing.
SELECT start_time, site_id, sect_id, rnc_id, utrancell, eul_harq_ps_trans_tti2_1 FROM
child.daily_eric_umts_rnc_utrancell_view_201801LIMIT 1;
 
         1: start_time  (typeid = 1114, len = 8, typmod = -1, byval = t)
         2: site_id     (typeid = 23, len = 4, typmod = -1, byval = t)
         3: sect_id     (typeid = 23, len = 4, typmod = -1, byval = t)
         4: rnc_id      (typeid = 23, len = 4, typmod = -1, byval = t)
         5: utrancell   (typeid = 25, len = -1, typmod = -1, byval = f)
         6: eul_harq_ps_trans_tti2_1    (typeid = 20, len = 8, typmod = -1, byval = t)
        ----
         1: start_time = "2000-01-01 00:00:00"  (typeid = 1114, len = 8, typmod = -1, byval = t)
         2: site_id = "0"       (typeid = 23, len = 4, typmod = -1, byval = t)
         3: sect_id = "0"       (typeid = 23, len = 4, typmod = -1, byval = t)
         4: rnc_id = "0"        (typeid = 23, len = 4, typmod = -1, byval = t)

The error occurs when selecting the 5th column, which is "text not null"; but,
the other fields are also garbage (namely 0).

#0  slot_compile_deform (context=context@entry=0xe184b8, desc=desc@entry=0xe9baf0, ops=ops@entry=0xa21020
<TTSOpsBufferHeapTuple>,natts=5) at llvmjit_deform.c:36
 
#1  0x00007fffe72535f5 in llvm_compile_expr (state=0xed8458) at llvmjit_expr.c:325
#2  0x000000000061e429 in ExecReadyExpr (state=state@entry=0xed8458) at execExpr.c:628
#3  0x00000000006219b4 in ExecBuildProjectionInfo (targetList=<optimized out>, econtext=<optimized out>,
slot=<optimizedout>, parent=parent@entry=0xed7d40, inputDesc=inputDesc@entry=0xe9baf0) at execExpr.c:472
 
#4  0x00000000006370b2 in ExecAssignProjectionInfo (planstate=planstate@entry=0xed7d40,
inputDesc=inputDesc@entry=0xe9baf0)at execUtils.c:501
 
#5  0x00000000006371c6 in ExecConditionalAssignProjectionInfo (planstate=planstate@entry=0xed7d40, inputDesc=0xe9baf0,
varno=<optimizedout>) at execUtils.c:539
 
#6  0x0000000000632627 in ExecAssignScanProjectionInfo (node=node@entry=0xed7d40) at execScan.c:240
#7  0x0000000000655867 in ExecInitSeqScan (node=node@entry=0xeecbb8, estate=estate@entry=0xed78d8,
eflags=eflags@entry=16)at nodeSeqscan.c:182
 
#8  0x0000000000630e53 in ExecInitNode (node=0xeecbb8, estate=estate@entry=0xed78d8, eflags=eflags@entry=16) at
execProcnode.c:207
#9  0x000000000064c78a in ExecInitLimit (node=node@entry=0xeecc50, estate=estate@entry=0xed78d8,
eflags=eflags@entry=16)at nodeLimit.c:368
 
#10 0x0000000000630b52 in ExecInitNode (node=node@entry=0xeecc50, estate=estate@entry=0xed78d8, eflags=eflags@entry=16)
atexecProcnode.c:363
 
#11 0x000000000062ac45 in InitPlan (eflags=16, queryDesc=<optimized out>) at execMain.c:1025
#12 standard_ExecutorStart (queryDesc=<optimized out>, eflags=16) at execMain.c:265
#13 0x0000000000784fdb in PortalStart (portal=portal@entry=0xe48528, params=params@entry=0x0, eflags=eflags@entry=0,
snapshot=snapshot@entry=0x0)at pquery.c:520
 
#14 0x0000000000781218 in exec_simple_query (query_string=0xe56568 "SELECT start_time,site_id,sect_id,rnc_id,utrancell
FROMchild.daily_eric_umts_rnc_utrancell_view_201801 LIMIT 1\n") at postgres.c:1176
 
#15 0x00000000007825c2 in PostgresMain (argc=argc@entry=17, argv=argv@entry=0xddbb40, dbname=0xdfcd70 "postgres",
dbname@entry=0x0,username=<optimized out>) at postgres.c:4243
 
#16 0x000000000047d8be in main (argc=17, argv=0xddbb40) at main.c:224

#0  errfinish (dummy=dummy@entry=0) at elog.c:411
#1  0x0000000000895379 in elog_finish (elevel=elevel@entry=20, fmt=fmt@entry=0xaabf68 "invalid memory alloc request
size%zu") at elog.c:1365
 
#2  0x00000000008bb46e in palloc (size=18446744073709551613) at mcxt.c:934
#3  0x000000000086ad3a in text_to_cstring (t=0x7fffe7b83344) at varlena.c:189
#4  0x0000000000897ed5 in FunctionCall1Coll (flinfo=flinfo@entry=0x7fffffffdce0, collation=collation@entry=0,
arg1=arg1@entry=140737080996676)at fmgr.c:1123
 
#5  0x0000000000899206 in OutputFunctionCall (val=140737080996676, flinfo=0x7fffffffdce0) at fmgr.c:1755
#6  OidOutputFunctionCall (functionId=<optimized out>, val=val@entry=140737080996676) at fmgr.c:1838
#7  0x000000000048b57b in debugtup (slot=0xed8330, self=<optimized out>) at printtup.c:625
#8  0x0000000000629a50 in ExecutePlan (execute_once=<optimized out>, dest=0xa66160 <debugtupDR>, direction=<optimized
out>,numberTuples=0, sendTuples=true, operation=CMD_SELECT, use_parallel_mode=<optimized out>, 
 
    planstate=0xed7ad0, estate=0xed7878) at execMain.c:1748
#9  standard_ExecutorRun (queryDesc=0xe6ac78, direction=<optimized out>, count=0, execute_once=<optimized out>) at
execMain.c:364
#10 0x000000000078409b in PortalRunSelect (portal=portal@entry=0xe484c8, forward=forward@entry=true, count=0,
count@entry=9223372036854775807,dest=dest@entry=0xa66160 <debugtupDR>) at pquery.c:932
 
#11 0x00000000007855b8 in PortalRun (portal=portal@entry=0xe484c8, count=count@entry=9223372036854775807,
isTopLevel=isTopLevel@entry=true,run_once=run_once@entry=true, dest=dest@entry=0xa66160 <debugtupDR>, 
 
    altdest=altdest@entry=0xa66160 <debugtupDR>, completionTag=completionTag@entry=0x7fffffffdfc0 "") at pquery.c:773
#12 0x000000000078128d in exec_simple_query (query_string=0xe56508 "SELECT
start_time,site_id,sect_id,rnc_id,utrancell,eul_harq_ps_trans_tti2_1FROM
child.daily_eric_umts_rnc_utrancell_view_201801LIMIT 1;\n")
 
    at postgres.c:1215
#13 0x00000000007825c2 in PostgresMain (argc=argc@entry=15, argv=argv@entry=0xddbb40, dbname=0xdfcd30 "postgres",
dbname@entry=0x0,username=<optimized out>) at postgres.c:4243
 
#14 0x000000000047d8be in main (argc=15, argv=0xddbb40) at main.c:224

Justin


Re: pg11.1 jit segv

От
Justin Pryzby
Дата:
On Fri, Nov 16, 2018 at 10:24:46AM -0600, Justin Pryzby wrote:
> On Fri, Nov 16, 2018 at 08:38:26AM -0600, Justin Pryzby wrote:
> > The table is not too special, but was probably ALTERed to add columns a good
> > number of times by one of our processes.  It has ~1100 columns, including
> > arrays, and some with null_frac=1.  I'm trying to come up with a test case
> > involving column types and order.

Try this ?

SELECT 'DROP TABLE t; CREATE TABLE t (a3 text, a1 int[], '||array_to_string(array_agg('c'||i||' bigint default
0'),',')||');INSERT INTO t VALUES(0)' FROM generate_series(1,999) i;
 
\gexec
SET jit=on; SET jit_above_cost=0; SELECT a3 FROM t LIMIT 9;

That's given all sorts of nice errors:

ERROR:  invalid memory alloc request size 18446744073709551613
ERROR:  compressed data is corrupted

And occasionally crashes and/or returns unrelated data:

 = '0', $21 = '0', $22 = '0', $23 = '0', $24 = '0', $25 = '2741'\x03
 n 21782 :constvalue 4 [ 0 0 0 0 0 0 0 0 ]}) :location 

Justin


Re: pg11.1 jit segv

От
Andres Freund
Дата:
On 2018-11-17 17:37:15 -0600, Justin Pryzby wrote:
> On Fri, Nov 16, 2018 at 10:24:46AM -0600, Justin Pryzby wrote:
> > On Fri, Nov 16, 2018 at 08:38:26AM -0600, Justin Pryzby wrote:
> > > The table is not too special, but was probably ALTERed to add columns a good
> > > number of times by one of our processes.  It has ~1100 columns, including
> > > arrays, and some with null_frac=1.  I'm trying to come up with a test case
> > > involving column types and order.
> 
> Try this ?
> 
> SELECT 'DROP TABLE t; CREATE TABLE t (a3 text, a1 int[], '||array_to_string(array_agg('c'||i||' bigint default
0'),',')||');INSERT INTO t VALUES(0)' FROM generate_series(1,999) i;
 
> \gexec
> SET jit=on; SET jit_above_cost=0; SELECT a3 FROM t LIMIT 9;
> 
> That's given all sorts of nice errors:
> 
> ERROR:  invalid memory alloc request size 18446744073709551613
> ERROR:  compressed data is corrupted
> 
> And occasionally crashes and/or returns unrelated data:
> 
>  = '0', $21 = '0', $22 = '0', $23 = '0', $24 = '0', $25 = '2741'\x03
>  n 21782 :constvalue 4 [ 0 0 0 0 0 0 0 0 ]}) :location 

Ah, hah. The issue is that t_hoff is larger than 128 here (due to the
size of the NULL bitmap), and apparently getelementptr interprets an
i8 > 128 as a signed integer. Which thus yields a negative offset from
the start of the tuple, which predictably doesn't work great.

    v_hoff =
        l_load_struct_gep(b, v_tuplep,
                          FIELDNO_HEAPTUPLEHEADERDATA_HOFF,
                          "t_hoff");
    v_tupdata_base =
        LLVMBuildGEP(b,
                     LLVMBuildBitCast(b,
                                      v_tuplep,
                                      l_ptr(LLVMInt8Type()),
                                      ""),
                     &v_hoff, 1,
                     "v_tupdata_base");

I'd missed the "These integers are treated as signed values where
relevant." bit in the getelementptr docs
http://llvm.org/docs/LangRef.html#getelementptr-instruction

The fix is easy enough, just adding a
    v_hoff = LLVMBuildZExt(b, v_hoff, LLVMInt32Type(), "");
fixes the issue for me.

Could you check that the attached patch this also fixes your original
issue? Going through the code to see if there's other occurances of
this.

Greetings,

Andres Freund

Вложения

Re: pg11.1 jit segv

От
Justin Pryzby
Дата:
On Mon, Nov 26, 2018 at 07:00:35PM -0800, Andres Freund wrote:
> Could you check that the attached patch this also fixes your original
> issue? Going through the code to see if there's other occurances of
> this.

Confirmed that fixes my crash.

Thanks,
Justin


Re: pg11.1 jit segv

От
Andres Freund
Дата:
Hi,

On 2018-11-26 22:56:09 -0600, Justin Pryzby wrote:
> On Mon, Nov 26, 2018 at 07:00:35PM -0800, Andres Freund wrote:
> > Could you check that the attached patch this also fixes your original
> > issue? Going through the code to see if there's other occurances of
> > this.
> 
> Confirmed that fixes my crash.

Thanks a lot for narrowing down your crash to something I can reproduce!


Here's a more complete patch, with a testcase.

Tom, the test creates a 1100k column table (using \set ECHO none +
gexec), but with a small row. Currently it's not dropped after the
table, as I thought it might be worthwhile to be tested by
pg_dump/upgrade etc too. You're probably the person most concerned with
test runtimes, ... Any concerns about that? The table creation is
quick*, on the order of 30ms.

Greetings,

Andres Freund


*at least as long as there's no default columns and the table's not
dropped, seems we have somewhat of an O(N^2) situation going on when
dropping a table with many columns that have default columns - we
re-build the cache entry after each dropped default value. But as the
max is 1600 columns, that's not too bad.

Вложения

Re: pg11.1 jit segv

От
Andres Freund
Дата:
On 2018-11-27 00:26:55 -0800, Andres Freund wrote:
> Hi,
> 
> On 2018-11-26 22:56:09 -0600, Justin Pryzby wrote:
> > On Mon, Nov 26, 2018 at 07:00:35PM -0800, Andres Freund wrote:
> > > Could you check that the attached patch this also fixes your original
> > > issue? Going through the code to see if there's other occurances of
> > > this.
> > 
> > Confirmed that fixes my crash.
> 
> Thanks a lot for narrowing down your crash to something I can reproduce!
> 
> 
> Here's a more complete patch, with a testcase.
> 
> Tom, the test creates a 1100k column table (using \set ECHO none +
> gexec), but with a small row. Currently it's not dropped after the
> table, as I thought it might be worthwhile to be tested by
> pg_dump/upgrade etc too. You're probably the person most concerned with
> test runtimes, ... Any concerns about that? The table creation is
> quick*, on the order of 30ms.

And pushed. Justin, thanks again for reporting the bug and then
narrowing it down to a reproducible test case! Would've been much harder
to diagnose without that.

I'll look into your comments patch in a bit.

Greetings,

Andres Freund


jit comments typos (Re: pg11.1 jit segv)

От
Justin Pryzby
Дата:
On Tue, Nov 27, 2018 at 10:24:52AM -0800, Andres Freund wrote:
> And pushed. Justin, thanks again for reporting the bug and then
> narrowing it down to a reproducible test case! Would've been much harder
> to diagnose without that.
> 
> I'll look into your comments patch in a bit.

Thanks for implementing and patching it :)

And thanks for remembering the patch, and reminding me.

Here's an updated copy with additional hunks.

Justin

Вложения