I was going to add a link to this thread in a commit message for our internal
project's repo, but found that (even 2.5 hours later) the HTTP archives don't
show my last message; I'd expect it to be here:
https://www.postgresql.org/message-id/flat/20191012004446.GT10470%40telsasoft.com
Quoted message follows, sent with headers like so:
From pryzby@telsasoft.com Sun Oct 13 08:03:21 2019
Date: Sun, 13 Oct 2019 08:03:21 -0500
From: Justin Pryzby <pryzby@telsasoft.com>
To: Michael Paquier <michael@paquier.xyz>
Cc: pgsql-hackers@lists.postgresql.org,
Peter Eisentraut <peter.eisentraut@2ndquadrant.com>,
Andreas Karlsson <andreas@proxel.se>,
Alvaro Herrera <alvherre@alvh.no-ip.org>
Subject: Re: v12.0: segfault in reindex CONCURRENTLY
Message-ID: <20191013130321.GD4475@telsasoft.com>
References: <20191012004446.GT10470@telsasoft.com>
<20191013090643.GA1434@paquier.xyz>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20191013090643.GA1434@paquier.xyz>
On Sun, Oct 13, 2019 at 08:03:21AM -0500, Justin Pryzby wrote:
> On Sun, Oct 13, 2019 at 06:06:43PM +0900, Michael Paquier wrote:
> > On Fri, Oct 11, 2019 at 07:44:46PM -0500, Justin Pryzby wrote:
> > > Unfortunately, there was no core file, and I'm still trying to reproduce it.
> >
> > Forgot to set ulimit -c? Having a backtrace would surely help.
>
> Fortunately (?) another server hit crashed last night.
> (Doesn't appear to be relevant, but this table has no inheritence/partition-ness).
>
> Looks like it's a race condition and dereferencing *holder=NULL. The first
> crash was probably the same bug, due to report query running during "reindex
> CONCURRENTLY", and probably finished at nearly the same time as another locker.
>
> Relevant code introduced here:
>
> commit ab0dfc961b6a821f23d9c40c723d11380ce195a6
> Author: Alvaro Herrera <alvherre@alvh.no-ip.org>
> Date: Tue Apr 2 15:18:08 2019 -0300
>
> Report progress of CREATE INDEX operations
>
> Needs to be conditionalized (as anticipated by the comment)
>
> + if (holder)
> pgstat_progress_update_param(PROGRESS_WAITFOR_CURRENT_PID,
> holder->pid);
>
>
> Core was generated by `postgres: postgres ts [local] REINDEX '.
> Program terminated with signal 11, Segmentation fault.
>
> #0 WaitForLockersMultiple (locktags=locktags@entry=0x1d30548, lockmode=lockmode@entry=5,
progress=progress@entry=true)at lmgr.c:911
> #1 0x00000000005c2ac8 in ReindexRelationConcurrently (relationOid=relationOid@entry=17618, options=options@entry=0)
atindexcmds.c:3090
> #2 0x00000000005c328a in ReindexIndex (indexRelation=<optimized out>, options=0, concurrent=<optimized out>) at
indexcmds.c:2352
> #3 0x00000000007657fe in standard_ProcessUtility (pstmt=pstmt@entry=0x1d05468,
queryString=queryString@entry=0x1d046e0"REINDEX INDEX CONCURRENTLY loaded_cdr_files_filename",
> context=context@entry=PROCESS_UTILITY_TOPLEVEL, params=params@entry=0x0, queryEnv=queryEnv@entry=0x0,
dest=dest@entry=0x1d05548,
> completionTag=completionTag@entry=0x7ffc05e6c0a0 "") at utility.c:787
> #4 0x00007f21517204ef in pgss_ProcessUtility (pstmt=0x1d05468, queryString=0x1d046e0 "REINDEX INDEX CONCURRENTLY
loaded_cdr_files_filename",context=PROCESS_UTILITY_TOPLEVEL,
> params=0x0, queryEnv=0x0, dest=0x1d05548, completionTag=0x7ffc05e6c0a0 "") at pg_stat_statements.c:1006
> #5 0x0000000000762816 in PortalRunUtility (portal=0x1d7a4e0, pstmt=0x1d05468, isTopLevel=<optimized out>,
setHoldSnapshot=<optimizedout>, dest=0x1d05548,
> completionTag=0x7ffc05e6c0a0 "") at pquery.c:1175
> #6 0x0000000000763267 in PortalRunMulti (portal=portal@entry=0x1d7a4e0, isTopLevel=isTopLevel@entry=true,
setHoldSnapshot=setHoldSnapshot@entry=false,dest=dest@entry=0x1d05548,
> altdest=altdest@entry=0x1d05548, completionTag=completionTag@entry=0x7ffc05e6c0a0 "") at pquery.c:1328
> #7 0x0000000000763e45 in PortalRun (portal=<optimized out>, count=9223372036854775807, isTopLevel=<optimized out>,
run_once=<optimizedout>, dest=0x1d05548, altdest=0x1d05548,
> completionTag=0x7ffc05e6c0a0 "") at pquery.c:796
> #8 0x000000000075ff45 in exec_simple_query (query_string=<optimized out>) at postgres.c:1215
> #9 0x0000000000761212 in PostgresMain (argc=<optimized out>, argv=<optimized out>, dbname=<optimized out>,
username=<optimizedout>) at postgres.c:4236
> #10 0x0000000000483d02 in BackendRun (port=<optimized out>, port=<optimized out>) at postmaster.c:4431
> #11 BackendStartup (port=0x1d2b340) at postmaster.c:4122
> #12 ServerLoop () at postmaster.c:1704
> #13 0x00000000006f0b1f in PostmasterMain (argc=argc@entry=3, argv=argv@entry=0x1cff280) at postmaster.c:1377
> #14 0x0000000000484c93 in main (argc=3, argv=0x1cff280) at main.c:228
>
> bt f
>
> #0 WaitForLockersMultiple (locktags=locktags@entry=0x1d30548, lockmode=lockmode@entry=5,
progress=progress@entry=true)at lmgr.c:911
> holder = 0x0
> lockholders = 0x1d9b778
> holders = <optimized out>
> lc = 0x1d9bf80
> total = <optimized out>
> done = 1
> #1 0x00000000005c2ac8 in ReindexRelationConcurrently (relationOid=relationOid@entry=17618, options=options@entry=0)
atindexcmds.c:3090
> heapRelationIds = 0x1d30360
> indexIds = 0x1d303b0
> newIndexIds = <optimized out>
> relationLocks = <optimized out>
> lockTags = <optimized out>
> lc = 0x0
> lc2 = 0x0
> private_context = <optimized out>
> oldcontext = <optimized out>
> relkind = 105 'i'
> relationName = 0x0
> relationNamespace = 0x0
> ru0 = {tv = {tv_sec = 30592544, tv_usec = 7232025}, ru = {ru_utime = {tv_sec = 281483566645394, tv_usec =
75668733820930},ru_stime = {tv_sec = 0, tv_usec = 30592272}, {
> ru_maxrss = 0, __ru_maxrss_word = 0}, {ru_ixrss = 0, __ru_ixrss_word = 0}, {ru_idrss = 105,
__ru_idrss_word= 105}, {ru_isrss = -926385342574214656,
> __ru_isrss_word = -926385342574214656}, {ru_minflt = 8924839, __ru_minflt_word = 8924839}, {ru_majflt =
0,__ru_majflt_word = 0}, {ru_nswap = 17618,
> __ru_nswap_word = 17618}, {ru_inblock = 139781327898864, __ru_inblock_word = 139781327898864},
{ru_oublock= 30430312, __ru_oublock_word = 30430312}, {
> ru_msgsnd = 139781327898864, __ru_msgsnd_word = 139781327898864}, {ru_msgrcv = 139781327898864,
__ru_msgrcv_word= 139781327898864}, {ru_nsignals = 139781327898864,
> __ru_nsignals_word = 139781327898864}, {ru_nvcsw = 139781327898864, __ru_nvcsw_word = 139781327898864},
{ru_nivcsw= 139781327898864,
> __ru_nivcsw_word = 139781327898864}}}
> __func__ = "ReindexRelationConcurrently"
> #2 0x00000000005c328a in ReindexIndex (indexRelation=<optimized out>, options=0, concurrent=<optimized out>) at
indexcmds.c:2352
> state = {concurrent = true, locked_table_oid = 17608}
> indOid = 17618
> irel = <optimized out>
> persistence = 112 'p'
> ...
--
Justin Pryzby
System Administrator
Telsasoft
+1-952-707-8581