Обсуждение: Assertion failure in REL9_5_STABLE

Поиск
Список
Период
Сортировка

Assertion failure in REL9_5_STABLE

От
Marko Tiikkaja
Дата:
Hi,

Running one specific test from our application against REL9_5_STABLE
(current as of today) gives me this:

#2  0x00007effb59595be in ExceptionalCondition (
     conditionName=conditionName@entry=0x7effb5b27a88
"!(CritSectionCount > 0 || TransactionIdIsCurrentTransactionId((
(!((tup)->t_infomask & 0x0800) && ((tup)->t_infomask & 0x1000) &&
!((tup)->t_infomask & 0x0080)) ? HeapTupleGetUpdateXid(tup) : (
(tup)-"..., errorType=errorType@entry=0x7effb599b74b "FailedAssertion",
fileName=fileName@entry=0x7effb5b2796c "combocid.c",
lineNumber=lineNumber@entry=132) at assert.c:54
#3  0x00007effb598e19b in HeapTupleHeaderGetCmax (tup=0x7effa85714c8) at
combocid.c:131
#4  0x00007effb55fb0c1 in heap_lock_tuple (relation=0x7effb5bf5d90,
tuple=tuple@entry=0x7fffcee73690, cid=346,
mode=mode@entry=LockTupleExclusive, wait_policy=LockWaitBlock,
follow_updates=follow_updates@entry=1 '\001',
     buffer=buffer@entry=0x7fffcee7367c, hufd=hufd@entry=0x7fffcee73680)
at heapam.c:4813
#5  0x00007effb5753e82 in ExecLockRows (node=node@entry=0x7effb6cebb00)
at nodeLockRows.c:179
#6  0x00007effb573cbc8 in ExecProcNode (node=node@entry=0x7effb6cebb00)
at execProcnode.c:516
#7  0x00007effb5739432 in ExecutePlan (dest=0x7effb5dd8160
<spi_printtupDR>, direction=<optimized out>, numberTuples=0,
sendTuples=1 '\001', operation=CMD_SELECT, planstate=0x7effb6cebb00,
estate=0x7effb6ceb8f8)
     at execMain.c:1549
#8  standard_ExecutorRun (queryDesc=0x7effb6ae3b40, direction=<optimized
out>, count=0) at execMain.c:337
#9  0x00007effb57661db in _SPI_pquery (tcount=0, fire_triggers=1 '\001',
queryDesc=0x7effb6ae3b40) at spi.c:2411

The failure is a number of levels down a call stack of PL/PgSQL
functions, but I can reproduce it at will by calling the function.  I
unfortunately can't narrow it down to a smaller test case, but attached
is an xlogdump of the session.  The query that finally breaks the
elephant's back is a SELECT .. FOR UPDATE from relid=54511.

Any ideas on how to debug this?


.m

Вложения

Re: Assertion failure in REL9_5_STABLE

От
Tom Lane
Дата:
Marko Tiikkaja <marko@joh.to> writes:
> The failure is a number of levels down a call stack of PL/PgSQL 
> functions, but I can reproduce it at will by calling the function.  I 
> unfortunately can't narrow it down to a smaller test case, but attached 
> is an xlogdump of the session.  The query that finally breaks the 
> elephant's back is a SELECT .. FOR UPDATE from relid=54511.

> Any ideas on how to debug this?

If this test case used to pass, it'd be pretty useful to git bisect
to find where it started to fail.
        regards, tom lane



Re: Assertion failure in REL9_5_STABLE

От
Alvaro Herrera
Дата:
Marko Tiikkaja wrote:
> Hi,
> 
> Running one specific test from our application against REL9_5_STABLE
> (current as of today) gives me this:
> 
> #2  0x00007effb59595be in ExceptionalCondition (
>     conditionName=conditionName@entry=0x7effb5b27a88 "!(CritSectionCount > 0
> || TransactionIdIsCurrentTransactionId(( (!((tup)->t_infomask & 0x0800) &&
> ((tup)->t_infomask & 0x1000) && !((tup)->t_infomask & 0x0080)) ?
> HeapTupleGetUpdateXid(tup) : ( (tup)-"...,
> errorType=errorType@entry=0x7effb599b74b "FailedAssertion",
> fileName=fileName@entry=0x7effb5b2796c "combocid.c",
> lineNumber=lineNumber@entry=132) at assert.c:54
> #3  0x00007effb598e19b in HeapTupleHeaderGetCmax (tup=0x7effa85714c8) at
> combocid.c:131
> #4  0x00007effb55fb0c1 in heap_lock_tuple (relation=0x7effb5bf5d90,
> tuple=tuple@entry=0x7fffcee73690, cid=346,
> mode=mode@entry=LockTupleExclusive, wait_policy=LockWaitBlock,
> follow_updates=follow_updates@entry=1 '\001',
>     buffer=buffer@entry=0x7fffcee7367c, hufd=hufd@entry=0x7fffcee73680) at
> heapam.c:4813

Umm.  AFAICS HeapTupleSatisfiesUpdate() only returns SelfUpdated after
already calling HeapTupleHeaderGetCmax() (which obviously hasn't caught
the same assertion).  Something is odd there ...

Can you share the test case?

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Assertion failure in REL9_5_STABLE

От
Marko Tiikkaja
Дата:
On 2016-08-10 19:28, Alvaro Herrera wrote:
> Umm.  AFAICS HeapTupleSatisfiesUpdate() only returns SelfUpdated after
> already calling HeapTupleHeaderGetCmax() (which obviously hasn't caught
> the same assertion).  Something is odd there ...

HeapTupleSatisfiesUpdate() returns HeapTupleBeingUpdated in this case. 
HeapTupleSelfUpdated comes from here (line 4749):
    /* if there are updates, follow the update chain */    if (follow_updates && !HEAP_XMAX_IS_LOCKED_ONLY(infomask))
{        HTSU_Result res;        res = heap_lock_updated_tuple(relation, tuple, &t_ctid,
     GetCurrentTransactionId(),                                      mode);        if (res != HeapTupleMayBeUpdated)
   {            result = res;            /* recovery code expects to have buffer lock held */
LockBuffer(*buffer,BUFFER_LOCK_EXCLUSIVE);            goto failed;         }     }
 

> Can you share the test case?

Not at this time, sorry.


.m



Re: Assertion failure in REL9_5_STABLE

От
Alvaro Herrera
Дата:
Marko Tiikkaja wrote:
> On 2016-08-10 19:28, Alvaro Herrera wrote:
> >Umm.  AFAICS HeapTupleSatisfiesUpdate() only returns SelfUpdated after
> >already calling HeapTupleHeaderGetCmax() (which obviously hasn't caught
> >the same assertion).  Something is odd there ...
> 
> HeapTupleSatisfiesUpdate() returns HeapTupleBeingUpdated in this case.
> HeapTupleSelfUpdated comes from here (line 4749):
> 
>     /* if there are updates, follow the update chain */
>     if (follow_updates && !HEAP_XMAX_IS_LOCKED_ONLY(infomask))
>     {
>         HTSU_Result res;
>         res = heap_lock_updated_tuple(relation, tuple, &t_ctid,
>                                       GetCurrentTransactionId(),
>                                       mode);
>         if (res != HeapTupleMayBeUpdated)
>         {
>             result = res;
>             /* recovery code expects to have buffer lock held */
>             LockBuffer(*buffer, BUFFER_LOCK_EXCLUSIVE);
>             goto failed;
>          }
>      }

Oh, I see ... so there's an update chain, and you're updating a earlier
tuple.  But the later tuple has a multixact and one of the members is
the current transaction.

I wonder how can you lock a tuple that's not the latest, where that
update chain was produced by your own transaction.  I suppose this is
because of plpgsql use of cursors.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Assertion failure in REL9_5_STABLE

От
Alvaro Herrera
Дата:
BTW this is not a regression, right?  It's been broken all along.  Or am
I mistaken?

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Assertion failure in REL9_5_STABLE

От
Marko Tiikkaja
Дата:
On 2016-08-10 11:01 PM, Alvaro Herrera wrote:
> Oh, I see ... so there's an update chain, and you're updating a earlier
> tuple.  But the later tuple has a multixact and one of the members is
> the current transaction.
>
> I wonder how can you lock a tuple that's not the latest, where that
> update chain was produced by your own transaction.  I suppose this is
> because of plpgsql use of cursors.

There's a rolled back subtransaction that also did some magic on the 
rows AFAICT.  I can try and spend some time producing a smaller test 
case over the weekend.  No hurry since this missed the today's point 
release anyway.


.m



Re: Assertion failure in REL9_5_STABLE

От
Marko Tiikkaja
Дата:
On 2016-08-11 12:09 AM, Alvaro Herrera wrote:
> BTW this is not a regression, right?  It's been broken all along.  Or am
> I mistaken?

You're probably right.  I just hadn't realized I could run our app 
against 9.5 with --enable-cassert until last week.


.m



Re: Assertion failure in REL9_5_STABLE

От
Michael Paquier
Дата:
On Thu, Aug 11, 2016 at 8:09 AM, Marko Tiikkaja <marko@joh.to> wrote:
> On 2016-08-11 12:09 AM, Alvaro Herrera wrote:
>>
>> BTW this is not a regression, right?  It's been broken all along.  Or am
>> I mistaken?
>
>
> You're probably right.  I just hadn't realized I could run our app against
> 9.5 with --enable-cassert until last week.

Just wondering... If you revert 1f9534b4 and/or b33e81cb do you still
see a problem?
-- 
Michael



Re: Assertion failure in REL9_5_STABLE

От
Marko Tiikkaja
Дата:
On 11/08/16 8:48 AM, Michael Paquier wrote:
> On Thu, Aug 11, 2016 at 8:09 AM, Marko Tiikkaja <marko@joh.to> wrote:
>> On 2016-08-11 12:09 AM, Alvaro Herrera wrote:
>>>
>>> BTW this is not a regression, right?  It's been broken all along.  Or am
>>> I mistaken?
>>
>> You're probably right.  I just hadn't realized I could run our app against
>> 9.5 with --enable-cassert until last week.
>
> Just wondering... If you revert 1f9534b4 and/or b33e81cb do you still
> see a problem?

Yeah, no effect.


.m



Re: Assertion failure in REL9_5_STABLE

От
Andres Freund
Дата:
On 2016-08-11 11:01:18 +0200, Marko Tiikkaja wrote:
> On 11/08/16 8:48 AM, Michael Paquier wrote:
> > On Thu, Aug 11, 2016 at 8:09 AM, Marko Tiikkaja <marko@joh.to> wrote:
> > > On 2016-08-11 12:09 AM, Alvaro Herrera wrote:
> > > > 
> > > > BTW this is not a regression, right?  It's been broken all along.  Or am
> > > > I mistaken?
> > > 
> > > You're probably right.  I just hadn't realized I could run our app against
> > > 9.5 with --enable-cassert until last week.
> > 
> > Just wondering... If you revert 1f9534b4 and/or b33e81cb do you still
> > see a problem?
> 
> Yeah, no effect.

Phew. Alvaro, are you tackling this?



Re: Assertion failure in REL9_5_STABLE

От
Alvaro Herrera
Дата:
Andres Freund wrote:

> Phew. Alvaro, are you tackling this?

Sure.


Marko Tiikkaja wrote:

> There's a rolled back subtransaction that also did some magic on the rows
> AFAICT.  I can try and spend some time producing a smaller test case over
> the weekend.  No hurry since this missed the today's point release anyway.

Marko, any luck?

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Assertion failure in REL9_5_STABLE

От
Alvaro Herrera
Дата:
Marko Tiikkaja wrote:

> Running one specific test from our application against REL9_5_STABLE
> (current as of today) gives me this:
> 
> #2  0x00007effb59595be in ExceptionalCondition (
>     conditionName=conditionName@entry=0x7effb5b27a88 "!(CritSectionCount > 0
> || TransactionIdIsCurrentTransactionId(( (!((tup)->t_infomask & 0x0800) &&
> ((tup)->t_infomask & 0x1000) && !((tup)->t_infomask & 0x0080)) ?
> HeapTupleGetUpdateXid(tup) : ( (tup)-"...,
> errorType=errorType@entry=0x7effb599b74b "FailedAssertion",
> fileName=fileName@entry=0x7effb5b2796c "combocid.c",
> lineNumber=lineNumber@entry=132) at assert.c:54
> #3  0x00007effb598e19b in HeapTupleHeaderGetCmax (tup=0x7effa85714c8) at
> combocid.c:131

Can you please do frame 3 print tup->t_ctid
in gdb?  That would print the tuple address with which to grep the xlog
sequence.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Assertion failure in REL9_5_STABLE

От
Alvaro Herrera
Дата:
Marko Tiikkaja wrote:
> Hi,
> 
> Running one specific test from our application against REL9_5_STABLE
> (current as of today) gives me this:

Just pushed the fix.  Thanks for the report!

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services