Discussion: Out of memory error causes Abort, Abort tries to allocate memory


Out of memory error causes Abort, Abort tries to allocate memory

From
Jeff Davis
Date:
I found the root cause of the bug I reported at:

http://archives.postgresql.org/pgsql-bugs/2006-10/msg00211.php

What happens is this:
* Out of memory condition causes an ERROR
* ERROR triggers an AbortTransaction()
* AbortTransaction() calls RecordTransactionAbort()
* RecordTransactionAbort calls smgrGetPendingDeletes()
* smgrGetPendingDeletes() calls palloc()
* palloc() fails, resulting in ERROR, causing infinite recursion
* elog.c detects infinite recursion, and elevates it to PANIC
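The chain above can be reproduced in miniature. The sketch below is a toy model under stated assumptions: every name in it (toy_palloc, toy_error, toy_abort_transaction) is hypothetical, the recursion limit is arbitrary, and the real elog(ERROR) longjmps out instead of returning. It only shows how an abort path that allocates turns one out-of-memory error into a recursive one that ends in PANIC.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Toy model only: every name here is hypothetical, not a PostgreSQL API. */
#define TOY_MAX_ERROR_DEPTH 5        /* arbitrary limit chosen for the sketch */

static unsigned char toy_pool[64];
static size_t toy_used = 56;         /* memory is already nearly exhausted */
static int toy_error_depth = 0;      /* errors currently being handled */
static int toy_panicked = 0;         /* set once the recursion is escalated */

static void toy_error(const char *msg);

/* Stand-in for palloc(): on failure it raises an error. */
static void *toy_palloc(size_t n)
{
    if (n > sizeof toy_pool - toy_used) {
        toy_error("out of memory");
        return NULL;                 /* reached only because the toy has no longjmp */
    }
    void *p = toy_pool + toy_used;
    toy_used += n;
    return p;
}

/* Stand-in for AbortTransaction() -> RecordTransactionAbort() ->
 * smgrGetPendingDeletes(), which needs a small allocation for the rel list. */
static void toy_abort_transaction(void)
{
    (void) toy_palloc(16);
}

/* Stand-in for elog(ERROR): every error triggers a transaction abort. */
static void toy_error(const char *msg)
{
    if (toy_panicked)
        return;
    if (++toy_error_depth > TOY_MAX_ERROR_DEPTH) {
        fprintf(stderr, "PANIC: error recursion while handling \"%s\"\n", msg);
        toy_panicked = 1;            /* a real PANIC would end the process here */
    } else {
        toy_abort_transaction();     /* ERROR => AbortTransaction() */
    }
    toy_error_depth--;
    assert(toy_error_depth >= 0);
}
```

Asking for 1000 bytes fails, the abort then asks for 16 of the 8 remaining bytes, that fails too, and so on until the depth limit sets toy_panicked.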

I'm not sure how easy this is to fix, but I asked on IRC and got some
agreement that this is a bug.

It seems to me, in order to fix it, we would have to avoid allocating
memory on the AbortTransaction path. All smgrGetPendingDeletes() needs
to allocate is a few dozen bytes (depending on the number of relations
to be deleted). Perhaps it could allocate those bytes as the list of pending
deletes fills up. Or maybe we can somehow avoid needing to record the
relnodes to be deleted in order for the abort to succeed.
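The first idea, growing the storage while deletions are being registered so the abort path only reads it, might look roughly like this. It is a sketch under assumptions: ToyRelNode, PendingDeletes, pending_add and pending_get are hypothetical names, and plain realloc stands in for allocation in a long-lived memory context.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a relnode; fields simplified. */
typedef struct { unsigned spc, db, rel; } ToyRelNode;

typedef struct {
    ToyRelNode *items;
    int         count;
    int         capacity;
} PendingDeletes;

/* Called when a relation is scheduled for deletion, i.e. during normal
 * execution, where an allocation failure is just an ordinary error. */
static int pending_add(PendingDeletes *pd, ToyRelNode node)
{
    if (pd->count == pd->capacity) {
        int newcap = pd->capacity ? pd->capacity * 2 : 4;
        ToyRelNode *grown = realloc(pd->items, (size_t) newcap * sizeof *grown);
        if (grown == NULL)
            return -1;               /* fail now, before any abort is in progress */
        pd->items = grown;
        pd->capacity = newcap;
    }
    assert(pd->count < pd->capacity);
    pd->items[pd->count++] = node;
    return 0;
}

/* Abort-time accessor: hands back the existing array, allocates nothing. */
static int pending_get(const PendingDeletes *pd, const ToyRelNode **out)
{
    *out = pd->items;
    return pd->count;
}
```

The cost of the allocation moves to the moment a relation is created or dropped, when an out-of-memory error can still be handled normally.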

I'm still not sure why foreign keys on large insert statements don't eat
memory on 7.4, but do on 8.0+.

Regards,
    Jeff Davis

Re: Out of memory error causes Abort, Abort tries to allocate memory

From
Alvaro Herrera
Date:
Jeff Davis wrote:
> I found the root cause of the bug I reported at:
>
> http://archives.postgresql.org/pgsql-bugs/2006-10/msg00211.php
>
> What happens is this:
> * Out of memory condition causes an ERROR
> * ERROR triggers an AbortTransaction()
> * AbortTransaction() calls RecordTransactionAbort()
> * RecordTransactionAbort calls smgrGetPendingDeletes()
> * smgrGetPendingDeletes() calls palloc()
> * palloc() fails, resulting in ERROR, causing infinite recursion
> * elog.c detects infinite recursion, and elevates it to PANIC
>
> I'm not sure how easy this is to fix, but I asked on IRC and got some
> agreement that this is a bug.

Hmm, maybe we could have AbortTransaction switch to ErrorContext, which
has some preallocated space, before calling RecordTransactionAbort (or
maybe have RecordTransactionAbort itself do it).

Problem is, what happens if ErrorContext is filled up by doing this?  At
that point we will be severely fscked up, and you probably won't get the
PANIC either.  (Maybe it doesn't happen in this particular case, but
seems a real risk.)
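A preallocated reserve of the kind Alvaro describes, and the way it can run dry, can be sketched as a bump allocator. This is an assumption-laden illustration: ReserveArena, reserve_alloc and reserve_reset are hypothetical names, and the reserve size is arbitrary rather than whatever ErrorContext actually keeps.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reserve standing in for the space ErrorContext keeps
 * preallocated; the size is arbitrary, not PostgreSQL's. */
#define RESERVE_SIZE 256

typedef struct {
    unsigned char buf[RESERVE_SIZE];
    size_t        used;
} ReserveArena;

/* Set aside up front, while memory is still available. */
static ReserveArena error_reserve;

/* Bump allocation out of the reserve, sizes rounded up to multiples of 8.
 * Returns NULL once the reserve is used up: the failure mode raised above,
 * since at that point there is nothing left to fall back on. */
static void *reserve_alloc(ReserveArena *a, size_t n)
{
    size_t rounded = (n + 7) & ~(size_t) 7;
    if (rounded > RESERVE_SIZE - a->used)
        return NULL;
    void *p = a->buf + a->used;
    a->used += rounded;
    assert(a->used <= RESERVE_SIZE);
    return p;
}

/* Once the abort has finished, the reserve can be reclaimed wholesale. */
static void reserve_reset(ReserveArena *a)
{
    a->used = 0;
}
```

Two 100-byte requests fit in the 256-byte reserve and a third does not, which is the point: any fixed reserve only helps if the abort path's needs are known to be bounded.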

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Re: Out of memory error causes Abort, Abort tries to allocate memory

From
Jeff Davis
Date:
On Wed, 2006-10-25 at 16:20 -0300, Alvaro Herrera wrote:
> Jeff Davis wrote:
> > I found the root cause of the bug I reported at:
> >
> > http://archives.postgresql.org/pgsql-bugs/2006-10/msg00211.php
> >
> > What happens is this:
> > * Out of memory condition causes an ERROR
> > * ERROR triggers an AbortTransaction()
> > * AbortTransaction() calls RecordTransactionAbort()
> > * RecordTransactionAbort calls smgrGetPendingDeletes()
> > * smgrGetPendingDeletes() calls palloc()
> > * palloc() fails, resulting in ERROR, causing infinite recursion
> > * elog.c detects infinite recursion, and elevates it to PANIC
> >
> > I'm not sure how easy this is to fix, but I asked on IRC and got some
> > agreement that this is a bug.
>
> Hmm, maybe we could have AbortTransaction switch to ErrorContext, which
> has some preallocated space, before calling RecordTransactionAbort (or
> maybe have RecordTransactionAbort itself do it).
>
> Problem is, what happens if ErrorContext is filled up by doing this?  At
> that point we will be severely fscked up, and you probably won't get the
> PANIC either.  (Maybe it doesn't happen in this particular case, but
> seems a real risk.)
>

If we have a way to allocate memory and recover if it fails, perhaps
RecordTransactionAbort() could set the "rels to delete" part of the log
record to some special value that means "There might be relations to
delete, but I don't know which ones". Then, if necessary, the relations
that should be deleted could be determined at recovery time.

This idea assumes that we can figure out which relations are abandoned,
and also assumes that smgrGetPendingDeletes() is the only routine that
allocates memory on the path to abort a transaction due to an out of
memory error.
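The "special value" idea could be sketched as a sentinel in the record's relation count. Everything here is hypothetical: ToyAbortRecord is not the real abort WAL record layout, RELS_UNKNOWN is an invented marker, and try_alloc is assumed to return NULL on failure instead of raising an error.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical marker: "there might be relations to delete, but I don't
 * know which ones". The real abort record has no such field. */
#define RELS_UNKNOWN (-1)

typedef struct { unsigned spc, db, rel; } ToyRelNode;

typedef struct {
    int               nrels;   /* RELS_UNKNOWN: recovery must find them itself */
    const ToyRelNode *rels;
} ToyAbortRecord;

/* Demo allocator that always fails, to exercise the fallback. */
static void *always_fail(size_t n)
{
    (void) n;
    return NULL;
}

/* try_alloc returns NULL on failure rather than raising an error. */
static ToyAbortRecord build_abort_record(void *(*try_alloc)(size_t),
                                         const ToyRelNode *pending, int npending)
{
    ToyAbortRecord rec = { 0, NULL };
    assert(npending >= 0);
    if (npending == 0)
        return rec;
    ToyRelNode *copy = try_alloc((size_t) npending * sizeof *copy);
    if (copy == NULL) {
        rec.nrels = RELS_UNKNOWN;    /* give up on the list, not on the abort */
        return rec;
    }
    for (int i = 0; i < npending; i++)
        copy[i] = pending[i];
    rec.nrels = npending;
    rec.rels = copy;
    return rec;
}

/* Recovery-side check: does replay have to work out the relations itself? */
static int recovery_must_scan(const ToyAbortRecord *rec)
{
    return rec->nrels == RELS_UNKNOWN;
}
```

With malloc the record carries the list as usual; with always_fail the abort still completes and only the sentinel is logged, which relies on the assumption stated above that abandoned relations can be identified later.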

Regards,
    Jeff Davis

Re: Out of memory error causes Abort, Abort tries to allocate memory

From
Tom Lane
Date:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Jeff Davis wrote:
>> * smgrGetPendingDeletes() calls palloc()
>> * palloc() fails, resulting in ERROR, causing infinite recursion

> Hmm, maybe we could have AbortTransaction switch to ErrorContext, which
> has some preallocated space, before calling RecordTransactionAbort (or
> maybe have RecordTransactionAbort itself do it).

Seems like it'd be smarter to try to free some memory before we push
forward with transaction abort.  ErrorContext has only a limited amount
of space ...

            regards, tom lane

Re: Out of memory error causes Abort, Abort tries to allocate memory

From
Jeff Davis
Date:
On Wed, 2006-10-25 at 18:15 -0400, Tom Lane wrote:
> Alvaro Herrera <alvherre@commandprompt.com> writes:
> > Jeff Davis wrote:
> >> * smgrGetPendingDeletes() calls palloc()
> >> * palloc() fails, resulting in ERROR, causing infinite recursion
>
> > Hmm, maybe we could have AbortTransaction switch to ErrorContext, which
> > has some preallocated space, before calling RecordTransactionAbort (or
> > maybe have RecordTransactionAbort itself do it).
>
> Seems like it'd be smarter to try to free some memory before we push
> forward with transaction abort.  ErrorContext has only a limited amount
> of space ...
>

In the particular case I'm referring to, it's the referential integrity
constraints using all the memory. Is that memory allocated in a
convenient context to free before the abort?

Glancing at the code, I think that it would work to MemoryContextReset()
the query's memory context, because the pending deletes (of the
relnodes) are allocated in TopMemoryContext. After the query's memory
context is reset, there should be plenty of space to finish the abort
within that context.
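Freeing memory before continuing the abort, in the form proposed here, can be modeled with a toy context that is reset and then reused. The names ToyContext, ctx_alloc, ctx_reset and abort_with_reset are hypothetical and far simpler than real memory contexts; the sketch also assumes that nothing in the query's context has to survive the abort, which is exactly the open question below.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy context; real memory contexts are far richer. */
typedef struct {
    unsigned char *buf;
    size_t         size;
    size_t         used;
} ToyContext;

static void *ctx_alloc(ToyContext *c, size_t n)
{
    assert(c->used <= c->size);
    if (n > c->size - c->used)
        return NULL;                  /* out of memory in this context */
    void *p = c->buf + c->used;
    c->used += n;
    return p;
}

/* Plays the role of MemoryContextReset(): discards everything allocated
 * in the context at once. */
static void ctx_reset(ToyContext *c)
{
    c->used = 0;
}

/* Abort path as proposed: drop the query's state first, then allocate the
 * rel list in the freed space. The pending-delete entries themselves are
 * assumed to live in a separate, long-lived context, so the reset does not
 * touch them. */
static void *abort_with_reset(ToyContext *query_ctx, size_t relbytes)
{
    ctx_reset(query_ctx);
    return ctx_alloc(query_ctx, relbytes);
}
```

Once the query has filled its context, a 48-byte request fails, but the same request succeeds after the reset.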

Is there any data in the query's memory context that needs to be saved
after we know we're aborting?

Regards,
    Jeff Davis