Обсуждение: BUG #18374: Printing memory contexts on OOM condition might lead to segmentation fault

Поиск
Список
Период
Сортировка

BUG #18374: Printing memory contexts on OOM condition might lead to segmentation fault

От
PG Bug reporting form
Дата:
The following bug has been logged on the website:

Bug reference:      18374
Logged by:          Alexander Lakhin
Email address:      exclusion@gmail.com
PostgreSQL version: 16.2
Operating system:   Ubuntu 22.04
Description:

When a backend with deeply nested memory contexts hits out-of-memory
condition and logs the contexts, it might lead to a segmentation fault
(due to the lack of free memory again). For example:
$ ulimit -Sv 300000; TESTS=infinite_recurse make -s check-tests
(on 64-bit Ubuntu 22.04)

fails:
# +++ regress check in src/test/regress +++
# using temp instance on port 61698 with PID 809399
not ok 1     - infinite_recurse                          286 ms
# (test process exited with exit code 2)

with the following stack trace:
Core was generated by `postgres: law regression [local] SELECT
                        '.
Program terminated with signal SIGSEGV, Segmentation fault.

warning: Section `.reg-xstate/809680' in core file too small.
#0  0x00005643d12ecd50 in dostr (str=str@entry=0x5643d145b137 "  ", slen=2,
target=target@entry=0x7ffdafcb6290) at snprintf.c:1378
1378    {
(gdb) bt
#0  0x00005643d12ecd50 in dostr (str=str@entry=0x5643d145b137 "  ", slen=2,
target=target@entry=0x7ffdafcb6290) at snprintf.c:1378
#1  0x00005643d12ed54a in dopr (...) at snprintf.c:417
#2  0x00005643d12edec2 in pg_vfprintf (...) at snprintf.c:257
#3  0x00005643d12edfa7 in pg_fprintf (...) at snprintf.c:270
#4  0x00005643d12c086e in MemoryContextStatsPrint (...,
stats_string=stats_string@entry=0x7ffdafcb68c0 "8192 total in 1 blocks; 5072
free (0 chunks); 3120 used", 
    print_to_stderr=print_to_stderr@entry=true) at mcxt.c:909
#5  0x00005643d12b9604 in AllocSetStats (...) at aset.c:1508
#6  0x00005643d12c0553 in MemoryContextStatsInternal (...) at mcxt.c:770
...
#675 0x00005643d12c05ee in MemoryContextStatsInternal (...) at mcxt.c:786
#676 0x00005643d12c1030 in MemoryContextStatsDetail (...) at mcxt.c:721
#677 0x00005643d12c1111 in MemoryContextStats (...) at mcxt.c:702
#678 0x00005643d12c19d9 in palloc (size=size@entry=16384) at mcxt.c:1243
#679 0x00005643d12d2ed8 in tuplestore_begin_common (eflags=4,
interXact=interXact@entry=false, maxKBytes=4096) at tuplestore.c:281
#680 0x00005643d12d37c9 in tuplestore_begin_heap (...) at tuplestore.c:331
#681 0x00005643d0f72099 in fmgr_sql (...) at functions.c:1142
...
#20980 0x00005643d0f5fd44 in ExecProcNode (...) at
../../../src/include/executor/executor.h:273
#20981 ExecutePlan (...) at execMain.c:1670
#20982 0x00005643d0f5ff07 in standard_ExecutorRun (...) at execMain.c:365
#20983 0x00005643d0f5ffe1 in ExecutorRun (...) at execMain.c:309
#20984 0x00005643d0f701e9 in postquel_getnext (...) at functions.c:895
#20985 0x00005643d0f71ffa in fmgr_sql (...) at functions.c:1196
#20986 0x00005643d0f5a6db in ExecInterpExpr (...) at execExprInterp.c:734
#20987 0x00005643d0f56aec in ExecInterpExprStillValid (...) at
execExprInterp.c:1870
#20988 0x00005643d0f98075 in ExecEvalExprSwitchContext (...) at
../../../src/include/executor/executor.h:355
#20989 ExecProject (...) at ../../../src/include/executor/executor.h:389
#20990 ExecResult (...) at nodeResult.c:136
#20991 0x00005643d0f67781 in ExecProcNodeFirst (...) at execProcnode.c:464
#20992 0x00005643d0f5fd44 in ExecProcNode (...) at execMain.c:1670
#20994 0x00005643d0f5ff07 in standard_ExecutorRun (...) at execMain.c:365
#20995 0x00005643d0f5ffe1 in ExecutorRun (...) at execMain.c:309
#20996 0x00005643d0f701e9 in postquel_getnext (...) at functions.c:895
#20997 0x00005643d0f71ffa in fmgr_sql (fcinfo=0x5643d3614430) at
functions.c:1196
#20998 0x00005643d0f5a6db in ExecInterpExpr (...) at execExprInterp.c:734
#20999 0x00005643d0f56aec in ExecInterpExprStillValid (...) at
execExprInterp.c:1870
#21000 0x00005643d0f98075 in ExecEvalExprSwitchContext (...) at
../../../src/include/executor/executor.h:355
#21001 ExecProject (...) at ../../../src/include/executor/executor.h:389
#21002 ExecResult (...) at nodeResult.c:136
#21003 0x00005643d0f67781 in ExecProcNodeFirst (...) at execProcnode.c:464
#21004 0x00005643d0f5fd44 in ExecProcNode (...) at
../../../src/include/executor/executor.h:273
...
(gdb) p $rsp
$1 = (void *) 0x7ffdafcb6000
(gdb) x $rsp
0x7ffdafcb6000: 0xe92636c0
(gdb) x $rsp - 8
0x7ffdafcb5ff8: Cannot access memory at address 0x7ffdafcb5ff8

postmaster.log contains:
TopMemoryContext: 196064 total in 7 blocks; 45920 free (17 chunks); 150144
used
...
  TopPortalContext: 8192 total in 1 blocks; 7656 free (0 chunks); 536 used
    PortalContext: 1024 total in 1 blocks; 592 free (0 chunks); 432 used:
<unnamed>
      ExecutorState: 8192 total in 1 blocks; 4032 free (0 chunks); 4160
used
        SQL function: 32832 total in 3 blocks; 5136 free (1 chunks); 27696
used: infinite_recurse
          ExecutorState: 8192 total in 1 blocks; 5072 free (0 chunks); 3120
used
            SQL function: 32832 total in 3 blocks; 5136 free (1 chunks);
27696 used: infinite_recurse
...

(Initially observed with the natural restrictions on 32-bit OS.)


PG Bug reporting form <noreply@postgresql.org> writes:
> When a backend with deeply nested memory contexts hits out-of-memory
> condition and logs the contexts, it might lead to a segmentation fault
> (due to the lack of free memory again).

Hmph.  That's not an out-of-memory crash, that's a stack-too-deep
crash.

Seems like we ought to do one or both of these:

1. Put a CHECK_STACK_DEPTH() call in MemoryContextStatsInternal.

2. Teach MemoryContextStatsInternal to refuse to recurse more
than N levels, for N perhaps around 100.

Neither of these are very attractive though, as they'd obscure
the OOM situation that we're trying to help debug.

It strikes me that we don't actually need recursion in order to
traverse the context tree: since the nodes have parent pointers,
it'd be possible to visit them all using only iteration.  The
recursion seems necessary though to manage the child summarization
logic as we have it (in particular, we must have a local_totals
per level to produce summarization like this).  Maybe we could
modify solution #2 into

2a. Once we get more than say 100 levels deep, summarize everything
below that in a single line, obtained in an iterative rather than
recursive traversal.

I wonder whether MemoryContextDelete and other cleanup methods
also need to be rewritten to avoid recursion.  In the infinite_recurse
test case I think we escape trouble because we longjmp out of most
of the stack before we try to clean up --- but you could probably
devise a test case that tries to do a subtransaction abort at a
deep call level, and then maybe kaboom?

            regards, tom lane



Hello Tom,

02.03.2024 19:11, Tom Lane wrote:
> PG Bug reporting form <noreply@postgresql.org> writes:
>> When a backend with deeply nested memory contexts hits out-of-memory
>> condition and logs the contexts, it might lead to a segmentation fault
>> (due to the lack of free memory again).
> Hmph.  That's not an out-of-memory crash, that's a stack-too-deep
> crash.

I tried to decrease the limit and still got the failure (with the much
shorter stack):
ulimit -Sv 200000; TESTS=infinite_recurse make -s check-tests

(gdb) p $rsp
$1 = (void *) 0x7ffcc83d4ff0
(gdb) frame 13269
#13269 0x000056289bc2685a in main (argc=8, argv=0x56289d3b4930) at main.c:198
198                     PostmasterMain(argc, argv);
(gdb) p $rsp
$2 = (void *) 0x7ffcc84834d0
(gdb) p $rsp - 0x7ffcc83d4ff0
$3 = (void *) 0xae4e0

(Far less than ulimit -s == 8 MB.)

It made me think that it's not a stack overflow issue, but may be I miss
something.

> Seems like we ought to do one or both of these:
>
> 1. Put a CHECK_STACK_DEPTH() call in MemoryContextStatsInternal.
>
> 2. Teach MemoryContextStatsInternal to refuse to recurse more
> than N levels, for N perhaps around 100.
>
> Neither of these are very attractive though, as they'd obscure
> the OOM situation that we're trying to help debug.
>
> It strikes me that we don't actually need recursion in order to
> traverse the context tree: since the nodes have parent pointers,
> it'd be possible to visit them all using only iteration.  The
> recursion seems necessary though to manage the child summarization
> logic as we have it (in particular, we must have a local_totals
> per level to produce summarization like this).  Maybe we could
> modify solution #2 into
>
> 2a. Once we get more than say 100 levels deep, summarize everything
> below that in a single line, obtained in an iterative rather than
> recursive traversal.
>
> I wonder whether MemoryContextDelete and other cleanup methods
> also need to be rewritten to avoid recursion.  In the infinite_recurse
> test case I think we escape trouble because we longjmp out of most
> of the stack before we try to clean up --- but you could probably
> devise a test case that tries to do a subtransaction abort at a
> deep call level, and then maybe kaboom?

Exploiting and protecting MemoryContextStatsInternal() were discussed
before:
https://www.postgresql.org/message-id/flat/1661334672.728714027%40f473.i.mail.ru
(It looks like the function got no stack-overflow protection at the end.)

But I'm still not sure that we deal here with the same issue.

Best regards,
Alexander



Alexander Lakhin <exclusion@gmail.com> writes:
> 02.03.2024 19:11, Tom Lane wrote:
>> Hmph.  That's not an out-of-memory crash, that's a stack-too-deep
>> crash.

> (gdb) p $rsp
> $1 = (void *) 0x7ffcc83d4ff0
> (gdb) frame 13269
> #13269 0x000056289bc2685a in main (argc=8, argv=0x56289d3b4930) at main.c:198
> 198                     PostmasterMain(argc, argv);
> (gdb) p $rsp
> $2 = (void *) 0x7ffcc84834d0
> (gdb) p $rsp - 0x7ffcc83d4ff0
> $3 = (void *) 0xae4e0

> (Far less than ulimit -s == 8 MB.)

Yeah, I'm seeing something similar, also with ulimit -s = 8192 kbytes:

(gdb) i reg
...
rbp            0xb0a324            0xb0a324
rsp            0x7ffd07ce4fd0      0x7ffd07ce4fd0
...
(gdb) x/64 0x7ffd07ce4fd0
0x7ffd07ce4fd0: Cannot access memory at address 0x7ffd07ce4fd0

So it's definitely out-of-stack, yet

(gdb) p stack_base_ptr
$3 = 0x7ffd07dbf570 "\b"
(gdb) p 0x7ffd07dbf570 - 0x7ffd07ce4fd0
$4 = 894368

I'd have expected a diff in the vicinity of 8MB, but it isn't.

I think what must be happening is that the kernel is refusing
to expand our stack any more once we've hit the "ulimit -v" limit.
This is quite nasty, because it breaks all our assumptions about
having X amount of stack still available once check_stack_depth
triggers.

I tried inserting check_stack_depth() into MemoryContextStatsInternal,
and *that did not stop the crash*, confirming that we don't think
we're anywhere near the stack limit.  Ugh.

            regards, tom lane



I wrote:
> I think what must be happening is that the kernel is refusing
> to expand our stack any more once we've hit the "ulimit -v" limit.
> This is quite nasty, because it breaks all our assumptions about
> having X amount of stack still available once check_stack_depth
> triggers.

I find this in [1]:

  The C language stack growth does an implicit mremap. If you want absolute
  guarantees and run close to the edge you MUST mmap your stack for the 
  largest size you think you will need. For typical stack usage this does
  not matter much but it's a corner case if you really really care

Seems like we need to do some more work at startup to enforce that
we have the amount of stack we think we do, if we're on Linux.

            regards, tom lane

[1] https://www.kernel.org/doc/Documentation/vm/overcommit-accounting



I wrote:
> I find this in [1]:
> 
>   The C language stack growth does an implicit mremap. If you want absolute
>   guarantees and run close to the edge you MUST mmap your stack for the 
>   largest size you think you will need. For typical stack usage this does
>   not matter much but it's a corner case if you really really care
> 
> Seems like we need to do some more work at startup to enforce that
> we have the amount of stack we think we do, if we're on Linux.

After thinking about that some more, I'm really quite unenthused about
trying to remap the stack for ourselves.  It'd be both platform- and
architecture-dependent, and I'm afraid it'd introduce as many failure
modes as it removes.  (Notably, I'm not sure we could guarantee
there's a guard page below the stack.)  Since we've not seen reports
of this failure from the wild, I doubt it's worth the trouble.

I do think it's probably worth reducing MemoryContextDelete's stack
usage to O(1), just to ensure we can't get into stack trouble during
transaction abort.  That's not hard at all, as attached.

I tried to make MemoryContextResetChildren work similarly, but that
doesn't work because if we're not removing child contexts then we
need extra state to tell which ones we've done already.  For the
same reason my idea for bounding the stack space needed by
MemoryContextStats doesn't seem to work.  We could possibly make it
work if we were willing to add a temporary-use pointer field to all
MemoryContext headers, but I'm unconvinced that'd be a good tradeoff.

            regards, tom lane

diff --git a/src/backend/utils/mmgr/mcxt.c b/src/backend/utils/mmgr/mcxt.c
index 41f2390fb8..7cc220bca0 100644
--- a/src/backend/utils/mmgr/mcxt.c
+++ b/src/backend/utils/mmgr/mcxt.c
@@ -447,14 +447,37 @@ MemoryContextDelete(MemoryContext context)
 void
 MemoryContextDeleteChildren(MemoryContext context)
 {
+    MemoryContext cur,
+                next;
+
     Assert(MemoryContextIsValid(context));
 
     /*
-     * MemoryContextDelete will delink the child from me, so just iterate as
-     * long as there is a child.
+     * We iterate rather than recursing, so that cleaning up a deep nest of
+     * contexts doesn't require unbounded stack space.  (This avoids possible
+     * failure during transaction cleanup, which would be bad.)  This works
+     * because by the time we traverse back up to a parent context, it will
+     * have no remaining children and will be seen as deletable.
      */
-    while (context->firstchild != NULL)
-        MemoryContextDelete(context->firstchild);
+    cur = context->firstchild;
+    while (cur != NULL)
+    {
+        /* Descend until we find a childless context */
+        if (cur->firstchild != NULL)
+        {
+            cur = cur->firstchild;
+            continue;
+        }
+        /* We can delete cur, but first remember the next deletion target */
+        if (cur->nextchild != NULL)
+            next = cur->nextchild;
+        else if (cur->parent != context)
+            next = cur->parent;
+        else
+            next = NULL;
+        MemoryContextDelete(cur);
+        cur = next;
+    }
 }
 
 /*

Hello Tom,

04.03.2024 00:39, Tom Lane wrote:
> I wrote:
>> I find this in [1]:
>>
>>    The C language stack growth does an implicit mremap. If you want absolute
>>    guarantees and run close to the edge you MUST mmap your stack for the
>>    largest size you think you will need. For typical stack usage this does
>>    not matter much but it's a corner case if you really really care
>>
>> Seems like we need to do some more work at startup to enforce that
>> we have the amount of stack we think we do, if we're on Linux.
> After thinking about that some more, I'm really quite unenthused about
> trying to remap the stack for ourselves.  It'd be both platform- and
> architecture-dependent, and I'm afraid it'd introduce as many failure
> modes as it removes.  (Notably, I'm not sure we could guarantee
> there's a guard page below the stack.)  Since we've not seen reports
> of this failure from the wild, I doubt it's worth the trouble.

I have perhaps a naive idea, but it apparently eliminates the segmentation
fault for the given test case. (Please look at a quick draft attached.)

Though maybe the issue can really wait for complaints from outside or for
a simpler/cheaper solution, integrated with the existing (or future)
stack-overflow protection.

> I do think it's probably worth reducing MemoryContextDelete's stack
> usage to O(1), just to ensure we can't get into stack trouble during
> transaction abort.  That's not hard at all, as attached.

Yeah, Heikki proposed something similar as part of
0003-Avoid-recursion-in-MemoryContext-functions.patch there:
https://www.postgresql.org/message-id/6b48c746-9704-46dc-b9be-01fe4137c824%40iki.fi

> I tried to make MemoryContextResetChildren work similarly, but that
> doesn't work because if we're not removing child contexts then we
> need extra state to tell which ones we've done already.  For the
> same reason my idea for bounding the stack space needed by
> MemoryContextStats doesn't seem to work.  We could possibly make it
> work if we were willing to add a temporary-use pointer field to all
> MemoryContext headers, but I'm unconvinced that'd be a good tradeoff.

Best regards,
Alexander
Вложения
04.03.2024 18:00, Alexander Lakhin wrote:
>
> 04.03.2024 00:39, Tom Lane wrote:
>>
>>> Seems like we need to do some more work at startup to enforce that
>>> we have the amount of stack we think we do, if we're on Linux.
>> After thinking about that some more, I'm really quite unenthused about
>> trying to remap the stack for ourselves.  It'd be both platform- and
>> architecture-dependent, and I'm afraid it'd introduce as many failure
>> modes as it removes.  (Notably, I'm not sure we could guarantee
>> there's a guard page below the stack.)  Since we've not seen reports
>> of this failure from the wild, I doubt it's worth the trouble.
>

I've discovered that the out-of-stack segfault can be reached without
hitting an ordinary OOM condition at all.

Please look at the demo script that finds -v value for the given sql
recursion (the predefined range works for a server built with
CPPFLAGS="-Og" ./configure --enable-cassert --enable-debug,
using gcc 11.3 on 64-bit Ubuntu):
for v in `seq 240000 100 260000`; do
echo "limit -v: $v"
ulimit -Sv $v
rm server.log

pg_ctl -l server.log start
dropdb test
createdb test

cat << 'EOF' | psql test >psql.log 2>&1
create function explainer(text) returns setof text
language plpgsql as
$$
declare
   ln text;
   begin
     for ln in execute format('explain analyze %s', $1)
     loop
       return next ln;
     end loop;
   end;
$$;

prepare stmt as select explainer('execute stmt');
select explainer('execute stmt');
EOF

pg_ctl stop || break
grep 'was terminated by signal 11' server.log && break;
done

This script fails for me as follows:
limit -v: 241100
waiting for server to start.... done
server started
waiting for server to shut down.......... done
server stopped
2024-03-06 14:45:26.882 UTC [38567] LOG:  server process (PID 38634) was terminated by signal 11: Segmentation fault
(with no out-of-memory errors in the server.log)

Core was generated by `postgres: law test [local] SELECT                                             '.
Program terminated with signal SIGSEGV, Segmentation fault.

warning: Section `.reg-xstate/38634' in core file too small.
#0  0x0000563ea8af7d60 in base_yyparse (yyscanner=yyscanner@entry=0x563eacbeec38) at gram.c:29020
bt
29020   {
(gdb) bt
#0  0x0000563ea8af7d60 in base_yyparse (yyscanner=yyscanner@entry=0x563eacbeec38) at gram.c:29020
#1  0x0000563ea8b37d4e in raw_parser (str=str@entry=0x563eacbf2c58 "explain analyze execute stmt", mode=<optimized
out>)
 
at parser.c:77
#2  0x0000563ea8c19217 in _SPI_prepare_plan (src=src@entry=0x563eacbf2c58 "explain analyze execute stmt", 
plan=plan@entry=0x7fffb16323b0) at spi.c:2235
#3  0x0000563ea8c1c5f5 in SPI_cursor_parse_open (name=name@entry=0x0, src=src@entry=0x563eacbf2c58 "explain analyze 
execute stmt", options=options@entry=0x7fffb1632450) at spi.c:1554
...
#11389 0x0000563ea8d8bbf2 in exec_simple_query (query_string=query_string@entry=0x563eaa5c6328 "select 
explainer('execute stmt');") at postgres.c:1273
#11390 0x0000563ea8d8daae in PostgresMain (dbname=<optimized out>, username=<optimized out>) at postgres.c:4675
#11391 0x0000563ea8cf9c5d in BackendRun (port=port@entry=0x563eaa5f38c0) at postmaster.c:4475
#11392 0x0000563ea8cfcc10 in BackendStartup (port=port@entry=0x563eaa5f38c0) at postmaster.c:4151
#11393 0x0000563ea8cfcdae in ServerLoop () at postmaster.c:1769
#11394 0x0000563ea8cfe120 in PostmasterMain (argc=argc@entry=3, argv=argv@entry=0x563eaa5c1760) at postmaster.c:1468
#11395 0x0000563ea8c3148d in main (argc=3, argv=0x563eaa5c1760) at main.c:197

(gdb) i reg
rbp            0x563eacbeec38      0x563eacbeec38
rsp            0x7fffb16316b0      0x7fffb16316b0

(gdb) x/4 0x7fffb1631ff0
0x7fffb1631ff0: Cannot access memory at address 0x7fffb1631ff0
(gdb) x/4 0x7fffb1632000
0x7fffb1632000: 0       0       0       0

(gdb)  p stack_base_ptr
$1 = 0x7fffb17ac660 "\001"
(gdb) p stack_base_ptr - $rsp
$2 = 1552304

So it looks like a very specific corner stack-overflow case.

Best regards,
Alexander



Re: BUG #18374: Printing memory contexts on OOM condition might lead to segmentation fault

От
Alexander Korotkov
Дата:
On Sun, Mar 3, 2024 at 11:39 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I do think it's probably worth reducing MemoryContextDelete's stack
> usage to O(1), just to ensure we can't get into stack trouble during
> transaction abort.  That's not hard at all, as attached.
>
> I tried to make MemoryContextResetChildren work similarly, but that
> doesn't work because if we're not removing child contexts then we
> need extra state to tell which ones we've done already.  For the
> same reason my idea for bounding the stack space needed by
> MemoryContextStats doesn't seem to work.  We could possibly make it
> work if we were willing to add a temporary-use pointer field to all
> MemoryContext headers, but I'm unconvinced that'd be a good tradeoff.

For removing recursion from memory context processing, please check
the patch by Heikki [1], and my slightly revised version [2].

Links.
1. https://www.postgresql.org/message-id/6b48c746-9704-46dc-b9be-01fe4137c824%40iki.fi
2. https://www.postgresql.org/message-id/CAPpHfdtQVzkKgrxqKZE9yESnu7cAATVQbGktVOSXPNWG7GOkhA%40mail.gmail.com

------
Regards,
Alexander Korotkov



Alexander Korotkov <aekorotkov@gmail.com> writes:
> For removing recursion from memory context processing, please check
> the patch by Heikki [1], and my slightly revised version [2].

> Links.
> 1. https://www.postgresql.org/message-id/6b48c746-9704-46dc-b9be-01fe4137c824%40iki.fi
> 2. https://www.postgresql.org/message-id/CAPpHfdtQVzkKgrxqKZE9yESnu7cAATVQbGktVOSXPNWG7GOkhA%40mail.gmail.com

I'd forgotten about that thread.  I'll comment there, thanks!

The issue Alexander has identified about the kernel possibly not
giving us as much stack as promised should stay in this thread,
but we can discuss the idea of de-recursing mcxt.c over there.

            regards, tom lane