BUG #18374: Printing memory contexts on OOM condition might lead to segmentation fault
From:
PG Bug reporting form
Date:
The following bug has been logged on the website:

Bug reference:      18374
Logged by:          Alexander Lakhin
Email address:      exclusion@gmail.com
PostgreSQL version: 16.2
Operating system:   Ubuntu 22.04
Description:

When a backend with deeply nested memory contexts hits an out-of-memory
condition and logs the contexts, it might lead to a segmentation fault
(due to the lack of free memory again). For example:

$ ulimit -Sv 300000; TESTS=infinite_recurse make -s check-tests

(on 64-bit Ubuntu 22.04) fails:

# +++ regress check in src/test/regress +++
# using temp instance on port 61698 with PID 809399
not ok 1 - infinite_recurse    286 ms
# (test process exited with exit code 2)

with the following stack trace:

Core was generated by `postgres: law regression [local] SELECT '.
Program terminated with signal SIGSEGV, Segmentation fault.
warning: Section `.reg-xstate/809680' in core file too small.
#0  0x00005643d12ecd50 in dostr (str=str@entry=0x5643d145b137 " ", slen=2, target=target@entry=0x7ffdafcb6290) at snprintf.c:1378
1378    {
(gdb) bt
#0  0x00005643d12ecd50 in dostr (str=str@entry=0x5643d145b137 " ", slen=2, target=target@entry=0x7ffdafcb6290) at snprintf.c:1378
#1  0x00005643d12ed54a in dopr (...) at snprintf.c:417
#2  0x00005643d12edec2 in pg_vfprintf (...) at snprintf.c:257
#3  0x00005643d12edfa7 in pg_fprintf (...) at snprintf.c:270
#4  0x00005643d12c086e in MemoryContextStatsPrint (..., stats_string=stats_string@entry=0x7ffdafcb68c0 "8192 total in 1 blocks; 5072 free (0 chunks); 3120 used", print_to_stderr=print_to_stderr@entry=true) at mcxt.c:909
#5  0x00005643d12b9604 in AllocSetStats (...) at aset.c:1508
#6  0x00005643d12c0553 in MemoryContextStatsInternal (...) at mcxt.c:770
...
#675 0x00005643d12c05ee in MemoryContextStatsInternal (...) at mcxt.c:786
#676 0x00005643d12c1030 in MemoryContextStatsDetail (...) at mcxt.c:721
#677 0x00005643d12c1111 in MemoryContextStats (...) at mcxt.c:702
#678 0x00005643d12c19d9 in palloc (size=size@entry=16384) at mcxt.c:1243
#679 0x00005643d12d2ed8 in tuplestore_begin_common (eflags=4, interXact=interXact@entry=false, maxKBytes=4096) at tuplestore.c:281
#680 0x00005643d12d37c9 in tuplestore_begin_heap (...) at tuplestore.c:331
#681 0x00005643d0f72099 in fmgr_sql (...) at functions.c:1142
...
#20980 0x00005643d0f5fd44 in ExecProcNode (...) at ../../../src/include/executor/executor.h:273
#20981 ExecutePlan (...) at execMain.c:1670
#20982 0x00005643d0f5ff07 in standard_ExecutorRun (...) at execMain.c:365
#20983 0x00005643d0f5ffe1 in ExecutorRun (...) at execMain.c:309
#20984 0x00005643d0f701e9 in postquel_getnext (...) at functions.c:895
#20985 0x00005643d0f71ffa in fmgr_sql (...) at functions.c:1196
#20986 0x00005643d0f5a6db in ExecInterpExpr (...) at execExprInterp.c:734
#20987 0x00005643d0f56aec in ExecInterpExprStillValid (...) at execExprInterp.c:1870
#20988 0x00005643d0f98075 in ExecEvalExprSwitchContext (...) at ../../../src/include/executor/executor.h:355
#20989 ExecProject (...) at ../../../src/include/executor/executor.h:389
#20990 ExecResult (...) at nodeResult.c:136
#20991 0x00005643d0f67781 in ExecProcNodeFirst (...) at execProcnode.c:464
#20992 0x00005643d0f5fd44 in ExecProcNode (...) at execMain.c:1670
#20994 0x00005643d0f5ff07 in standard_ExecutorRun (...) at execMain.c:365
#20995 0x00005643d0f5ffe1 in ExecutorRun (...) at execMain.c:309
#20996 0x00005643d0f701e9 in postquel_getnext (...) at functions.c:895
#20997 0x00005643d0f71ffa in fmgr_sql (fcinfo=0x5643d3614430) at functions.c:1196
#20998 0x00005643d0f5a6db in ExecInterpExpr (...) at execExprInterp.c:734
#20999 0x00005643d0f56aec in ExecInterpExprStillValid (...) at execExprInterp.c:1870
#21000 0x00005643d0f98075 in ExecEvalExprSwitchContext (...) at ../../../src/include/executor/executor.h:355
#21001 ExecProject (...) at ../../../src/include/executor/executor.h:389
#21002 ExecResult (...) at nodeResult.c:136
#21003 0x00005643d0f67781 in ExecProcNodeFirst (...) at execProcnode.c:464
#21004 0x00005643d0f5fd44 in ExecProcNode (...) at ../../../src/include/executor/executor.h:273
...
(gdb) p $rsp
$1 = (void *) 0x7ffdafcb6000
(gdb) x $rsp
0x7ffdafcb6000: 0xe92636c0
(gdb) x $rsp - 8
0x7ffdafcb5ff8: Cannot access memory at address 0x7ffdafcb5ff8

postmaster.log contains:

TopMemoryContext: 196064 total in 7 blocks; 45920 free (17 chunks); 150144 used
...
TopPortalContext: 8192 total in 1 blocks; 7656 free (0 chunks); 536 used
PortalContext: 1024 total in 1 blocks; 592 free (0 chunks); 432 used: <unnamed>
ExecutorState: 8192 total in 1 blocks; 4032 free (0 chunks); 4160 used
SQL function: 32832 total in 3 blocks; 5136 free (1 chunks); 27696 used: infinite_recurse
ExecutorState: 8192 total in 1 blocks; 5072 free (0 chunks); 3120 used
SQL function: 32832 total in 3 blocks; 5136 free (1 chunks); 27696 used: infinite_recurse
...

(Initially observed under the natural memory restrictions of a 32-bit OS.)
PG Bug reporting form <noreply@postgresql.org> writes:
> When a backend with deeply nested memory contexts hits out-of-memory
> condition and logs the contexts, it might lead to a segmentation fault
> (due to the lack of free memory again).

Hmph.  That's not an out-of-memory crash, that's a stack-too-deep
crash.  Seems like we ought to do one or both of these:

1. Put a CHECK_STACK_DEPTH() call in MemoryContextStatsInternal.

2. Teach MemoryContextStatsInternal to refuse to recurse more
than N levels, for N perhaps around 100.

Neither of these is very attractive though, as they'd obscure
the OOM situation that we're trying to help debug.

It strikes me that we don't actually need recursion in order to
traverse the context tree: since the nodes have parent pointers,
it'd be possible to visit them all using only iteration.  The
recursion seems necessary though to manage the child summarization
logic as we have it (in particular, we must have a local_totals
per level to produce summarization like this).  Maybe we could
modify solution #2 into

2a. Once we get more than say 100 levels deep, summarize everything
below that in a single line, obtained in an iterative rather than
recursive traversal.

I wonder whether MemoryContextDelete and other cleanup methods
also need to be rewritten to avoid recursion.  In the infinite_recurse
test case I think we escape trouble because we longjmp out of most
of the stack before we try to clean up --- but you could probably
devise a test case that tries to do a subtransaction abort at a
deep call level, and then maybe kaboom?

			regards, tom lane
Re: BUG #18374: Printing memory contexts on OOM condition might lead to segmentation fault
From:
Alexander Lakhin
Date:
Hello Tom,

02.03.2024 19:11, Tom Lane wrote:
> PG Bug reporting form <noreply@postgresql.org> writes:
>> When a backend with deeply nested memory contexts hits out-of-memory
>> condition and logs the contexts, it might lead to a segmentation fault
>> (due to the lack of free memory again).
> Hmph.  That's not an out-of-memory crash, that's a stack-too-deep
> crash.

I tried to decrease the limit and still got the failure (with a much
shorter stack):
ulimit -Sv 200000; TESTS=infinite_recurse make -s check-tests

(gdb) p $rsp
$1 = (void *) 0x7ffcc83d4ff0
(gdb) frame 13269
#13269 0x000056289bc2685a in main (argc=8, argv=0x56289d3b4930) at main.c:198
198             PostmasterMain(argc, argv);
(gdb) p $rsp
$2 = (void *) 0x7ffcc84834d0
(gdb) p $rsp - 0x7ffcc83d4ff0
$3 = (void *) 0xae4e0

(Far less than ulimit -s == 8 MB.)
It made me think that it's not a stack overflow issue, but maybe I'm
missing something.

> Seems like we ought to do one or both of these:
>
> 1. Put a CHECK_STACK_DEPTH() call in MemoryContextStatsInternal.
>
> 2. Teach MemoryContextStatsInternal to refuse to recurse more
> than N levels, for N perhaps around 100.
>
> Neither of these are very attractive though, as they'd obscure
> the OOM situation that we're trying to help debug.
>
> It strikes me that we don't actually need recursion in order to
> traverse the context tree: since the nodes have parent pointers,
> it'd be possible to visit them all using only iteration.  The
> recursion seems necessary though to manage the child summarization
> logic as we have it (in particular, we must have a local_totals
> per level to produce summarization like this).  Maybe we could
> modify solution #2 into
>
> 2a. Once we get more than say 100 levels deep, summarize everything
> below that in a single line, obtained in an iterative rather than
> recursive traversal.
>
> I wonder whether MemoryContextDelete and other cleanup methods
> also need to be rewritten to avoid recursion.  In the infinite_recurse
> test case I think we escape trouble because we longjmp out of most
> of the stack before we try to clean up --- but you could probably
> devise a test case that tries to do a subtransaction abort at a
> deep call level, and then maybe kaboom?

Exploiting and protecting MemoryContextStatsInternal() were discussed
before:
https://www.postgresql.org/message-id/flat/1661334672.728714027%40f473.i.mail.ru
(It looks like the function got no stack-overflow protection in the end.)

But I'm still not sure that we're dealing with the same issue here.

Best regards,
Alexander
Alexander Lakhin <exclusion@gmail.com> writes:
> 02.03.2024 19:11, Tom Lane wrote:
>> Hmph.  That's not an out-of-memory crash, that's a stack-too-deep
>> crash.

> (gdb) p $rsp
> $1 = (void *) 0x7ffcc83d4ff0
> (gdb) frame 13269
> #13269 0x000056289bc2685a in main (argc=8, argv=0x56289d3b4930) at main.c:198
> 198             PostmasterMain(argc, argv);
> (gdb) p $rsp
> $2 = (void *) 0x7ffcc84834d0
> (gdb) p $rsp - 0x7ffcc83d4ff0
> $3 = (void *) 0xae4e0
> (Far less than ulimit -s == 8 MB.)

Yeah, I'm seeing something similar, also with ulimit -s = 8192 kbytes:

(gdb) i reg
...
rbp            0xb0a324            0xb0a324
rsp            0x7ffd07ce4fd0      0x7ffd07ce4fd0
...
(gdb) x/64 0x7ffd07ce4fd0
0x7ffd07ce4fd0: Cannot access memory at address 0x7ffd07ce4fd0

So it's definitely out-of-stack, yet

(gdb) p stack_base_ptr
$3 = 0x7ffd07dbf570 "\b"
(gdb) p 0x7ffd07dbf570 - 0x7ffd07ce4fd0
$4 = 894368

I'd have expected a diff in the vicinity of 8MB, but it isn't.

I think what must be happening is that the kernel is refusing to
expand our stack any more once we've hit the "ulimit -v" limit.
This is quite nasty, because it breaks all our assumptions about
having X amount of stack still available once check_stack_depth
triggers.  I tried inserting check_stack_depth() into
MemoryContextStatsInternal, and *that did not stop the crash*,
confirming that we don't think we're anywhere near the stack limit.
Ugh.

			regards, tom lane
I wrote:
> I think what must be happening is that the kernel is refusing
> to expand our stack any more once we've hit the "ulimit -v" limit.
> This is quite nasty, because it breaks all our assumptions about
> having X amount of stack still available once check_stack_depth
> triggers.

I find this in [1]:

    The C language stack growth does an implicit mremap. If you want absolute
    guarantees and run close to the edge you MUST mmap your stack for the
    largest size you think you will need. For typical stack usage this does
    not matter much but it's a corner case if you really really care

Seems like we need to do some more work at startup to enforce that
we have the amount of stack we think we do, if we're on Linux.

			regards, tom lane

[1] https://www.kernel.org/doc/Documentation/vm/overcommit-accounting
I wrote:
> I find this in [1]:
>
>     The C language stack growth does an implicit mremap. If you want absolute
>     guarantees and run close to the edge you MUST mmap your stack for the
>     largest size you think you will need. For typical stack usage this does
>     not matter much but it's a corner case if you really really care
>
> Seems like we need to do some more work at startup to enforce that
> we have the amount of stack we think we do, if we're on Linux.

After thinking about that some more, I'm really quite unenthused about
trying to remap the stack for ourselves.  It'd be both platform- and
architecture-dependent, and I'm afraid it'd introduce as many failure
modes as it removes.  (Notably, I'm not sure we could guarantee
there's a guard page below the stack.)  Since we've not seen reports
of this failure from the wild, I doubt it's worth the trouble.

I do think it's probably worth reducing MemoryContextDelete's stack
usage to O(1), just to ensure we can't get into stack trouble during
transaction abort.  That's not hard at all, as attached.

I tried to make MemoryContextResetChildren work similarly, but that
doesn't work because if we're not removing child contexts then we
need extra state to tell which ones we've done already.  For the
same reason my idea for bounding the stack space needed by
MemoryContextStats doesn't seem to work.  We could possibly make it
work if we were willing to add a temporary-use pointer field to all
MemoryContext headers, but I'm unconvinced that'd be a good tradeoff.

			regards, tom lane

diff --git a/src/backend/utils/mmgr/mcxt.c b/src/backend/utils/mmgr/mcxt.c
index 41f2390fb8..7cc220bca0 100644
--- a/src/backend/utils/mmgr/mcxt.c
+++ b/src/backend/utils/mmgr/mcxt.c
@@ -447,14 +447,37 @@ MemoryContextDelete(MemoryContext context)
 void
 MemoryContextDeleteChildren(MemoryContext context)
 {
+	MemoryContext cur,
+				next;
+
 	Assert(MemoryContextIsValid(context));
 
 	/*
-	 * MemoryContextDelete will delink the child from me, so just iterate as
-	 * long as there is a child.
+	 * We iterate rather than recursing, so that cleaning up a deep nest of
+	 * contexts doesn't require unbounded stack space.  (This avoids possible
+	 * failure during transaction cleanup, which would be bad.)  This works
+	 * because by the time we traverse back up to a parent context, it will
+	 * have no remaining children and will be seen as deletable.
 	 */
-	while (context->firstchild != NULL)
-		MemoryContextDelete(context->firstchild);
+	cur = context->firstchild;
+	while (cur != NULL)
+	{
+		/* Descend until we find a childless context */
+		if (cur->firstchild != NULL)
+		{
+			cur = cur->firstchild;
+			continue;
+		}
+		/* We can delete cur, but first remember the next deletion target */
+		if (cur->nextchild != NULL)
+			next = cur->nextchild;
+		else if (cur->parent != context)
+			next = cur->parent;
+		else
+			next = NULL;
+		MemoryContextDelete(cur);
+		cur = next;
+	}
 }
 
 /*
Re: BUG #18374: Printing memory contexts on OOM condition might lead to segmentation fault
From:
Alexander Lakhin
Date:
Hello Tom,

04.03.2024 00:39, Tom Lane wrote:
> I wrote:
>> I find this in [1]:
>>
>>     The C language stack growth does an implicit mremap. If you want absolute
>>     guarantees and run close to the edge you MUST mmap your stack for the
>>     largest size you think you will need. For typical stack usage this does
>>     not matter much but it's a corner case if you really really care
>>
>> Seems like we need to do some more work at startup to enforce that
>> we have the amount of stack we think we do, if we're on Linux.
> After thinking about that some more, I'm really quite unenthused about
> trying to remap the stack for ourselves.  It'd be both platform- and
> architecture-dependent, and I'm afraid it'd introduce as many failure
> modes as it removes.  (Notably, I'm not sure we could guarantee
> there's a guard page below the stack.)  Since we've not seen reports
> of this failure from the wild, I doubt it's worth the trouble.

I have a perhaps-naive idea, but it apparently eliminates the
segmentation fault for the given test case. (Please look at the quick
draft attached.)
Though maybe the issue can really wait for complaints from outside, or
for a simpler/cheaper solution integrated with the existing (or future)
stack-overflow protection.

> I do think it's probably worth reducing MemoryContextDelete's stack
> usage to O(1), just to ensure we can't get into stack trouble during
> transaction abort.  That's not hard at all, as attached.

Yeah, Heikki proposed something similar as part of
0003-Avoid-recursion-in-MemoryContext-functions.patch here:
https://www.postgresql.org/message-id/6b48c746-9704-46dc-b9be-01fe4137c824%40iki.fi

> I tried to make MemoryContextResetChildren work similarly, but that
> doesn't work because if we're not removing child contexts then we
> need extra state to tell which ones we've done already.  For the
> same reason my idea for bounding the stack space needed by
> MemoryContextStats doesn't seem to work.  We could possibly make it
> work if we were willing to add a temporary-use pointer field to all
> MemoryContext headers, but I'm unconvinced that'd be a good tradeoff.

Best regards,
Alexander
Attachments
Re: BUG #18374: Printing memory contexts on OOM condition might lead to segmentation fault
From:
Alexander Lakhin
Date:
04.03.2024 18:00, Alexander Lakhin wrote:
> 04.03.2024 00:39, Tom Lane wrote:
>>> Seems like we need to do some more work at startup to enforce that
>>> we have the amount of stack we think we do, if we're on Linux.
>> After thinking about that some more, I'm really quite unenthused about
>> trying to remap the stack for ourselves.  It'd be both platform- and
>> architecture-dependent, and I'm afraid it'd introduce as many failure
>> modes as it removes.  (Notably, I'm not sure we could guarantee
>> there's a guard page below the stack.)  Since we've not seen reports
>> of this failure from the wild, I doubt it's worth the trouble.

I've discovered that the out-of-stack segfault can be reached without
hitting an ordinary OOM condition at all. Please look at the demo script
that finds a -v value for the given SQL recursion (the predefined range
works for a server built with
CPPFLAGS="-Og" ./configure --enable-cassert --enable-debug,
using gcc 11.3 on 64-bit Ubuntu):

for v in `seq 240000 100 260000`; do
  echo "limit -v: $v"
  ulimit -Sv $v
  rm server.log
  pg_ctl -l server.log start
  dropdb test
  createdb test
  cat << 'EOF' | psql test >psql.log 2>&1
create function explainer(text) returns setof text language plpgsql as $$
declare
    ln text;
begin
    for ln in execute format('explain analyze %s', $1) loop
        return next ln;
    end loop;
end;
$$;
prepare stmt as select explainer('execute stmt');
select explainer('execute stmt');
EOF
  pg_ctl stop || break
  grep 'was terminated by signal 11' server.log && break;
done

This script fails for me as follows:

limit -v: 241100
waiting for server to start.... done
server started
waiting for server to shut down.......... done
server stopped
2024-03-06 14:45:26.882 UTC [38567] LOG:  server process (PID 38634) was terminated by signal 11: Segmentation fault

(with no out-of-memory errors in server.log)

Core was generated by `postgres: law test [local] SELECT '.
Program terminated with signal SIGSEGV, Segmentation fault.
warning: Section `.reg-xstate/38634' in core file too small.
#0  0x0000563ea8af7d60 in base_yyparse (yyscanner=yyscanner@entry=0x563eacbeec38) at gram.c:29020
29020   {
(gdb) bt
#0  0x0000563ea8af7d60 in base_yyparse (yyscanner=yyscanner@entry=0x563eacbeec38) at gram.c:29020
#1  0x0000563ea8b37d4e in raw_parser (str=str@entry=0x563eacbf2c58 "explain analyze execute stmt", mode=<optimized out>) at parser.c:77
#2  0x0000563ea8c19217 in _SPI_prepare_plan (src=src@entry=0x563eacbf2c58 "explain analyze execute stmt", plan=plan@entry=0x7fffb16323b0) at spi.c:2235
#3  0x0000563ea8c1c5f5 in SPI_cursor_parse_open (name=name@entry=0x0, src=src@entry=0x563eacbf2c58 "explain analyze execute stmt", options=options@entry=0x7fffb1632450) at spi.c:1554
...
#11389 0x0000563ea8d8bbf2 in exec_simple_query (query_string=query_string@entry=0x563eaa5c6328 "select explainer('execute stmt');") at postgres.c:1273
#11390 0x0000563ea8d8daae in PostgresMain (dbname=<optimized out>, username=<optimized out>) at postgres.c:4675
#11391 0x0000563ea8cf9c5d in BackendRun (port=port@entry=0x563eaa5f38c0) at postmaster.c:4475
#11392 0x0000563ea8cfcc10 in BackendStartup (port=port@entry=0x563eaa5f38c0) at postmaster.c:4151
#11393 0x0000563ea8cfcdae in ServerLoop () at postmaster.c:1769
#11394 0x0000563ea8cfe120 in PostmasterMain (argc=argc@entry=3, argv=argv@entry=0x563eaa5c1760) at postmaster.c:1468
#11395 0x0000563ea8c3148d in main (argc=3, argv=0x563eaa5c1760) at main.c:197

(gdb) i reg
rbp            0x563eacbeec38      0x563eacbeec38
rsp            0x7fffb16316b0      0x7fffb16316b0
(gdb) x/4 0x7fffb1631ff0
0x7fffb1631ff0: Cannot access memory at address 0x7fffb1631ff0
(gdb) x/4 0x7fffb1632000
0x7fffb1632000: 0       0       0       0
(gdb) p stack_base_ptr
$1 = 0x7fffb17ac660 "\001"
(gdb) p stack_base_ptr - $rsp
$2 = 1552304

So it looks like a very specific corner-case stack overflow.

Best regards,
Alexander
Re: BUG #18374: Printing memory contexts on OOM condition might lead to segmentation fault
From:
Alexander Korotkov
Date:
On Sun, Mar 3, 2024 at 11:39 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I do think it's probably worth reducing MemoryContextDelete's stack
> usage to O(1), just to ensure we can't get into stack trouble during
> transaction abort.  That's not hard at all, as attached.
>
> I tried to make MemoryContextResetChildren work similarly, but that
> doesn't work because if we're not removing child contexts then we
> need extra state to tell which ones we've done already.  For the
> same reason my idea for bounding the stack space needed by
> MemoryContextStats doesn't seem to work.  We could possibly make it
> work if we were willing to add a temporary-use pointer field to all
> MemoryContext headers, but I'm unconvinced that'd be a good tradeoff.

For removing recursion from memory context processing, please check
the patch by Heikki [1] and my slightly revised version [2].

Links.
1. https://www.postgresql.org/message-id/6b48c746-9704-46dc-b9be-01fe4137c824%40iki.fi
2. https://www.postgresql.org/message-id/CAPpHfdtQVzkKgrxqKZE9yESnu7cAATVQbGktVOSXPNWG7GOkhA%40mail.gmail.com

------
Regards,
Alexander Korotkov
Alexander Korotkov <aekorotkov@gmail.com> writes:
> For removing recursion from memory context processing, please check
> the patch by Heikki [1], and my slightly revised version [2].

> Links.
> 1. https://www.postgresql.org/message-id/6b48c746-9704-46dc-b9be-01fe4137c824%40iki.fi
> 2. https://www.postgresql.org/message-id/CAPpHfdtQVzkKgrxqKZE9yESnu7cAATVQbGktVOSXPNWG7GOkhA%40mail.gmail.com

I'd forgotten about that thread.  I'll comment there, thanks!
The issue Alexander has identified about the kernel possibly not
giving us as much stack as promised should stay in this thread,
but we can discuss the idea of de-recursing mcxt.c over there.

			regards, tom lane