Discussion: can we optimize STACK_DEPTH_SLOP
Poking at NetBSD kernel source it looks like the default ulimit -s
depends on the architecture and ranges from 512k to 16M. Postgres
insists on max_stack_depth being STACK_DEPTH_SLOP -- ie 512kB -- less
than the ulimit setting making it impossible to start up on
architectures with a default of 512kB without raising the ulimit.

If we could just lower it to 384kB then Postgres would start up but I
wonder if we should just use MIN(stack_rlimit/2, STACK_DEPTH_SLOP) so
that there's always a setting of max_stack_depth that would allow
Postgres to start.

./arch/sun2/include/vmparam.h:73:#define DFLSSIZ (512*1024) /* initial stack size limit */
./arch/arm/include/arm32/vmparam.h:66:#define DFLSSIZ (2*1024*1024) /* initial stack size limit */
./arch/sun3/include/vmparam3.h:109:#define DFLSSIZ (512*1024) /* initial stack size limit */
./arch/sun3/include/vmparam3x.h:58:#define DFLSSIZ (2*1024*1024) /* initial stack size limit */
./arch/luna68k/include/vmparam.h:70:#define DFLSSIZ (512*1024) /* initial stack size limit */
./arch/hppa/include/vmparam.h:62:#define DFLSSIZ (2*1024*1024) /* initial stack size limit */
./arch/hp300/include/vmparam.h:82:#define DFLSSIZ (2*1024*1024) /* initial stack size limit */
./arch/alpha/include/vmparam.h:79:#define DFLSSIZ (1<<21) /* initial stack size (2M) */
./arch/acorn26/include/vmparam.h:55:#define DFLSSIZ (512*1024) /* initial stack size limit */
./arch/amd64/include/vmparam.h:83:#define DFLSSIZ (4*1024*1024) /* initial stack size limit */
./arch/amd64/include/vmparam.h:101:#define DFLSSIZ32 (2*1024*1024) /* initial stack size limit */
./arch/ia64/include/vmparam.h:57:#define DFLSSIZ (1<<21) /* initial stack size (2M) */
./arch/mvme68k/include/vmparam.h:82:#define DFLSSIZ (512*1024) /* initial stack size limit */
./arch/i386/include/vmparam.h:74:#define DFLSSIZ (2*1024*1024) /* initial stack size limit */
./arch/amiga/include/vmparam.h:82:#define DFLSSIZ (2*1024*1024) /* initial stack size limit */
./arch/sparc/include/vmparam.h:94:#define DFLSSIZ (8*1024*1024) /* initial stack size limit */
./arch/mips/include/vmparam.h:95:#define DFLSSIZ (4*1024*1024) /* initial stack size limit */
./arch/mips/include/vmparam.h:114:#define DFLSSIZ (16*1024*1024) /* initial stack size limit */
./arch/mips/include/vmparam.h:134:#define DFLSSIZ32 DFLTSIZ /* initial stack size limit */
./arch/sh3/include/vmparam.h:69:#define DFLSSIZ (2 * 1024 * 1024)
./arch/mac68k/include/vmparam.h:115:#define DFLSSIZ (2*1024*1024) /* initial stack size limit */
./arch/next68k/include/vmparam.h:89:#define DFLSSIZ (512*1024) /* initial stack size limit */
./arch/news68k/include/vmparam.h:82:#define DFLSSIZ (512*1024) /* initial stack size limit */
./arch/x68k/include/vmparam.h:74:#define DFLSSIZ (512*1024) /* initial stack size limit */
./arch/cesfic/include/vmparam.h:82:#define DFLSSIZ (512*1024) /* initial stack size limit */
./arch/usermode/include/vmparam.h:69:#define DFLSSIZ (2 * 1024 * 1024)
./arch/usermode/include/vmparam.h:78:#define DFLSSIZ (4 * 1024 * 1024)
./arch/powerpc/include/oea/vmparam.h:74:#define DFLSSIZ (2*1024*1024) /* default stack size */
./arch/powerpc/include/ibm4xx/vmparam.h:60:#define DFLSSIZ (2*1024*1024) /* default stack size */
./arch/powerpc/include/booke/vmparam.h:75:#define DFLSSIZ (2*1024*1024) /* default stack size */
./arch/vax/include/vmparam.h:74:#define DFLSSIZ (512*1024) /* initial stack size limit */
./arch/sparc64/include/vmparam.h:100:#define DFLSSIZ (2*1024*1024) /* initial stack size limit */
./arch/sparc64/include/vmparam.h:125:#define DFLSSIZ (2*1024*1024) /* initial stack size limit */
./arch/sparc64/include/vmparam.h:145:#define DFLSSIZ32 (2*1024*1024) /* initial stack size limit */
./arch/atari/include/vmparam.h:81:#define DFLSSIZ (2*1024*1024) /* initial stack size limit */

-- 
greg
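To make the arithmetic concrete, here is a minimal, self-contained sketch of the constraint being discussed -- not the actual guc.c logic; get_stack_rlimit() and the variable names are invented for illustration. With a 512kB rlimit, the current rule leaves no legal value of max_stack_depth at all, while the proposed Min() rule still leaves 256kB.

#include <stdio.h>
#include <sys/resource.h>

#define STACK_DEPTH_SLOP	(512 * 1024L)

static long
get_stack_rlimit(void)
{
	struct rlimit rlim;

	if (getrlimit(RLIMIT_STACK, &rlim) == 0 && rlim.rlim_cur != RLIM_INFINITY)
		return (long) rlim.rlim_cur;
	return -1;					/* unknown or unlimited */
}

int
main(void)
{
	long		stack_rlimit = get_stack_rlimit();

	if (stack_rlimit > 0)
	{
		/* Current rule: max_stack_depth must stay STACK_DEPTH_SLOP below the rlimit */
		long		max_now = stack_rlimit - STACK_DEPTH_SLOP;

		/* Proposed rule: never demand more slop than half the rlimit */
		long		slop = (stack_rlimit / 2 < STACK_DEPTH_SLOP)
			? stack_rlimit / 2 : STACK_DEPTH_SLOP;
		long		max_proposed = stack_rlimit - slop;

		printf("rlimit %ldkB: max_stack_depth <= %ldkB today, <= %ldkB proposed\n",
			   stack_rlimit / 1024, max_now / 1024, max_proposed / 1024);
	}
	return 0;
}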
Greg Stark <stark@mit.edu> writes:
> Poking at NetBSD kernel source it looks like the default ulimit -s
> depends on the architecture and ranges from 512k to 16M. Postgres
> insists on max_stack_depth being STACK_DEPTH_SLOP -- ie 512kB -- less
> than the ulimit setting making it impossible to start up on
> architectures with a default of 512kB without raising the ulimit.
> If we could just lower it to 384kB then Postgres would start up but I
> wonder if we should just use MIN(stack_rlimit/2, STACK
> _DEPTH_SLOP) so that there's always a setting of max_stack_depth that
> would allow Postgres to start.
I'm pretty nervous about reducing that materially without any
investigation into how much of the slop we actually use. Our assumption
so far has generally been that only recursive routines need to have any
stack depth check; but there are plenty of very deep non-recursive call
paths. I do not think we're doing people any favors by letting them skip
fooling with "ulimit -s" if the result is that their database crashes
under stress. For that matter, even if we were sure we'd produce a
"stack too deep" error rather than crashing, that's still not very nice
if it happens on run-of-the-mill queries.
regards, tom lane
On Tue, Jul 5, 2016 at 11:54 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Greg Stark <stark@mit.edu> writes:
>> Poking at NetBSD kernel source it looks like the default ulimit -s
>> depends on the architecture and ranges from 512k to 16M. Postgres
>> insists on max_stack_depth being STACK_DEPTH_SLOP -- ie 512kB -- less
>> than the ulimit setting making it impossible to start up on
>> architectures with a default of 512kB without raising the ulimit.
>
>> If we could just lower it to 384kB then Postgres would start up but I
>> wonder if we should just use MIN(stack_rlimit/2, STACK
>> _DEPTH_SLOP) so that there's always a setting of max_stack_depth that
>> would allow Postgres to start.
>
> I'm pretty nervous about reducing that materially without any
> investigation into how much of the slop we actually use. Our assumption
> so far has generally been that only recursive routines need to have any
> stack depth check; but there are plenty of very deep non-recursive call
> paths. I do not think we're doing people any favors by letting them skip
> fooling with "ulimit -s" if the result is that their database crashes
> under stress. For that matter, even if we were sure we'd produce a
> "stack too deep" error rather than crashing, that's still not very nice
> if it happens on run-of-the-mill queries.

To me it seems like using anything based on stack_rlimit/2 is pretty
risky for the reason that you state, but I also think that not being
able to start the database at all on some platforms with small stacks
is bad.

If I had to guess, I'd bet that most functions in the backend use a
few hundred bytes of stack space or less, so that even 100kB of stack
space is enough for hundreds of stack frames. If we're putting that
kind of depth on the stack without ever checking the stack depth, we
deserve what we get.

That having been said, it wouldn't surprise me to find that we have
functions here and there which put objects that are many kB in size on
the stack, making it much easier to overrun the available stack space
in only a few frames. It would be nice if there were a tool that you
could run over your binaries and have it dump out the names of all
functions that create large stack frames, but I don't know of one.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
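Regarding the tool wished for above: one rough stand-in, for builds done with GCC, is the compiler's -fstack-usage option, which emits a per-function stack-usage report that can be sorted to find the largest frames. The toy example below (file and function names invented) only illustrates the idea.

/*
 * Toy illustration of GCC's -fstack-usage option, one possible stand-in
 * for a "find the big stack frames" tool.  Compile with:
 *     gcc -fstack-usage -c big_frame.c
 * and GCC writes big_frame.su listing the stack bytes each function uses.
 */
#include <string.h>

int
big_frame(const char *src)
{
	char		buf[16384];		/* large local => large reported frame */

	strncpy(buf, src, sizeof(buf) - 1);
	buf[sizeof(buf) - 1] = '\0';
	return (int) strlen(buf);
}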
Robert Haas <robertmhaas@gmail.com> writes:
> On Tue, Jul 5, 2016 at 11:54 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I'm pretty nervous about reducing that materially without any
>> investigation into how much of the slop we actually use.
> To me it seems like using anything based on stack_rlimit/2 is pretty
> risky for the reason that you state, but I also think that not being
> able to start the database at all on some platforms with small stacks
> is bad.
My point was that this is something we should investigate, not just
guess about.
I did some experimentation using the attached quick-kluge patch, which
(1) causes each exiting server process to report its actual ending stack
size, and (2) hacks the STACK_DEPTH_SLOP test so that you can set
max_stack_depth considerably higher than what rlimit(2) claims.
Unfortunately the way I did (1) only works on systems with pmap; I'm not
sure how to make it more portable.
My results on an x86_64 RHEL6 system were pretty interesting:
1. All but two of the regression test scripts have ending stack sizes
of 188K to 196K. There is one outlier at 296K (most likely the regex
test, though I did not stop to confirm that) and then there's the
errors.sql test, which intentionally provokes a "stack too deep" failure
and will therefore consume approximately max_stack_depth stack if it can.
2. With the RHEL6 default "ulimit -s" setting of 10240kB, you actually
have to increase max_stack_depth to 12275kB before you get a crash in
errors.sql. At the highest passing value, 12274kB, pmap says we end
with
1 00007ffc51f6e000 12284K rw--- [ stack ]
which is just shy of 2MB more than the alleged limit. I conclude that
at least in this kernel version, the kernel doesn't complain until your
stack would be 2MB *more* than the ulimit -s value.
That result also says that at least for that particular test, the
value of STACK_DEPTH_SLOP could be as little as 10K without a crash,
even without this surprising kernel forgiveness. But of course that
test isn't really pushing the slop factor, since it's only compiling a
trivial expression at each recursion depth.
Given these results I definitely wouldn't have a problem with reducing
STACK_DEPTH_SLOP to 200K, and you could possibly talk me down to less.
On x86_64. Other architectures might be more stack-hungry, though.
I'm particularly worried about IA64 --- I wonder if anyone can perform
these same experiments on that?
regards, tom lane
diff --git a/src/backend/storage/ipc/ipc.c b/src/backend/storage/ipc/ipc.c
index cc36b80..7740120 100644
*** a/src/backend/storage/ipc/ipc.c
--- b/src/backend/storage/ipc/ipc.c
*************** static int on_proc_exit_index,
*** 98,106 ****
--- 98,113 ----
  void
  proc_exit(int code)
  {
+ 	char		sysbuf[256];
+
  	/* Clean up everything that must be cleaned up */
  	proc_exit_prepare(code);

+ 	/* report stack size to stderr */
+ 	snprintf(sysbuf, sizeof(sysbuf), "pmap %d | grep stack 1>&2",
+ 			 (int) getpid());
+ 	system(sysbuf);
+
  #ifdef PROFILE_PID_DIR
  	{
  		/*
diff --git a/src/include/tcop/tcopprot.h b/src/include/tcop/tcopprot.h
index 7254355..009bec2 100644
*** a/src/include/tcop/tcopprot.h
--- b/src/include/tcop/tcopprot.h
***************
*** 27,33 ****
/* Required daylight between max_stack_depth and the kernel limit, in bytes */
! #define STACK_DEPTH_SLOP (512 * 1024L)
extern CommandDest whereToSendOutput;
extern PGDLLIMPORT const char *debug_query_string;
--- 27,33 ----
/* Required daylight between max_stack_depth and the kernel limit, in bytes */
! #define STACK_DEPTH_SLOP (-100 * 1024L * 1024L)
extern CommandDest whereToSendOutput;
extern PGDLLIMPORT const char *debug_query_string;
On Tue, Jul 5, 2016 at 8:48 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Unfortunately the way I did (1) only works on systems with pmap; I'm not
> sure how to make it more portable.

I did a similar(ish) test which is admittedly not as exhaustive as
using pmap. I instrumented check_stack_depth() itself to keep track of
a high water mark and (based on Robert's thought process) to keep
track of the largest increment over the previous checked stack depth.
This doesn't cover any cases where there's no check_stack_depth() call
in the call stack at all (but then if there's no check_stack_depth
call at all it's hard to see how any setting of STACK_DEPTH_SLOP is
necessarily going to help).

I see similar results to you. The regexp test shows:

LOG: disconnection: highest stack depth: 392256 largest stack increment: 35584

And the:

STATEMENT: select infinite_recurse();
LOG: disconnection: highest stack depth: 2097584 largest stack increment: 1936

There were a couple other tests with similar stack increase increments
to the regular expression test:

STATEMENT: alter table atacc2 add constraint foo check (test>0) no inherit;
LOG: disconnection: highest stack depth: 39232 largest stack increment: 34224

STATEMENT: SELECT chr(0);
LOG: disconnection: highest stack depth: 44144 largest stack increment: 34512

But aside from those two, the next largest increment between two
successive check_stack_depth calls was about 12kB:

STATEMENT: select array_elem_check(121.00);
LOG: disconnection: highest stack depth: 24256 largest stack increment: 12896

This was all on x86_64 too.

-- 
greg
Attachments
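For reference, the kind of instrumentation described above could look roughly like the standalone sketch below. This is not the actual patch, and every name in it is illustrative; it just tracks a high-water mark and the largest jump between successive depth checks, estimating depth by address arithmetic against a reference pointer the same way the backend does.

#include <stdio.h>
#include <stdlib.h>

static char *stack_base_ptr;
static long highest_depth = 0;
static long largest_increment = 0;
static long prev_depth = 0;

static void
check_depth_instrumented(void)
{
	char		here;
	long		depth = labs((long) (stack_base_ptr - &here));

	if (depth > highest_depth)
		highest_depth = depth;
	if (prev_depth > 0 && depth - prev_depth > largest_increment)
		largest_increment = depth - prev_depth;
	prev_depth = depth;
}

static void
recurse(int levels)
{
	char		pad[1000];		/* burn some stack in each frame */

	pad[0] = (char) levels;
	check_depth_instrumented();
	if (levels > 0 && pad[0] == (char) levels)
		recurse(levels - 1);
}

int
main(void)
{
	char		base;

	stack_base_ptr = &base;
	recurse(200);
	printf("highest stack depth: %ld  largest stack increment: %ld\n",
		   highest_depth, largest_increment);
	return 0;
}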
Greg Stark <stark@mit.edu> writes:
> I did a similar(ish) test which is admittedly not as exhaustive as
> using pmap. I instrumented check_stack_depth() itself to keep track of
> a high water mark (and based on Robert's thought process) to keep
> track of the largest increment over the previous checked stack depth.
> This doesn't cover any cases where there's no check_stack_depth() call
> in the call stack at all (but then if there's no check_stack_depth
> call at all it's hard to see how any setting of STACK_DEPTH_SLOP is
> necessarily going to help).
Well, the point of STACK_DEPTH_SLOP is that we don't want to have to
put check_stack_depth calls in every function in the backend, especially
not otherwise-inexpensive leaf functions. So the idea is for the slop
number to cover the worst-case call graph after the last function with a
check. Your numbers are pretty interesting, in that they clearly prove
we need a slop value of at least 40-50K, but they don't really show that
that's adequate.
I'm a bit disturbed by the fact that you seem to be showing maximum
measured depth for the non-outlier tests as only around 40K-ish.
That doesn't match up very well with my pmap results, since in no
case did I see a physical stack size below 188K.
[ pokes around for a little bit... ] Oh, this is interesting: it looks
like the *postmaster*'s stack size is 188K, and of course every forked
child is going to inherit that as a minimum stack depth. What's more,
pmap shows stack sizes near that for all my running postmasters going back
to 8.4. But 8.3 and before show a stack size of 84K, which seems to be
some sort of minimum on this machine; even a trivial "cat" process has
that stack size according to pmap.
Conclusion: something we did in 8.4 greatly bloated the postmaster's
stack space consumption, to the point that it's significantly more than
anything a normal backend does. That's surprising and scary, because
it means the postmaster is *more* exposed to stack SIGSEGV than most
backends. We need to find the cause, IMO.
regards, tom lane
On Wed, Jul 6, 2016 at 2:34 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Conclusion: something we did in 8.4 greatly bloated the postmaster's
> stack space consumption, to the point that it's significantly more than
> anything a normal backend does. That's surprising and scary, because
> it means the postmaster is *more* exposed to stack SIGSEGV than most
> backends. We need to find the cause, IMO.

Hm. I did something based on your test: I built a .so and started the
postmaster with -c shared_preload_libraries to load it. I tried to run
it on every revision I have built for the historic benchmarks. That
only worked as far back as 8.4.0 -- which makes me suspect it's
possibly because of precisely shared_preload_libraries and the dynamic
linker that the stack size grew....

The only thing it actually revealed was a *drop* of 50kB between
REL9_2_0~1610 and REL9_2_0~1396.

REL8_4_0~1702 188K
REL8_4_0~1603 192K
REL8_4_0~1498 188K
REL8_4_0~1358 192K
REL8_4_0~1218 184K
REL8_4_0~1013 188K
REL8_4_0~996 192K
REL8_4_0~856 192K
REL8_4_0~775 192K
REL8_4_0~567 192K
REL8_4_0~480 188K
REL8_4_0~360 188K
REL8_4_0~151 188K
REL9_0_0~1855 188K
REL9_0_0~1654 188K
REL9_0_0~1538 192K
REL9_0_0~1454 184K
REL9_0_0~1351 184K
REL9_0_0~1249 188K
REL9_0_0~1107 184K
REL9_0_0~938 184K
REL9_0_0~627 184K
REL9_0_0~414 184K
REL9_0_0~202 184K
REL9_1_0~1867 188K
REL9_1_0~1695 184K
REL9_1_0~1511 188K
REL9_1_0~1328 192K
REL9_1_0~978 192K
REL9_1_0~948 188K
REL9_1_0~628 188K
REL9_1_0~382 192K
REL9_2_0~1825 184K
REL9_2_0~1610 192K   <--------------- here
REL9_2_0~1396 148K
REL9_2_0~1226 148K
REL9_2_0~1190 148K
REL9_2_0~1072 140K
REL9_2_0~1071 144K
REL9_2_0~984 144K
REL9_2_0~777 144K
REL9_2_0~767 148K
REL9_2_0~551 148K
REL9_2_0~309 144K
REL9_3_0~1509 148K
REL9_3_0~1304 148K
REL9_3_0~1099 144K
REL9_3_0~1030 144K
REL9_3_0~944 140K
REL9_3_0~789 144K
REL9_3_0~735 148K
REL9_3_0~589 144K
REL9_3_0~390 148K
REL9_3_0~223 144K
REL9_4_0~1923 148K
REL9_4_0~1894 148K
REL9_4_0~1755 144K
REL9_4_0~1688 144K
REL9_4_0~1617 144K
REL9_4_0~1431 144K
REL9_4_0~1246 144K
REL9_4_0~1142 148K
REL9_4_0~995 148K
REL9_4_0~744 140K
REL9_4_0~462 148K
REL9_5_0~2370 148K
REL8_4_22 192K
REL9_5_0~2183 148K
REL9_5_0~1996 148K
REL9_5_0~1782 144K
REL9_5_0~1569 148K
REL9_5_0~1557 144K
REL9_5_ALPHA1-20-g7b156c1 144K
REL9_5_ALPHA1-299-g47ebbdc 144K
REL9_5_ALPHA1-489-ge06b2e1 144K
REL9_0_23 188K
REL9_1_19 192K
REL9_2_14 144K
REL9_3_10 148K
REL9_4_5 148K
REL9_5_ALPHA1-683-ge073490 144K
REL9_5_ALPHA1-844-gdfcd9cb 148K
REL9_5_0 148K
REL9_5_ALPHA1-972-g7dc09c1 144K
REL9_5_ALPHA1-1114-g57a6a72 148K

-- 
greg
Greg Stark <stark@mit.edu> writes:
> Ok, I managed to get __attribute__((destructor)) working and captured
> the attached pmap output for all the revisions. You can see the git
> revision in the binary name along with a putative date, though in the
> case of branches the date can be deceptive. It looks to me like REL8_4
> is already bloated by REL8_4_0~2268 (which means 2268 commits *before*
> the REL8_4_0 tag -- i.e. soon after it branched).
I traced through this by dint of inserting a lot of system("pmap") calls,
and what I found out is that it's the act of setting one of the timezone
variables that does it. This is because tzload() allocates a local
variable "union local_storage ls", which sounds harmless enough, but
in point of fact the darn thing is 78K! And to add insult to injury,
with my setting (US/Eastern) there is a recursive call to parse the
"posixrules" timezone file. So that's 150K worth of stack right
there, although possibly it's only half that for some zone settings.
(And if you use "GMT" you escape all of this, since that's hard coded.)
So now I understand why the IANA code has provisions for malloc'ing
that storage rather than just using the stack. We should do likewise.
regards, tom lane
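The shape of the fix described here is roughly the following sketch -- invented names, not the actual IANA tzcode or the commit that later landed: take the huge local union off the stack and heap-allocate it for the duration of the call.

#include <stdlib.h>
#include <string.h>

union big_scratch
{
	char		rawbuf[78 * 1024];	/* roughly the 78K local in tzload() */
	long		align;
};

static int
parse_zone_file(const char *name)
{
	/* BAD: "union big_scratch ls;" here would eat ~78kB of stack */
	union big_scratch *ls = malloc(sizeof(union big_scratch));
	int			result;

	if (ls == NULL)
		return -1;

	/* ... the real code would read and parse the zone file into *ls ... */
	memset(ls->rawbuf, 0, sizeof(ls->rawbuf));
	result = (name != NULL) ? 0 : -1;

	free(ls);
	return result;
}

int
main(void)
{
	return parse_zone_file("US/Eastern") == 0 ? 0 : 1;
}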
Ok, I managed to get __attribute__((destructor)) working and captured
the attached pmap output for all the revisions. You can see the git
revision in the binary name along with a putative date, though in the
case of branches the date can be deceptive. It looks to me like REL8_4
is already bloated by REL8_4_0~2268 (which means 2268 commits *before*
the REL8_4_0 tag -- i.e. soon after it branched).

I can't really make heads or tails of this. I don't see any commits in
the early days of 8.4 that could change the stack depth in the
postmaster.
Attachments
I found out that pmap can give much more fine-grained results than I was
getting before, if you give it the -x flag and then pay attention to the
"dirty" column rather than the "nominal size" column. That gives a
reliable indication of how much stack space the process ever actually
touched, with resolution apparently 4KB on my machine.
I redid my measurements with commit 62c8421e8 applied, and now get results
like this for one run of the standard regression tests:
$ grep '\[ stack \]' postmaster.log | sort -k 4n | uniq -c
    137 00007fff0f615000      84      36      36 rw---   [ stack ]
     21 00007fff0f615000      84      40      40 rw---   [ stack ]
      4 00007fff0f615000      84      44      44 rw---   [ stack ]
     20 00007fff0f615000      84      48      48 rw---   [ stack ]
      8 00007fff0f615000      84      52      52 rw---   [ stack ]
      2 00007fff0f615000      84      56      56 rw---   [ stack ]
     10 00007fff0f615000      84      60      60 rw---   [ stack ]
      3 00007fff0f615000      84      64      64 rw---   [ stack ]
      3 00007fff0f615000      84      68      68 rw---   [ stack ]
      2 00007fff0f615000      84      72      72 rw---   [ stack ]
      1 00007fff0f612000      96      76      76 rw---   [ stack ]
      2 00007fff0f60e000     112     112     112 rw---   [ stack ]
      1 00007fff0f5e0000     296     296     296 rw---   [ stack ]
      1 00007fff0f427000    2060    2060    2060 rw---   [ stack ]
The rightmost numeric column is the "dirty KB in region" column, and 36KB
is the floor established by the postmaster. (It looks like selecting
timezone is still the largest stack-space hog in that, but it's no longer
enough to make me want to do something about it.) So now we're seeing
some cases that exceed that floor, which is good. regex and errors are
still the outliers, as expected.
Also, I found that on OS X "vmmap -dirty" could produce results comparable
to pmap, so here's the numbers for the same test case on current OS X:
    154   Stack   8192K    36K   2
      5   Stack   8192K    40K   2
     11   Stack   8192K    44K   2
      6   Stack   8192K    48K   2
     11   Stack   8192K    52K   2
      7   Stack   8192K    56K   2
      8   Stack   8192K    60K   2
      2   Stack   8192K    64K   2
      2   Stack   8192K    68K   2
      4   Stack   8192K    72K   2
      1   Stack   8192K    76K   2
      2   Stack   8192K   108K   2
      1   Stack   8192K   384K   2
      1   Stack   8192K  2056K   2
(The "virtual" stack size seems to always be the same as ulimit -s,
ie 8MB by default, on this platform.) This is good confirmation
that the actual stack consumption is pretty stable across different
compilers, though it looks like OS X's version of clang is a bit
more stack-wasteful for the regex recursion.
Based on these numbers, I'd have no fear of reducing STACK_DEPTH_SLOP
to 256KB on x86_64. It would sure be good to check things on some
other architectures, though ...
regards, tom lane
I wrote:
> Based on these numbers, I'd have no fear of reducing STACK_DEPTH_SLOP
> to 256KB on x86_64. It would sure be good to check things on some
> other architectures, though ...

I went to the work of doing the same test on a PPC Mac:

    182   Stack   [ 8192K/   40K]
     25   Stack   [ 8192K/   48K]
      2   Stack   [ 8192K/   56K]
     11   Stack   [ 8192K/   60K]
      5   Stack   [ 8192K/   64K]
      2   Stack   [ 8192K/  108K]
      1   Stack   [ 8192K/  576K]
      1   Stack   [ 8192K/ 2056K]

The last number here is "resident pages", not "dirty pages", because
this older version of OS X doesn't provide the latter. Still, the
numbers seem to track pretty well with the ones I got on x86_64. Which
is a bit odd when you think about it: a 32-bit machine ought to consume
less stack space because pointers are narrower.

Also on my old HPPA dinosaur:

     40 addr 0x7b03a000, length 8, physical pages 8, type STACK
    166 addr 0x7b03a000, length 10, physical pages 9, type STACK
     26 addr 0x7b03a000, length 12, physical pages 11, type STACK
     16 addr 0x7b03a000, length 14, physical pages 13, type STACK
      1 addr 0x7b03a000, length 15, physical pages 13, type STACK
      1 addr 0x7b03a000, length 16, physical pages 15, type STACK
      2 addr 0x7b03a000, length 28, physical pages 27, type STACK
      1 addr 0x7b03a000, length 190, physical pages 190, type STACK
      1 addr 0x7b03a000, length 514, physical pages 514, type STACK

As best I can tell, "length" is the nominal virtual space for the
stack, and "physical pages" is the actually allocated/resident space,
both measured in 4K pages. So that again matches pretty well, although
the stack-efficiency of the recursive regex functions seems to get
worse with each new case I look at.

However ... the thread here

https://www.postgresql.org/message-id/flat/21563.1289064886%40sss.pgh.pa.us

says that depending on your choice of compiler and optimization level,
IA64 can be 4x to 5x worse for stack space than x86_64, even after
spotting it double the memory allocation to handle its two separate
stacks. I don't currently have access to an IA64 machine to check.

Based on what I'm seeing so far, really 100K ought to be more than
plenty of slop for most architectures, but I'm afraid to go there for
IA64. Also, there might be some more places like tzload() that are
putting unreasonably large variables on the stack, but that the
regression tests don't exercise (I've not tested anything
replication-related, for example).

Bottom line: I propose that we keep STACK_DEPTH_SLOP at 512K for IA64
but reduce it to 256K for everything else.

regards, tom lane
On Fri, Jul 8, 2016 at 4:46 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Based on what I'm seeing so far, really 100K ought to be more than plenty
> of slop for most architectures, but I'm afraid to go there for IA64.

Searching for info on ia64 turned up this interesting thread:
https://www.postgresql.org/message-id/21563.1289064886%40sss.pgh.pa.us

From that discussion it seems we should probably run these tests with
-O0 because the stack usage can be substantially higher without
optimizations. And it doesn't sound like ia64 uses much more *normal*
stack, just that there's the additional register stack.

It might not be unreasonable to commit the pmap hack, gather the data
from the build farm, then later add an #ifdef around it (or just make
it #ifdef USE_ASSERTIONS, which I assume most build farm members are
running with anyways).

Alternatively it wouldn't be very hard to use mincore(2) to implement
it natively. I believe mincore is nonstandard but present in Linux and
BSD.

-- 
greg
Greg Stark <stark@mit.edu> writes:
> Searching for info on ia64 turned up this interesting thread:
> https://www.postgresql.org/message-id/21563.1289064886%40sss.pgh.pa.us

Yeah, that's the same one I referenced upthread ;-)

> From that discussion it seems we should probably run these tests with
> -O0 because the stack usage can be substantially higher without
> optimizations. And it doesn't sound like ia64 uses much more *normal*
> stack, just that there's the additional register stack.

> It might not be unreasonable to commit the pmap hack, gather the data
> from the build farm, then later add an #ifdef around it (or just make
> it #ifdef USE_ASSERTIONS, which I assume most build farm members are
> running with anyways).

Hmm. The two IA64 critters in the farm are running HPUX, which means
they likely don't have pmap. But I could clean up the hack I used to
gather stack size data on gaur's host and commit it temporarily. On
non-HPUX platforms we could just try system("pmap -x") and see what
happens; as long as we're ignoring the result it shouldn't cause
anything really bad.

I was going to object that this would probably not tell us anything
about the worst-case IA64 stack usage, but I see that neither of those
critters is selecting any optimization, so actually it would.

So, agreed, let's commit some temporary debug code and see what the
buildfarm can teach us. Will go work on that in a bit.

> Alternatively it wouldn't be very hard to use mincore(2) to implement
> it natively. I believe mincore is nonstandard but present in Linux and
> BSD.

Hm, after reading the man page I don't quite see how that would help?
You'd have to already know the mapped stack address range in order to
call the function without getting ENOMEM.

regards, tom lane
On Fri, Jul 8, 2016 at 3:32 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Hm, after reading the man page I don't quite see how that would help?
> You'd have to already know the mapped stack address range in order to
> call the function without getting ENOMEM.

I had assumed unmapped pages would just return a 0 in the bitmap. I
suppose you could still do it by just probing one page at a time until
you find an unmapped page. In a way that's better, since you can count
stack pages even if they're paged out.

Fwiw, here's the pmap info from burbot (Linux Sparc64):

   136      48      48 rw---   [ stack ]
   136      48      48 rw---   [ stack ]
   136      48      48 rw---   [ stack ]
   136      48      48 rw---   [ stack ]
   136      56      56 rw---   [ stack ]
   136      80      80 rw---   [ stack ]
   136      96      96 rw---   [ stack ]
   136     112     112 rw---   [ stack ]
   136     112     112 rw---   [ stack ]
   576     576     576 rw---   [ stack ]
  2056    2056    2056 rw---   [ stack ]

I'm actually a bit confused how to interpret these numbers. This
appears to be an 8kB-pagesize architecture, so is that 576*8kB, or over
5MB of stack, for the regexp test? But we don't know if there are any
check_stack_depth calls in that call tree?

-- 
greg
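A rough sketch of the mincore(2) probing suggested above might look like this (Linux-specific, simplified, names invented): walk page by page from a local variable's address toward lower addresses until the kernel reports unmapped memory, counting mapped and resident stack pages along the way.

#define _DEFAULT_SOURCE
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/mman.h>

static void
report_stack_pages(void)
{
	uintptr_t	pagesize = (uintptr_t) sysconf(_SC_PAGESIZE);
	char		probe;
	/* round the address of a local down to a page boundary */
	uintptr_t	addr = ((uintptr_t) &probe) & ~(pagesize - 1);
	long		mapped = 0;
	long		resident = 0;
	unsigned char vec;

	/*
	 * Walk toward lower addresses (the direction the stack grows on the
	 * platforms discussed here).  The stack mapping does not shrink when
	 * deep recursion returns, so this counts pages ever touched below the
	 * current frame; probing upward toward the stack top would complete
	 * the picture.
	 */
	for (;;)
	{
		if (mincore((void *) addr, (size_t) pagesize, &vec) != 0)
		{
			if (errno == ENOMEM)
				break;			/* fell off the bottom of the stack mapping */
			perror("mincore");
			return;
		}
		mapped++;
		if (vec & 1)
			resident++;
		addr -= pagesize;
	}
	printf("stack pages at/below this frame: %ld mapped, %ld resident (%ld kB)\n",
		   mapped, resident, resident * (long) pagesize / 1024);
}

int
main(void)
{
	report_stack_pages();
	return 0;
}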
Greg Stark <stark@mit.edu> writes:
> Fwiw here's the pmap info from burbot (Linux Sparc64):
> 136 48 48 rw--- [ stack ]
> 136 48 48 rw--- [ stack ]
> 136 48 48 rw--- [ stack ]
> 136 48 48 rw--- [ stack ]
> 136 56 56 rw--- [ stack ]
> 136 80 80 rw--- [ stack ]
> 136 96 96 rw--- [ stack ]
> 136 112 112 rw--- [ stack ]
> 136 112 112 rw--- [ stack ]
> 576 576 576 rw--- [ stack ]
> 2056 2056 2056 rw--- [ stack ]
> I'm actually a bit confused how to interpret these numbers. This
> appears to be an 8kB pagesize architecture so is that 576*8kB or over
> 5MB of stack for the regexp test?
No, pmap specifies that its outputs are measured in kilobytes. So this
is by and large the same as what I'm seeing on x86_64, again with the
caveat that the recursive regex routines seem to vary all over the place
in terms of stack consumption.
> But we don't know if there are any
> check_stack_depth calls in that call tree?
The regex recursion definitely does have check_stack_depth calls in it
(since commit b63fc2877). But what we're trying to measure here is the
worst-case stack depth regardless of any check_stack_depth calls. That's
a ceiling on what we might need to set STACK_DEPTH_SLOP to --- probably a
very loose ceiling, but I don't want to err on the side of underestimating
it. I wouldn't consider either the regex or errors tests as needing to
bound STACK_DEPTH_SLOP, since we know that most of their consumption is
from recursive code that contains check_stack_depth calls. But it's
useful to look at those depths just as a sanity check that we're getting
valid numbers.
regards, tom lane
I wrote:
> So, agreed, let's commit some temporary debug code and see what the
> buildfarm can teach us. Will go work on that in a bit.
After reviewing the buildfarm results, I'm feeling nervous about this
whole idea again. For the most part, the unaccounted-for daylight between
the maximum stack depth measured by check_stack_depth and the actually
dirtied stack space reported by pmap is under 100K. But there are a
pretty fair number of exceptions. The worst cases I found were on
"dunlin", which approached 200K extra space in a couple of places:
 dunlin | 2016-07-09 22:05:09 | check.log           | 00007ffff2667000  268  208  208 rw---   [ stack ]
 dunlin | 2016-07-09 22:05:09 | check.log           | max measured stack depth 14kB
 dunlin | 2016-07-09 22:05:09 | install-check-C.log | 00007fffee650000  268  200  200 rw---   [ stack ]
 dunlin | 2016-07-09 22:05:09 | install-check-C.log | max measured stack depth 14kB
This appears to be happening in the tsdicts test script. Other machines
also show a significant discrepancy between pmap and check_stack_depth
results for that test, which suggests that maybe the tsearch code is being
overly reliant on large local variables. But I haven't dug through it.
Another area of concern is PLs. For instance, on capybara, a machine
otherwise pretty unexceptional in stack-space appetite, quite a few of the
PL tests ate ~100K of unaccounted-for space:
 capybara | 2016-07-09 21:15:56 | pl-install-check-C.log | 00007ffc61bbe000  132  104  104 rw---   [ stack ]
 capybara | 2016-07-09 21:15:56 | pl-install-check-C.log | 00007ffc61bbe000  132    0    0 rw---   [ stack ]
 capybara | 2016-07-09 21:15:56 | pl-install-check-C.log | max measured stack depth 8kB
 capybara | 2016-07-09 21:15:56 | pl-install-check-C.log | 00007ffc61bbd000  136  136  136 rw---   [ stack ]
 capybara | 2016-07-09 21:15:56 | pl-install-check-C.log | 00007ffc61bbd000  136    0    0 rw---   [ stack ]
 capybara | 2016-07-09 21:15:56 | pl-install-check-C.log | max measured stack depth 0kB
 capybara | 2016-07-09 21:15:56 | pl-install-check-C.log | 00007ffc61bbe000  132  104  104 rw---   [ stack ]
 capybara | 2016-07-09 21:15:56 | pl-install-check-C.log | 00007ffc61bbe000  132    0    0 rw---   [ stack ]
 capybara | 2016-07-09 21:15:56 | pl-install-check-C.log | max measured stack depth 5kB
 capybara | 2016-07-09 21:15:56 | pl-install-check-C.log | 00007ffc61bbe000  132  116  116 rw---   [ stack ]
 capybara | 2016-07-09 21:15:56 | pl-install-check-C.log | 00007ffc61bbe000  132    0    0 rw---   [ stack ]
 capybara | 2016-07-09 21:15:56 | pl-install-check-C.log | max measured stack depth 7kB
Presumably that reflects some oddity of the local version of perl or
python, but I have no idea what.
So while we could possibly get away with reducing STACK_DEPTH_SLOP
to 256K, there is good reason to think that that would be leaving
little or no safety margin.
At this point I'm inclined to think we should leave well enough alone.
At the very least, if we were to try to reduce that number, I'd want
to have some plan for tracking our stack space consumption better than
we have done in the past.
regards, tom lane
PS: for amusement's sake, here are some numbers I extracted concerning
the relative stack-hungriness of different buildfarm members. First,
the number of recursion levels each machine could accomplish before
hitting "stack too deep" in the errors.sql regression test (measured by
counting the number of CONTEXT lines in the relevant error message):
    sysname    |      snapshot       | count
---------------+---------------------+-------
 protosciurus  | 2016-07-10 12:03:06 |   731
 chub          | 2016-07-10 15:10:01 |  1033
 quokka        | 2016-07-10 02:17:31 |  1033
 hornet        | 2016-07-09 23:42:32 |  1156
 clam          | 2016-07-09 22:00:01 |  1265
 anole         | 2016-07-09 22:41:40 |  1413
 spoonbill     | 2016-07-09 23:00:05 |  1535
 sungazer      | 2016-07-09 23:51:33 |  1618
 gaur          | 2016-07-09 04:53:13 |  1634
 kouprey       | 2016-07-10 04:58:00 |  1653
 nudibranch    | 2016-07-10 09:18:10 |  1664
 grouse        | 2016-07-10 08:43:02 |  1708
 sprat         | 2016-07-10 08:43:55 |  1717
 pademelon     | 2016-07-09 06:12:10 |  1814
 mandrill      | 2016-07-10 00:10:02 |  2093
 gharial       | 2016-07-10 01:15:50 |  2248
 francolin     | 2016-07-10 13:00:01 |  2379
 piculet       | 2016-07-10 13:00:01 |  2379
 lorikeet      | 2016-07-10 08:04:19 |  2422
 caecilian     | 2016-07-09 19:31:50 |  2423
 jacana        | 2016-07-09 22:36:38 |  2515
 bowerbird     | 2016-07-10 02:13:47 |  2617
 locust        | 2016-07-09 21:50:26 |  2838
 prairiedog    | 2016-07-09 22:44:58 |  2838
 dromedary     | 2016-07-09 20:48:06 |  2840
 damselfly     | 2016-07-10 10:27:09 |  2880
 curculio      | 2016-07-09 21:30:01 |  2905
 mylodon       | 2016-07-09 20:50:01 |  2974
 tern          | 2016-07-09 23:51:23 |  3015
 burbot        | 2016-07-10 03:30:45 |  3042
 magpie        | 2016-07-09 21:38:02 |  3043
 reindeer      | 2016-07-10 04:00:05 |  3043
 friarbird     | 2016-07-10 04:20:01 |  3187
 nightjar      | 2016-07-09 21:17:52 |  3187
 sittella      | 2016-07-09 21:46:29 |  3188
 crake         | 2016-07-09 22:06:09 |  3267
 guaibasaurus  | 2016-07-10 00:17:01 |  3267
 ibex          | 2016-07-09 20:59:06 |  3267
 mule          | 2016-07-09 23:30:02 |  3267
 spurfowl      | 2016-07-09 21:06:39 |  3267
 anchovy       | 2016-07-09 21:41:04 |  3268
 blesbok       | 2016-07-09 21:17:46 |  3268
 capybara      | 2016-07-09 21:15:56 |  3268
 conchuela     | 2016-07-09 21:00:01 |  3268
 handfish      | 2016-07-09 04:37:57 |  3268
 macaque       | 2016-07-08 21:25:06 |  3268
 minisauripus  | 2016-07-10 03:19:42 |  3268
 rhinoceros    | 2016-07-09 21:45:01 |  3268
 sidewinder    | 2016-07-09 21:45:00 |  3272
 jaguarundi    | 2016-07-10 06:52:05 |  3355
 loach         | 2016-07-09 21:15:00 |  3355
 okapi         | 2016-07-10 06:15:02 |  3425
 fulmar        | 2016-07-09 23:47:57 |  3436
 longfin       | 2016-07-09 21:10:17 |  3444
 brolga        | 2016-07-10 09:40:46 |  3537
 dunlin        | 2016-07-09 22:05:09 |  3616
 coypu         | 2016-07-09 22:20:46 |  3626
 hyrax         | 2016-07-09 19:52:03 |  3635
 treepie       | 2016-07-09 22:41:37 |  3635
 frogmouth     | 2016-07-10 02:00:09 |  3636
 narwhal       | 2016-07-10 10:00:05 |  3966
 rover_firefly | 2016-07-10 15:01:45 |  4084
 lapwing       | 2016-07-09 21:15:01 |  4085
 cockatiel     | 2016-07-10 13:40:47 |  4362
 currawong     | 2016-07-10 05:16:03 |  5136
 mastodon      | 2016-07-10 11:00:01 |  5136
 termite       | 2016-07-09 21:01:30 |  5452
 hamster       | 2016-07-09 16:00:06 |  5685
 dangomushi    | 2016-07-09 18:00:27 |  5692
 gull          | 2016-07-10 04:48:28 |  5692
 mereswine     | 2016-07-10 10:40:57 |  5810
 axolotl       | 2016-07-09 22:12:12 |  5811
 chipmunk      | 2016-07-10 08:18:07 |  5949
 grison        | 2016-07-09 21:00:02 |  5949
(74 rows)
(coypu gets a gold star for this one, since it makes a good showing
despite having max_stack_depth set to 1536kB --- everyone else seems
to be using 2MB.)
Second, the stack space consumed for the regex regression test --- here,
smaller is better:
 currawong     | 2016-07-10 05:16:03 | max measured stack depth 213kB
 mastodon      | 2016-07-10 11:00:01 | max measured stack depth 213kB
 axolotl       | 2016-07-09 22:12:12 | max measured stack depth 240kB
 hamster       | 2016-07-09 16:00:06 | max measured stack depth 240kB
 mereswine     | 2016-07-10 10:40:57 | max measured stack depth 240kB
 brolga        | 2016-07-10 09:40:46 | max measured stack depth 284kB
 narwhal       | 2016-07-10 10:00:05 | max measured stack depth 284kB
 cockatiel     | 2016-07-10 13:40:47 | max measured stack depth 285kB
 francolin     | 2016-07-10 13:00:01 | max measured stack depth 285kB
 hyrax         | 2016-07-09 19:52:03 | max measured stack depth 285kB
 magpie        | 2016-07-09 21:38:02 | max measured stack depth 285kB
 piculet       | 2016-07-10 13:00:01 | max measured stack depth 285kB
 reindeer      | 2016-07-10 04:00:05 | max measured stack depth 285kB
 treepie       | 2016-07-09 22:41:37 | max measured stack depth 285kB
 lapwing       | 2016-07-09 21:15:01 | max measured stack depth 287kB
 rover_firefly | 2016-07-10 15:01:45 | max measured stack depth 287kB
 coypu         | 2016-07-09 22:20:46 | max measured stack depth 288kB
 friarbird     | 2016-07-10 04:20:01 | max measured stack depth 289kB
 nightjar      | 2016-07-09 21:17:52 | max measured stack depth 289kB
 gharial       | 2016-07-10 01:15:50 | max measured stack depths 290kB, 384kB
 bowerbird     | 2016-07-10 02:13:47 | max measured stack depth 378kB
 caecilian     | 2016-07-09 19:31:50 | max measured stack depth 378kB
 frogmouth     | 2016-07-10 02:00:09 | max measured stack depth 378kB
 mylodon       | 2016-07-09 20:50:01 | max measured stack depth 378kB
 jaguarundi    | 2016-07-10 06:52:05 | max measured stack depth 379kB
 loach         | 2016-07-09 21:15:00 | max measured stack depth 379kB
 longfin       | 2016-07-09 21:10:17 | max measured stack depth 379kB
 sidewinder    | 2016-07-09 21:45:00 | max measured stack depth 379kB
 anchovy       | 2016-07-09 21:41:04 | max measured stack depth 381kB
 blesbok       | 2016-07-09 21:17:46 | max measured stack depth 381kB
 capybara      | 2016-07-09 21:15:56 | max measured stack depth 381kB
 conchuela     | 2016-07-09 21:00:01 | max measured stack depth 381kB
 crake         | 2016-07-09 22:06:09 | max measured stack depth 381kB
 curculio      | 2016-07-09 21:30:01 | max measured stack depth 381kB
 guaibasaurus  | 2016-07-10 00:17:01 | max measured stack depth 381kB
 handfish      | 2016-07-09 04:37:57 | max measured stack depth 381kB
 ibex          | 2016-07-09 20:59:06 | max measured stack depth 381kB
 macaque       | 2016-07-08 21:25:06 | max measured stack depth 381kB
 minisauripus  | 2016-07-10 03:19:42 | max measured stack depth 381kB
 mule          | 2016-07-09 23:30:02 | max measured stack depth 381kB
 rhinoceros    | 2016-07-09 21:45:01 | max measured stack depth 381kB
 sittella      | 2016-07-09 21:46:29 | max measured stack depth 381kB
 spurfowl      | 2016-07-09 21:06:39 | max measured stack depth 381kB
 dromedary     | 2016-07-09 20:48:06 | max measured stack depth 382kB
 pademelon     | 2016-07-09 06:12:10 | max measured stack depth 382kB
 fulmar        | 2016-07-09 23:47:57 | max measured stack depth 383kB
 dunlin        | 2016-07-09 22:05:09 | max measured stack depth 388kB
 okapi         | 2016-07-10 06:15:02 | max measured stack depth 389kB
 mandrill      | 2016-07-10 00:10:02 | max measured stack depth 489kB
 tern          | 2016-07-09 23:51:23 | max measured stack depth 491kB
 damselfly     | 2016-07-10 10:27:09 | max measured stack depth 492kB
 burbot        | 2016-07-10 03:30:45 | max measured stack depth 567kB
 locust        | 2016-07-09 21:50:26 | max measured stack depth 571kB
 prairiedog    | 2016-07-09 22:44:58 | max measured stack depth 571kB
 clam          | 2016-07-09 22:00:01 | max measured stack depth 573kB
 jacana        | 2016-07-09 22:36:38 | max measured stack depth 661kB
 lorikeet      | 2016-07-10 08:04:19 | max measured stack depth 662kB
 gaur          | 2016-07-09 04:53:13 | max measured stack depth 756kB
 chub          | 2016-07-10 15:10:01 | max measured stack depth 856kB
 quokka        | 2016-07-10 02:17:31 | max measured stack depth 856kB
 hornet        | 2016-07-09 23:42:32 | max measured stack depth 868kB
 grouse        | 2016-07-10 08:43:02 | max measured stack depth 944kB
 kouprey       | 2016-07-10 04:58:00 | max measured stack depth 944kB
 nudibranch    | 2016-07-10 09:18:10 | max measured stack depth 945kB
 sprat         | 2016-07-10 08:43:55 | max measured stack depth 946kB
 sungazer      | 2016-07-09 23:51:33 | max measured stack depth 963kB
 protosciurus  | 2016-07-10 12:03:06 | max measured stack depth 1432kB
The second list omits a couple of machines whose reports got garbled
by concurrent insertions into the log file.