Обсуждение: Re: Some ExecSeqScan optimizations
Hi, On 2025-01-22 10:07:51 +0900, Amit Langote wrote: > On Fri, Jan 17, 2025 at 2:05 PM Amit Langote <amitlangote09@gmail.com> wrote: > > Here's v5 with a few commit message tweaks. > > > > Barring objections, I would like to push this early next week. > > Pushed yesterday. Thank you all. While looking at a profile I recently noticed that ExecSeqScanWithQual() had a runtime branch to test whether qual is NULL, which seemed a bit silly. I think we should use pg_assume(), which I just added to avoid a compiler warning, to improve the code generation here. The performance gain unsurprisingly isn't significant (but seems repeatably measureable), but it does cut out a fair bit of unnecessary code. andres@awork3:/srv/dev/build/postgres/m-dev-optimize$ size executor_nodeSeqscan.c.*o text data bss dec hex filename 3330 0 0 3330 d02 executor_nodeSeqscan.c.assume.o 3834 0 0 3834 efa executor_nodeSeqscan.c.o A 13% reduction in actual code size isn't bad for such a small change, imo. I have a separate question as well - do we need to call ResetExprContext() if we neither qual, projection nor epq? I see a small gain by avoiding that. Greetings, Andres Freund
Вложения
Hi Andres, On Thu, Jul 10, 2025 at 8:34 AM Andres Freund <andres@anarazel.de> wrote: > On 2025-01-22 10:07:51 +0900, Amit Langote wrote: > > On Fri, Jan 17, 2025 at 2:05 PM Amit Langote <amitlangote09@gmail.com> wrote: > > > Here's v5 with a few commit message tweaks. > > > > > > Barring objections, I would like to push this early next week. > > > > Pushed yesterday. Thank you all. > > While looking at a profile I recently noticed that ExecSeqScanWithQual() had a > runtime branch to test whether qual is NULL, which seemed a bit silly. I think > we should use pg_assume(), which I just added to avoid a compiler warning, to > improve the code generation here. +1. I think this might be what David was getting at in his first message in this thread. > The performance gain unsurprisingly isn't significant (but seems repeatably > measureable), but it does cut out a fair bit of unnecessary code. > > andres@awork3:/srv/dev/build/postgres/m-dev-optimize$ size executor_nodeSeqscan.c.*o > text data bss dec hex filename > 3330 0 0 3330 d02 executor_nodeSeqscan.c.assume.o > 3834 0 0 3834 efa executor_nodeSeqscan.c.o > > A 13% reduction in actual code size isn't bad for such a small change, imo. Yeah, that seems worthwhile. I had been a bit concerned about code size growth from having four variant functions with at least some duplication, so this is a nice offset. Thanks for the patch. + /* + * Use pg_assume() for != NULL tests to make the compiler realize no + * runtime check for the field is needed in ExecScanExtended(). + */ I propose changing "to make the compiler realize no runtime check" to "so the compiler can optimize away the runtime check", assuming that is what it means. Also, I assume you intentionally avoided repeating the comment in all the variant functions. > I have a separate question as well - do we need to call ResetExprContext() if > we neither qual, projection nor epq? I see a small gain by avoiding that. You're referring to this block, I assume: /* * If we have neither a qual to check nor a projection to do, just skip * all the overhead and return the raw scan tuple. */ if (!qual && !projInfo) { ResetExprContext(econtext); return ExecScanFetch(node, epqstate, accessMtd, recheckMtd); } Yeah, I think it's fine to remove ResetExprContext() here. When I looked at it before, I left it in because I was unsure whether accessMtd() might leak memory into the per-tuple context. But on second thought, that seems unlikely? Would you like me to do it or do you have a patch in your tree already? -- Thanks, Amit Langote
Hi, On 2025-07-10 17:28:50 +0900, Amit Langote wrote: > On Thu, Jul 10, 2025 at 8:34 AM Andres Freund <andres@anarazel.de> wrote: > > On 2025-01-22 10:07:51 +0900, Amit Langote wrote: > > > On Fri, Jan 17, 2025 at 2:05 PM Amit Langote <amitlangote09@gmail.com> wrote: > > > > Here's v5 with a few commit message tweaks. > > > > > > > > Barring objections, I would like to push this early next week. > > > > > > Pushed yesterday. Thank you all. > > > > While looking at a profile I recently noticed that ExecSeqScanWithQual() had a > > runtime branch to test whether qual is NULL, which seemed a bit silly. I think > > we should use pg_assume(), which I just added to avoid a compiler warning, to > > improve the code generation here. > > +1. I think this might be what David was getting at in his first > message in this thread. Indeed. > > The performance gain unsurprisingly isn't significant (but seems repeatably > > measureable), but it does cut out a fair bit of unnecessary code. > > > > andres@awork3:/srv/dev/build/postgres/m-dev-optimize$ size executor_nodeSeqscan.c.*o > > text data bss dec hex filename > > 3330 0 0 3330 d02 executor_nodeSeqscan.c.assume.o > > 3834 0 0 3834 efa executor_nodeSeqscan.c.o > > > > A 13% reduction in actual code size isn't bad for such a small change, imo. > > Yeah, that seems worthwhile. I had been a bit concerned about code > size growth from having four variant functions with at least some > duplication, so this is a nice offset. I'm rather surprised by just how much the size reduces... I built nodeSeqscan.c with -ffunction-sections and looked at the size with size --format=sysv: Before: .text.SeqRecheck 6 0 .rodata.str1.8 135 0 .text.unlikely.SeqNext 53 0 .text.SeqNext 178 0 .text.ExecSeqScanEPQ 20 0 .text.ExecSeqScanWithProject 289 0 .text.unlikely.ExecSeqScanWithQual 53 0 .text.ExecSeqScanWithQual 441 0 .text.unlikely.ExecSeqScanWithQualProject 53 0 .text.ExecSeqScanWithQualProject 811 0 .text.unlikely.ExecSeqScan 53 0 .text.ExecSeqScan 245 0 .text.ExecInitSeqScan 287 0 .text.ExecEndSeqScan 33 0 .text.ExecReScanSeqScan 63 0 .text.ExecSeqScanEstimate 88 0 .text.ExecSeqScanInitializeDSM 114 0 .text.ExecSeqScanReInitializeDSM 34 0 .text.ExecSeqScanInitializeWorker 64 0 After: .text.SeqRecheck 6 0 .rodata.str1.8 135 0 .text.unlikely.SeqNext 53 0 .text.SeqNext 178 0 .text.ExecSeqScanEPQ 20 0 .text.ExecSeqScanWithProject 209 0 .text.unlikely.ExecSeqScanWithQual 53 0 .text.ExecSeqScanWithQual 373 0 .text.unlikely.ExecSeqScanWithQualProject 53 0 .text.ExecSeqScanWithQualProject 474 0 .text.unlikely.ExecSeqScan 53 0 .text.ExecSeqScan 245 0 .text.ExecInitSeqScan 287 0 .text.ExecEndSeqScan 33 0 .text.ExecReScanSeqScan 63 0 .text.ExecSeqScanEstimate 88 0 .text.ExecSeqScanInitializeDSM 114 0 .text.ExecSeqScanReInitializeDSM 34 0 .text.ExecSeqScanInitializeWorker 64 0 I'm rather baffled that the size of ExecSeqScanWithQualProject goes from 811 to 474, just due to those null checks being removed... But I'll take it. > Thanks for the patch. > > + /* > + * Use pg_assume() for != NULL tests to make the compiler realize no > + * runtime check for the field is needed in ExecScanExtended(). > + */ > > I propose changing "to make the compiler realize no runtime check" to > "so the compiler can optimize away the runtime check", assuming that > is what it means. It does. I don't really see a meaningful difference between the comments? > Also, I assume you intentionally avoided repeating the comment in all > the variant functions. Yea, it didn't seem helpful to do so. > > I have a separate question as well - do we need to call ResetExprContext() if > > we neither qual, projection nor epq? I see a small gain by avoiding that. > > You're referring to this block, I assume: > > /* > * If we have neither a qual to check nor a projection to do, just skip > * all the overhead and return the raw scan tuple. > */ > if (!qual && !projInfo) > { > ResetExprContext(econtext); > return ExecScanFetch(node, epqstate, accessMtd, recheckMtd); > } Yep. > Yeah, I think it's fine to remove ResetExprContext() here. When I > looked at it before, I left it in because I was unsure whether > accessMtd() might leak memory into the per-tuple context. It's a good question. I think I unfortunately found a problematic case, ForeignNext(). I wonder if we instead can MemoryContextReset cheaper, by avoiding a function call for the common case that no reset is needed. Right now we can't just check ->isReset in an inline function, because we also delete children. I wonder if we could define isReset so that creating a child context unsets isReset? Greetings, Andres Freund
On Fri, Jul 11, 2025 at 5:55 AM Andres Freund <andres@anarazel.de> wrote: > On 2025-07-10 17:28:50 +0900, Amit Langote wrote: > > On Thu, Jul 10, 2025 at 8:34 AM Andres Freund <andres@anarazel.de> wrote: > > > The performance gain unsurprisingly isn't significant (but seems repeatably > > > measureable), but it does cut out a fair bit of unnecessary code. > > > > > > andres@awork3:/srv/dev/build/postgres/m-dev-optimize$ size executor_nodeSeqscan.c.*o > > > text data bss dec hex filename > > > 3330 0 0 3330 d02 executor_nodeSeqscan.c.assume.o > > > 3834 0 0 3834 efa executor_nodeSeqscan.c.o > > > > > > A 13% reduction in actual code size isn't bad for such a small change, imo. > > > > Yeah, that seems worthwhile. I had been a bit concerned about code > > size growth from having four variant functions with at least some > > duplication, so this is a nice offset. > > I'm rather surprised by just how much the size reduces... > > I built nodeSeqscan.c with -ffunction-sections and looked at the size with > size --format=sysv: > > Before: > .text.SeqRecheck 6 0 > .rodata.str1.8 135 0 > .text.unlikely.SeqNext 53 0 > .text.SeqNext 178 0 > .text.ExecSeqScanEPQ 20 0 > .text.ExecSeqScanWithProject 289 0 > .text.unlikely.ExecSeqScanWithQual 53 0 > .text.ExecSeqScanWithQual 441 0 > .text.unlikely.ExecSeqScanWithQualProject 53 0 > .text.ExecSeqScanWithQualProject 811 0 > .text.unlikely.ExecSeqScan 53 0 > .text.ExecSeqScan 245 0 > .text.ExecInitSeqScan 287 0 > .text.ExecEndSeqScan 33 0 > .text.ExecReScanSeqScan 63 0 > .text.ExecSeqScanEstimate 88 0 > .text.ExecSeqScanInitializeDSM 114 0 > .text.ExecSeqScanReInitializeDSM 34 0 > .text.ExecSeqScanInitializeWorker 64 0 > > After: > .text.SeqRecheck 6 0 > .rodata.str1.8 135 0 > .text.unlikely.SeqNext 53 0 > .text.SeqNext 178 0 > .text.ExecSeqScanEPQ 20 0 > .text.ExecSeqScanWithProject 209 0 > .text.unlikely.ExecSeqScanWithQual 53 0 > .text.ExecSeqScanWithQual 373 0 > .text.unlikely.ExecSeqScanWithQualProject 53 0 > .text.ExecSeqScanWithQualProject 474 0 > .text.unlikely.ExecSeqScan 53 0 > .text.ExecSeqScan 245 0 > .text.ExecInitSeqScan 287 0 > .text.ExecEndSeqScan 33 0 > .text.ExecReScanSeqScan 63 0 > .text.ExecSeqScanEstimate 88 0 > .text.ExecSeqScanInitializeDSM 114 0 > .text.ExecSeqScanReInitializeDSM 34 0 > .text.ExecSeqScanInitializeWorker 64 0 > > > I'm rather baffled that the size of ExecSeqScanWithQualProject goes from 811 > to 474, just due to those null checks being removed... But I'll take it. Wow, indeed. > > Thanks for the patch. > > > > + /* > > + * Use pg_assume() for != NULL tests to make the compiler realize no > > + * runtime check for the field is needed in ExecScanExtended(). > > + */ > > > > I propose changing "to make the compiler realize no runtime check" to > > "so the compiler can optimize away the runtime check", assuming that > > is what it means. > > It does. I don't really see a meaningful difference between the comments? Maybe not. I just had to pause for a moment to be sure that was what it actually meant when I first read it. I'm fine leaving it as is if you prefer. > > > I have a separate question as well - do we need to call ResetExprContext() if > > > we neither qual, projection nor epq? I see a small gain by avoiding that. > > > > You're referring to this block, I assume: > > > > /* > > * If we have neither a qual to check nor a projection to do, just skip > > * all the overhead and return the raw scan tuple. > > */ > > if (!qual && !projInfo) > > { > > ResetExprContext(econtext); > > return ExecScanFetch(node, epqstate, accessMtd, recheckMtd); > > } > > Yep. > > > > Yeah, I think it's fine to remove ResetExprContext() here. When I > > looked at it before, I left it in because I was unsure whether > > accessMtd() might leak memory into the per-tuple context. > > It's a good question. I think I unfortunately found a problematic case, > ForeignNext(). Ah, so we do have a culprit in the tree. > I wonder if we instead can MemoryContextReset cheaper, by avoiding a function > call for the common case that no reset is needed. Right now we can't just > check ->isReset in an inline function, because we also delete children. I > wonder if we could define isReset so that creating a child context unsets > isReset? Were you thinking ResetExprContext() could become something like: #define ResetExprContext(econtext) \ do { \ if (!((econtext)->ecxt_per_tuple_memory)->isReset) \ MemoryContextReset((econtext)->ecxt_per_tuple_memory); \ } while (0) that is, once isReset also accounts for whether any child context exists? -- Thanks, Amit Langote
Hi Amit!
It's a pity I missed this thread when you developed the patch.
I've developed a feature recently and discovered that SeqScan
does not make use of scan keys, and there is a Tom Lane's
comment regarding this:
* Note that unlike IndexScan, SeqScan never use keys in heap_beginscan
* (and this is very bad) - so, here we do not check are keys ok or not.
* (and this is very bad) - so, here we do not check are keys ok or not.
Have you considered passing scan keys like it is done in IndexScan?
Thanks!
Hi, On 2025-07-11 11:22:36 +0900, Amit Langote wrote: > On Fri, Jul 11, 2025 at 5:55 AM Andres Freund <andres@anarazel.de> wrote: > > On 2025-07-10 17:28:50 +0900, Amit Langote wrote: > > > Thanks for the patch. > > > > > > + /* > > > + * Use pg_assume() for != NULL tests to make the compiler realize no > > > + * runtime check for the field is needed in ExecScanExtended(). > > > + */ > > > > > > I propose changing "to make the compiler realize no runtime check" to > > > "so the compiler can optimize away the runtime check", assuming that > > > is what it means. > > > > It does. I don't really see a meaningful difference between the comments? > > Maybe not. I just had to pause for a moment to be sure that was what > it actually meant when I first read it. I'm fine leaving it as is if > you prefer. To me my version makes a bit more sense, by explaining that we tell the compiler information that it otherwise doesn't have, which results in the optimization... > > > > I have a separate question as well - do we need to call ResetExprContext() if > > > > we neither qual, projection nor epq? I see a small gain by avoiding that. > > > > > > You're referring to this block, I assume: > > > > > > /* > > > * If we have neither a qual to check nor a projection to do, just skip > > > * all the overhead and return the raw scan tuple. > > > */ > > > if (!qual && !projInfo) > > > { > > > ResetExprContext(econtext); > > > return ExecScanFetch(node, epqstate, accessMtd, recheckMtd); > > > } > > > > Yep. > > > > > > > Yeah, I think it's fine to remove ResetExprContext() here. When I > > > looked at it before, I left it in because I was unsure whether > > > accessMtd() might leak memory into the per-tuple context. > > > > It's a good question. I think I unfortunately found a problematic case, > > ForeignNext(). > > Ah, so we do have a culprit in the tree. > > > I wonder if we instead can MemoryContextReset cheaper, by avoiding a function > > call for the common case that no reset is needed. Right now we can't just > > check ->isReset in an inline function, because we also delete children. I > > wonder if we could define isReset so that creating a child context unsets > > isReset? > > Were you thinking ResetExprContext() could become something like: > > #define ResetExprContext(econtext) \ > do { \ > if (!((econtext)->ecxt_per_tuple_memory)->isReset) \ > MemoryContextReset((econtext)->ecxt_per_tuple_memory); \ > } while (0) > > that is, once isReset also accounts for whether any child context exists? Nearly - I was thinking we'd do that in MemoryContextReset(), rather than ResetExprContext(). Greetings, Andres Freund
Hi, On 2025-07-11 14:03:42 +0300, Nikita Malakhov wrote: > It's a pity I missed this thread when you developed the patch. > I've developed a feature recently and discovered that SeqScan > does not make use of scan keys, and there is a Tom Lane's > comment regarding this: > * Note that unlike IndexScan, SeqScan never use keys in heap_beginscan > * (and this is very bad) - so, here we do not check are keys ok or not. > > Have you considered passing scan keys like it is done in IndexScan? You can't easily do that without causing issues: 1) ScanKeys are evaluated while holding a buffer lock, we shouldn't do that with arbitrary functions (since they could recurse and acquire other locks in a non-correct order) 2) ScanKeys are rather restrictive in what they can express, but not restrictive enough to make 1) not a problem. That means that you can't just evaluate the whole predicate using ScanKeys. 3) ScanKey evaluation is actually sometimes *more* expensive than expression evaluation, because the columns are deformed one-by-one. Greetings, Andres Freund
Hi!
Andres, thank you for the explanation about the locks.
I've already tried to pass scan keys and saw that it is
quite expensive.
Hi, On Fri, Jul 11, 2025 at 11:34 PM Andres Freund <andres@anarazel.de> wrote: > On 2025-07-11 11:22:36 +0900, Amit Langote wrote: > > On Fri, Jul 11, 2025 at 5:55 AM Andres Freund <andres@anarazel.de> wrote: > > > On 2025-07-10 17:28:50 +0900, Amit Langote wrote: > > > > Thanks for the patch. > > > > > > > > + /* > > > > + * Use pg_assume() for != NULL tests to make the compiler realize no > > > > + * runtime check for the field is needed in ExecScanExtended(). > > > > + */ > > > > > > > > I propose changing "to make the compiler realize no runtime check" to > > > > "so the compiler can optimize away the runtime check", assuming that > > > > is what it means. > > > > > > It does. I don't really see a meaningful difference between the comments? > > > > Maybe not. I just had to pause for a moment to be sure that was what > > it actually meant when I first read it. I'm fine leaving it as is if > > you prefer. > > To me my version makes a bit more sense, by explaining that we tell the > compiler information that it otherwise doesn't have, which results in the > optimization... Ok, that does make sense. > > > > > I have a separate question as well - do we need to call ResetExprContext() if > > > > > we neither qual, projection nor epq? I see a small gain by avoiding that. > > > > > > > > You're referring to this block, I assume: > > > > > > > > /* > > > > * If we have neither a qual to check nor a projection to do, just skip > > > > * all the overhead and return the raw scan tuple. > > > > */ > > > > if (!qual && !projInfo) > > > > { > > > > ResetExprContext(econtext); > > > > return ExecScanFetch(node, epqstate, accessMtd, recheckMtd); > > > > } > > > > > > Yep. > > > > > > > > > > Yeah, I think it's fine to remove ResetExprContext() here. When I > > > > looked at it before, I left it in because I was unsure whether > > > > accessMtd() might leak memory into the per-tuple context. > > > > > > I wonder if we instead can MemoryContextReset cheaper, by avoiding a function > > > call for the common case that no reset is needed. Right now we can't just > > > check ->isReset in an inline function, because we also delete children. I > > > wonder if we could define isReset so that creating a child context unsets > > > isReset? > > > > Were you thinking ResetExprContext() could become something like: > > > > #define ResetExprContext(econtext) \ > > do { \ > > if (!((econtext)->ecxt_per_tuple_memory)->isReset) \ > > MemoryContextReset((econtext)->ecxt_per_tuple_memory); \ > > } while (0) > > > > that is, once isReset also accounts for whether any child context exists? > > Nearly - I was thinking we'd do that in MemoryContextReset(), rather than > ResetExprContext(). Ah, ok -- I was confused about which function you meant ("can't just check ->isReset in an inline function" should have been a clue). I thought you were referring to avoiding the call to MemoryContextReset() itself from ExecScanExtended() by checking isReset. But it sounds like you meant optimizing within MemoryContextReset() -- specifically, skipping MemoryContextDeleteChildren() when isReset is already true, so it becomes: if (context->isReset) return; MemoryContextDeleteChildren(context); MemoryContextResetOnly(context); Just out of curiosity, I tried making that change locally, and meson test (check-world) passed. I assume that's just because nothing notices leaked child contexts -- there's no mechanism asserting that everything under a context gets reset if we skip MemoryContextDeleteChildren(). That’s not to say we don't need MemoryContextCreate() to clear isReset in the parent when adding a child. :-) -- Thanks, Amit Langote
Hi, On 2025-07-11 11:22:36 +0900, Amit Langote wrote: > On Fri, Jul 11, 2025 at 5:55 AM Andres Freund <andres@anarazel.de> wrote: > > On 2025-07-10 17:28:50 +0900, Amit Langote wrote: > > > On Thu, Jul 10, 2025 at 8:34 AM Andres Freund <andres@anarazel.de> wrote: > > > > The performance gain unsurprisingly isn't significant (but seems repeatably > > > > measureable), but it does cut out a fair bit of unnecessary code. > > > > > > > > andres@awork3:/srv/dev/build/postgres/m-dev-optimize$ size executor_nodeSeqscan.c.*o > > > > text data bss dec hex filename > > > > 3330 0 0 3330 d02 executor_nodeSeqscan.c.assume.o > > > > 3834 0 0 3834 efa executor_nodeSeqscan.c.o > > > > > > > > A 13% reduction in actual code size isn't bad for such a small change, imo. > > > > > > Yeah, that seems worthwhile. I had been a bit concerned about code > > > size growth from having four variant functions with at least some > > > duplication, so this is a nice offset. > > > > I'm rather surprised by just how much the size reduces... > > > > I built nodeSeqscan.c with -ffunction-sections and looked at the size with > > size --format=sysv: > > > > Before: > > .text.SeqRecheck 6 0 > > .rodata.str1.8 135 0 > > .text.unlikely.SeqNext 53 0 > > .text.SeqNext 178 0 > > .text.ExecSeqScanEPQ 20 0 > > .text.ExecSeqScanWithProject 289 0 > > .text.unlikely.ExecSeqScanWithQual 53 0 > > .text.ExecSeqScanWithQual 441 0 > > .text.unlikely.ExecSeqScanWithQualProject 53 0 > > .text.ExecSeqScanWithQualProject 811 0 > > .text.unlikely.ExecSeqScan 53 0 > > .text.ExecSeqScan 245 0 > > .text.ExecInitSeqScan 287 0 > > .text.ExecEndSeqScan 33 0 > > .text.ExecReScanSeqScan 63 0 > > .text.ExecSeqScanEstimate 88 0 > > .text.ExecSeqScanInitializeDSM 114 0 > > .text.ExecSeqScanReInitializeDSM 34 0 > > .text.ExecSeqScanInitializeWorker 64 0 > > > > After: > > .text.SeqRecheck 6 0 > > .rodata.str1.8 135 0 > > .text.unlikely.SeqNext 53 0 > > .text.SeqNext 178 0 > > .text.ExecSeqScanEPQ 20 0 > > .text.ExecSeqScanWithProject 209 0 > > .text.unlikely.ExecSeqScanWithQual 53 0 > > .text.ExecSeqScanWithQual 373 0 > > .text.unlikely.ExecSeqScanWithQualProject 53 0 > > .text.ExecSeqScanWithQualProject 474 0 > > .text.unlikely.ExecSeqScan 53 0 > > .text.ExecSeqScan 245 0 > > .text.ExecInitSeqScan 287 0 > > .text.ExecEndSeqScan 33 0 > > .text.ExecReScanSeqScan 63 0 > > .text.ExecSeqScanEstimate 88 0 > > .text.ExecSeqScanInitializeDSM 114 0 > > .text.ExecSeqScanReInitializeDSM 34 0 > > .text.ExecSeqScanInitializeWorker 64 0 > > > > > > I'm rather baffled that the size of ExecSeqScanWithQualProject goes from 811 > > to 474, just due to those null checks being removed... But I'll take it. > > Wow, indeed. Thanks for reviewing. Pushed! Greetings, Andres Freund