Обсуждение: Custom pgstat support performance regression for simple queries
Hi, I wanted to run some performance tests for [1] and looked at the profile of a workload with a lot of short queries. And was rather surprised to see pgstat_report_stat() to be the top entry - that certainly didn't use to be the case. For the, obviously rather extreme, workload of a pgbench session just running SELECT 1; I see about a 6% regression from 17. That seems too high, even for such an absurd workload. It's one thing to have a loop over all potential stats kinds when we know there are pending stats, that's not that frequent. But fc415edf8ca8 changed a test consisting out of 3 checks of a variable and one direct function call to a loop over an array with 256 elements, with a number of indirect function calls - all of that happens *before* pgstat_report_stat() decides that there are no stats to report. It seems rather unsurprising that that causes a slowdown. The pre-check is there to: /* Don't expend a clock check if nothing to do */ but you made it way more expensive than a clock check would have been (not counting old vmware installs or such, where clock checks take ages). If I change the loop count to only be the builtin stats kinds, the overhead goes away almost completely. Greetings, Andres Freund [1] https://www.postgresql.org/message-id/aGKSzFlpQWSh%2F%2B2w%40ip-10-97-1-34.eu-west-3.compute.internal
On Tue, Jul 22, 2025 at 10:57:06AM -0400, Andres Freund wrote: > It seems rather unsurprising that that causes a slowdown. > > The pre-check is there to: > /* Don't expend a clock check if nothing to do */ > > but you made it way more expensive than a clock check would have been (not > counting old vmware installs or such, where clock checks take ages). > > If I change the loop count to only be the builtin stats kinds, the overhead > goes away almost completely. Noted. I was wondering originally if the threshold for the builtin and custom kinds should be lowered, as well. Perhaps that's just too optimistic, so we can first lower things. Say PGSTAT_KIND_CUSTOM_MIN to 32 and PGSTAT_KIND_MAX to 48 or so to begin with? I don't see a strict policy here, but these would make the whole cheaper to begin with, with enough margin for any new stats kinds. Then, about the loop used here, I'd be OK to keep the past shortcuts for the builtin fixed-sized stats kinds with the requirement that new builtin stats kinds need to hack into pgstat_report_stat() themselves on efficiency grounds, combined with a second loop for custom kinds, taken only if pgstat_register_kind() has been used by tracking if that's the case in a flag. Do you have a specific preference here? -- Michael
Вложения
Hi, On Wed, Jul 23, 2025 at 09:54:12AM +0900, Michael Paquier wrote: > Then, about the loop used here, I'd be OK to keep the past shortcuts > for the builtin fixed-sized stats kinds with the requirement that new > builtin stats kinds need to hack into pgstat_report_stat() themselves > on efficiency grounds Maybe we could use a flag, say: #define PGSTAT_PENDING_IO (1 << 0) #define PGSTAT_PENDING_WAL (1 << 1) #define PGSTAT_PENDING_SLRU (1 << 2) and check for a pgstat_pending_mask in pgstat_report_stat() instead? They would need to set pgstat_pending_mask accordingly when they flush, have pending stats though. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Jul 23, 2025, at 16:33, Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote: > Maybe we could use a flag, say: > > #define PGSTAT_PENDING_IO (1 << 0) > #define PGSTAT_PENDING_WAL (1 << 1) > #define PGSTAT_PENDING_SLRU (1 << 2) > > and check for a pgstat_pending_mask in pgstat_report_stat() instead? > > They would need to set pgstat_pending_mask accordingly when they flush, have > pending stats though. The point is just to say to the report path to move further if at least one of the fixed stats kinds has something to flush,then let the loop do its work across all the stats kinds if they have a flush callback, so we don’t really need tomix multiple numbers; we could just have a single boolean flag that any fixed-sized stats kinds can set to let the reportingknow that some activity has happened. This would mean that custom fixed-sized kinds would need to interact withthis flag as well, but that makes the fast-exit path dirty cheap with or without custom stats kinds. -- Michael
On Wed, Jul 23, 2025 at 05:09:54PM +0900, Michael Paquier wrote: > On Jul 23, 2025, at 16:33, Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote: > > Maybe we could use a flag, say: > > > > #define PGSTAT_PENDING_IO (1 << 0) > > #define PGSTAT_PENDING_WAL (1 << 1) > > #define PGSTAT_PENDING_SLRU (1 << 2) > > > > and check for a pgstat_pending_mask in pgstat_report_stat() instead? > > > > They would need to set pgstat_pending_mask accordingly when they flush, have > > pending stats though. > > The point is just to say to the report path to move further if at least one of > the fixed stats kinds has something to flush, then let the loop do its work > across all the stats kinds if they have a flush callback, Right. > so we don’t really need to mix multiple numbers; we could just have a single > boolean flag that any fixed-sized stats kinds can set to let the reporting know > that some activity has happened. That works to say "there are pending stats" but not well to say "there are no pending stats". Indeed, with a single boolean flag, then how could a stat say that it has nothing pending anymore (when flushing) without saying "all the stats have nothing pending" (while some may still have pending stats)? Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Hi, On 2025-07-23 10:23:53 +0000, Bertrand Drouvot wrote: > On Wed, Jul 23, 2025 at 05:09:54PM +0900, Michael Paquier wrote: > > so we don’t really need to mix multiple numbers; we could just have a single > > boolean flag that any fixed-sized stats kinds can set to let the reporting know > > that some activity has happened. > > That works to say "there are pending stats" but not well to say "there are no > pending stats". > > Indeed, with a single boolean flag, then how could a stat say that it has nothing > pending anymore (when flushing) without saying "all the stats have nothing > pending" (while some may still have pending stats)? I don't think that's a problem - reset that global flag after checking it at the start of pgstat_report_stat() and set it to true if partial_flush is true at the end of pgstat_report_stat(). Greetings, Andres Freund
Hi, On 2025-07-23 09:54:12 +0900, Michael Paquier wrote: > On Tue, Jul 22, 2025 at 10:57:06AM -0400, Andres Freund wrote: > > It seems rather unsurprising that that causes a slowdown. > > > > The pre-check is there to: > > /* Don't expend a clock check if nothing to do */ > > > > but you made it way more expensive than a clock check would have been (not > > counting old vmware installs or such, where clock checks take ages). > > > > If I change the loop count to only be the builtin stats kinds, the overhead > > goes away almost completely. > > Noted. I was wondering originally if the threshold for the builtin > and custom kinds should be lowered, as well. Perhaps that's just too > optimistic, so we can first lower things. Say PGSTAT_KIND_CUSTOM_MIN > to 32 and PGSTAT_KIND_MAX to 48 or so to begin with? I don't see a > strict policy here, but these would make the whole cheaper to begin > with, with enough margin for any new stats kinds. Yes, 256 seems pointlessly large. > Then, about the loop used here, I'd be OK to keep the past shortcuts > for the builtin fixed-sized stats kinds with the requirement that new > builtin stats kinds need to hack into pgstat_report_stat() themselves > on efficiency grounds, combined with a second loop for custom kinds, > taken only if pgstat_register_kind() has been used by tracking if > that's the case in a flag. Do you have a specific preference here? I think the downstream approach of having a flag that says if there are *any* pending stats is better. Greetings, Andres Freund
On Wed, Jul 23, 2025 at 11:59:11AM -0400, Andres Freund wrote: > On 2025-07-23 10:23:53 +0000, Bertrand Drouvot wrote: >> Indeed, with a single boolean flag, then how could a stat say that it has nothing >> pending anymore (when flushing) without saying "all the stats have nothing >> pending" (while some may still have pending stats)? > > I don't think that's a problem - reset that global flag after checking it at > the start of pgstat_report_stat() and set it to true if partial_flush is true > at the end of pgstat_report_stat(). That's one way of doing it. While looking at the code, I tend to think that it is actually simpler to reset it once at the bottom of pgstat_report_fixed, not touching it at all if we are in a partial_flush state because we know that we will need to do one report later on anyway, be it just for the stats in the dshash. If we do that, we can also skip entirely the flush_static_cb calls if pgstat_report_fixed is not set while doing the reports. -- Michael
Вложения
On Wed, Jul 23, 2025 at 12:00:55PM -0400, Andres Freund wrote: > On 2025-07-23 09:54:12 +0900, Michael Paquier wrote: >> Noted. I was wondering originally if the threshold for the builtin >> and custom kinds should be lowered, as well. Perhaps that's just too >> optimistic, so we can first lower things. Say PGSTAT_KIND_CUSTOM_MIN >> to 32 and PGSTAT_KIND_MAX to 48 or so to begin with? I don't see a >> strict policy here, but these would make the whole cheaper to begin >> with, with enough margin for any new stats kinds. > > Yes, 256 seems pointlessly large. I still cannot put my finger on what a good number should be, but I've settled to something that I think should be good enough for an initial release: PGSTAT_KIND_CUSTOM_MIN 128 -> 24 PGSTAT_KIND_MAX 256 -> 32 These numbers mean that we have enough room for 7 more builtins kinds, same for custom kinds. Even if this is an issue, this could always be moved up at some point even across major releases, even if that would mean for extension developers to switch to a different ID. >> Then, about the loop used here, I'd be OK to keep the past shortcuts >> for the builtin fixed-sized stats kinds with the requirement that new >> builtin stats kinds need to hack into pgstat_report_stat() themselves >> on efficiency grounds, combined with a second loop for custom kinds, >> taken only if pgstat_register_kind() has been used by tracking if >> that's the case in a flag. Do you have a specific preference here? > > I think the downstream approach of having a flag that says if there are *any* > pending stats is better. Works for me. I am finishing with the attached, meaning that workloads that trigger no stats can take the fast-exit path with a single boolean check. What do you think? -- Michael
Вложения
Hi, On Thu, Jul 24, 2025 at 09:34:45AM +0900, Michael Paquier wrote: > On Wed, Jul 23, 2025 at 12:00:55PM -0400, Andres Freund wrote: > > On 2025-07-23 09:54:12 +0900, Michael Paquier wrote: > >> Noted. I was wondering originally if the threshold for the builtin > >> and custom kinds should be lowered, as well. Perhaps that's just too > >> optimistic, so we can first lower things. Say PGSTAT_KIND_CUSTOM_MIN > >> to 32 and PGSTAT_KIND_MAX to 48 or so to begin with? I don't see a > >> strict policy here, but these would make the whole cheaper to begin > >> with, with enough margin for any new stats kinds. > > > > Yes, 256 seems pointlessly large. > > I still cannot put my finger on what a good number should be, but I've > settled to something that I think should be good enough for an initial > release: > PGSTAT_KIND_CUSTOM_MIN 128 -> 24 > PGSTAT_KIND_MAX 256 -> 32 > > These numbers mean that we have enough room for 7 more builtins kinds, 11 for builtins kinds? (from 13 to 23) > same for custom kinds. 9 for custom kinds including experimental? (24 -> 32) I think that gives some room (like we'll almost double the builtins kinds if we were to use them all). > Even if this is an issue, this could always be > moved up at some point even across major releases, even if that would > mean for extension developers to switch to a different ID. Yeah... * development and have not reserved their own unique kind ID yet. See: * https://wiki.postgresql.org/wiki/CustomCumulativeStats */ -#define PGSTAT_KIND_EXPERIMENTAL 128 +#define PGSTAT_KIND_EXPERIMENTAL 24 Note that we'll have to update the wiki too. > >> Then, about the loop used here, I'd be OK to keep the past shortcuts > >> for the builtin fixed-sized stats kinds with the requirement that new > >> builtin stats kinds need to hack into pgstat_report_stat() themselves > >> on efficiency grounds, combined with a second loop for custom kinds, > >> taken only if pgstat_register_kind() has been used by tracking if > >> that's the case in a flag. Do you have a specific preference here? > > > > I think the downstream approach of having a flag that says if there are *any* > > pending stats is better. > > Works for me. I am finishing with the attached, meaning that > workloads that trigger no stats can take the fast-exit path with a > single boolean check. What do you think? That would work but I'm a bit worried about: - it's easy to miss and error prone to update pgstat_report_fixed (for new code path, new builtin fixed-sized stats kinds) - it's easy to miss for extensions that they have to update this global variable. Indeed, the cb have_static_pending_cb has been removed (there is the new comment in PgStat_KindInfo though, but I've the feeling it's easier to miss as compared to the cb) What about? - create a global variable but that should be used only by extensions? Say, pgstat_report_custom_fixed - for builtin fixed-sized, just rely on have_iostats, have_slrustats and friends Then in pgstat_report_stat(): - do the check on have_iostats, have_slrustats and friends and on pgstat_report_custom_fixed - reset pgstat_report_custom_fixed That way I think it's less error prone for the builtin stats and more clear for the extensions (as at least "custom" is also part of the global variable name and the check in pgstat_report_stat() would make it clear that we rely on the custom global variable). Thoughts? Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Thu, Jul 24, 2025 at 07:38:46AM +0000, Bertrand Drouvot wrote: > On Thu, Jul 24, 2025 at 09:34:45AM +0900, Michael Paquier wrote: >> These numbers mean that we have enough room for 7 more builtins kinds, > > 11 for builtins kinds? (from 13 to 23) > > 9 for custom kinds including experimental? (24 -> 32) > > I think that gives some room (like we'll almost double the builtins kinds if we > were to use them all). Yes. Problems with a lack of fingers and caffeine, perhaps. > * development and have not reserved their own unique kind ID yet. See: > * https://wiki.postgresql.org/wiki/CustomCumulativeStats > */ > -#define PGSTAT_KIND_EXPERIMENTAL 128 > +#define PGSTAT_KIND_EXPERIMENTAL 24 > > Note that we'll have to update the wiki too. Yep, that was on my TODO list. > What about? > > - create a global variable but that should be used only by extensions? Say, > pgstat_report_custom_fixed > > - for builtin fixed-sized, just rely on have_iostats, have_slrustats and friends > > Then in pgstat_report_stat(): > > - do the check on have_iostats, have_slrustats and friends and on > pgstat_report_custom_fixed > > - reset pgstat_report_custom_fixed > > That way I think it's less error prone for the builtin stats and more clear > for the extensions (as at least "custom" is also part of the global variable > name and the check in pgstat_report_stat() would make it clear that we rely on > the custom global variable). > > Thoughts? FWIW, I see more benefits in the simplicity of a single flag to govern both builtin and custom kinds, but I can see that this may come down to personal taste. A single flag is simpler here, and cheap. What have_static_pending_cb was originally aiming for is the removal of any knowledge specific to stats kinds in the central pgstats APIs, which is why the flags specific to IO, SLRU and WAL have been moved out into their own files. Letting custom stats manipulate pgstat_report_fixed or invent a new pgstat_report_custom_fixed would be logically the same thing, but this comes down to the fact that we still want to decide if a report should be triggered if any of the fixed-numbered stats want to let pgstat_report_stat() do one round of pending stats flush. Having a second flag would keep this abstraction level intact. Now that leads to overcomplicating the API for a small gain in debuggability to know which part of the subsystem wants the report to happen at early stages of pgstat_report_stat(). If we want to know exactly which stats kind wants the flush to happen, it may be better to reuse your idea of one single uint32 or uint64 with one bit allocated for each stats kind to have this information available in pgstat_report_fixed(). However, we already know that with the flush_static_cb callback, as dedicated paths can be taken for each stats kinds. Once again, this would require stats kind to set their bit to force a round of reports, gaining more information before we try to call each flush_static_cb. I am going to apply the patch to lower the bounds in a bit, to begin with, then adjust the wiki. -- Michael
Вложения
Hi, On Fri, Jul 25, 2025 at 10:38:59AM +0900, Michael Paquier wrote: > On Thu, Jul 24, 2025 at 07:38:46AM +0000, Bertrand Drouvot wrote: > > What about? > > > > - create a global variable but that should be used only by extensions? Say, > > pgstat_report_custom_fixed > > > > - for builtin fixed-sized, just rely on have_iostats, have_slrustats and friends > > > > Then in pgstat_report_stat(): > > > > - do the check on have_iostats, have_slrustats and friends and on > > pgstat_report_custom_fixed > > > > - reset pgstat_report_custom_fixed > > > > That way I think it's less error prone for the builtin stats and more clear > > for the extensions (as at least "custom" is also part of the global variable > > name and the check in pgstat_report_stat() would make it clear that we rely on > > the custom global variable). > > > > Thoughts? > > FWIW, I see more benefits in the simplicity of a single flag to govern > both builtin and custom kinds, but I can see that this may come down > to personal taste. A single flag is simpler here, and cheap. > > What have_static_pending_cb was originally aiming for is the removal > of any knowledge specific to stats kinds in the central pgstats APIs, > which is why the flags specific to IO, SLRU and WAL have been moved > out into their own files. Yes, with my idea we'd need to move them back. > Letting custom stats manipulate pgstat_report_fixed or invent a new > pgstat_report_custom_fixed would be logically the same thing, but > this comes down to the fact that we still want to decide if a report > should be triggered if any of the fixed-numbered stats want to let > pgstat_report_stat() do one round of pending stats flush. Yes. > Having a second flag would keep this abstraction level intact. Not a second, just one global "pgstat_report_custom_fixed" and the specifics flags for IO, SLRU, something like: if (dlist_is_empty(&pgStatPending) && !have_iostats && !have_slrustats && !pgstat_report_custom_fixed && !pgstat_have_pending_wal()) > Now > that leads to overcomplicating the API Not sure. > for a small gain in > debuggability to know which part of the subsystem wants the report to > happen at early stages of pgstat_report_stat(). If we want to know > exactly which stats kind wants the flush to happen, it may be better > to reuse your idea of one single uint32 or uint64 with one bit > allocated for each stats kind to have this information available in > pgstat_report_fixed(). However, we already know that with the > flush_static_cb callback, as dedicated paths can be taken for each > stats kinds. Once again, this would require stats kind to set their > bit to force a round of reports, gaining more information before we > try to call each flush_static_cb. Yeah. I'm fine with one single flag as you proposed, I just have the feeling it's more error prone. As an example: @@ -1091,6 +1092,9 @@ XLogInsertRecord(XLogRecData *rdata, pgWalUsage.wal_bytes += rechdr->xl_tot_len; pgWalUsage.wal_records++; pgWalUsage.wal_fpi += num_fpi; + + /* Required for the flush of pending stats WAL data */ + pgstat_report_fixed = true; } return EndPos; @@ -2108,6 +2112,12 @@ AdvanceXLInsertBuffer(XLogRecPtr upto, TimeLineID tli, bool opportunistic) LWLockRelease(WALWriteLock); pgWalUsage.wal_buffers_full++; TRACE_POSTGRESQL_WAL_BUFFER_WRITE_DIRTY_DONE(); + + /* + * Required for the flush of pending stats WAL data, per + * update of pgWalUsage. + */ + pgstat_report_fixed = true; } This was not needed before fc415edf8ca8, and I think it was better to use pgstat_have_pending_wal() to not have to set a flag to every places pgWalUsage.XX changes. Having said that, I think the single global flag patch is pretty straightforward and LGTM. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Fri, Jul 25, 2025 at 08:57:29AM +0000, Bertrand Drouvot wrote: > This was not needed before fc415edf8ca8, and I think it was better to use > pgstat_have_pending_wal() to not have to set a flag to every places pgWalUsage.XX > changes. At the end, this cost makes the separation of the code layers a bit better, by having pgstat.c not know about the specifics around the fixed-numbered stats, so.. > Having said that, I think the single global flag patch is pretty straightforward > and LGTM. I have used that and applied it down to v18, closing the open item. -- Michael
Вложения
On 2025-07-28 08:18:01 +0900, Michael Paquier wrote: > I have used that and applied it down to v18, closing the open item. Thanks!