Обсуждение: Add support for entry counting in pgstats

Поиск
Список
Период
Сортировка

Add support for entry counting in pgstats

От
Michael Paquier
Дата:
Hi all,

One obstacle to the move of pg_stat_statements to the shmem pgstats
that we have in core is that there is no easy way to track the total
number of entries that are stored in the shared hash table of pgstats
for variable-sized stats kinds.

PGSS currently hash_get_num_entries() as a cheap way to get the number
of entries stored in its hash table, but moving PGSS to the in-core
stats facility means that we need a different approach to count these
entries.

I have looked at what can be done, and finished with a rather simple
approach, as we have only one code path when a new entry is inserted,
and one code path when an entry is removed when the refcount of an
entry reaches 0 (includes resets, drops for kind matches, etc.).  The
counters are stored in a uint64 atomic, one for each stats kind in
PgStat_ShmemControl, set only if the option is enabled.

I am wondering how much protection we want for the reads, and chose
the lightest "lossy" approach, leaving to stats kind what they want to
do when deallocating entries.  Hence, a stats kind may want to protect
the entry deallocations with an extra lock, but that's not required
either depending on how aggressive removals should happen.

Please find attached that adds an option for variable-sized stats
kinds given these the possibility to track the number of entries in
the shared hash table.  The option is named track_counts.  The patch
has some coverage added in the test module injection_points, in its
TAP test.

Thoughts are welcome.
--
Michael

Вложения

Re: Add support for entry counting in pgstats

От
Chao Li
Дата:


On Sep 12, 2025, at 15:23, Michael Paquier <michael@paquier.xyz> wrote:


--
Michael
<0001-Add-support-for-entry-counting-in-pgstats.patch><0002-injection_points-Add-entry-counting.patch>

The code overall looks good to me, very clear and neat. Just a few small comments:

1 - 0001
```
+ * set.  This counter is incremented each time a new entry is created (not
+ * when reused), and each time an entry is dropped.
+ */
+ pg_atomic_uint64 entry_counts[PGSTAT_KIND_MAX];
```

“and each time an entry is dropped” is a little misleading, it should be “and decremented when an entry is removed”.

2 - 0001
```
+ /* Increment entry count, if required. */
+ if (kind_info->track_counts)
+ pg_atomic_fetch_add_u64(&pgStatLocal.shmem->entry_counts[kind - 1], 1);
```

The code is quite straightforward, so the comment seems unnecessary.

3 - 0001
```
+ if (kind_info->track_counts)
+ {
+ ereport(ERROR,
+ (errmsg("custom cumulative statistics property is invalid"),
+ errhint("Custom cumulative statistics cannot use counter tracking for fixed-numbered objects.")));
+ }
```

{} are not needed. Looks like in the existing code, when “if” has a single statement, {} are not used. There is a similar “if” right above this change.

4 - 0004
```
diff --git a/src/test/modules/injection_points/injection_stats.c b/src/test/modules/injection_points/injection_stats.c
index e3947b23ba57..15ad9883dedb 100644
--- a/src/test/modules/injection_points/injection_stats.c
+++ b/src/test/modules/injection_points/injection_stats.c
@@ -49,6 +49,7 @@ static const PgStat_KindInfo injection_stats = {
  .shared_data_len = sizeof(((PgStatShared_InjectionPoint *) 0)->stats),
  .pending_size = sizeof(PgStat_StatInjEntry),
  .flush_pending_cb = injection_stats_flush_cb,
+ .track_counts = true,
 };
```

Would it be better to move “.track_counts” up to be with the other boolean flags? Whey define together to share the same byte, feels better to initialize them together as well.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/




Re: Add support for entry counting in pgstats

От
Sami Imseih
Дата:
Thanks for this patch!

> I have looked at what can be done, and finished with a rather simple
> approach, as we have only one code path when a new entry is inserted,
> and one code path when an entry is removed when the refcount of an
> entry reaches 0 (includes resets, drops for kind matches, etc.).  The
> counters are stored in a uint64 atomic, one for each stats kind in
> PgStat_ShmemControl, set only if the option is enabled.

The refcount reaches 0 when all backends release their references to the
stat, so if something like pg_stat_statements relies on a count for
deallocation purposes (to stay within the max), and that value only
decrements when all references are released, then it could end up
constantly triggering deallocation attempts. So, I am wondering if we
actually need another value that tracks "live" entries, or those that
have not yet been marked for drop. This means the live entries count
is decremented as soon as the entry is set to ->dropped.

> Hence, a stats kind may want to protect
> the entry deallocations with an extra lock, but that's not required
> either depending on how aggressive removals should happen.

+1.

As a side note, maybe I am missing something here:
In [0], should this not say ".... the entry should not be freed" instead of
".... the entry should not be dropped". The refcount deals with the freeing
of the entry after all refs are released.

[0]
https://github.com/postgres/postgres/blob/f2bae51dfd5b2edc460c86071c577a45a1acbfd7/src/include/utils/pgstat_internal.h#L98-L99

--
Sami



Re: Add support for entry counting in pgstats

От
Michael Paquier
Дата:
On Tue, Sep 23, 2025 at 01:39:00PM -0500, Sami Imseih wrote:
> The refcount reaches 0 when all backends release their references to the
> stat, so if something like pg_stat_statements relies on a count for
> deallocation purposes (to stay within the max), and that value only
> decrements when all references are released, then it could end up
> constantly triggering deallocation attempts. So, I am wondering if we
> actually need another value that tracks "live" entries, or those that
> have not yet been marked for drop. This means the live entries count
> is decremented as soon as the entry is set to ->dropped.

Couldn't that mean a potential bloat in terms of memory in the dshash
if we have a large cycle of objects still held around but marked to be
gone?  That sounds risky to me as it could go out of control.  The
counter tracking exactly the number of dshash entries would provide a
tighter control, and perhaps this would be OK if the actual
deallocations are grouped.  Your suggestion would not be difficult to
implement, as well, requiring an extra tweak when entries are reused,
and that's handled in a unique path.

> As a side note, maybe I am missing something here:
> In [0], should this not say ".... the entry should not be freed" instead of
> ".... the entry should not be dropped". The refcount deals with the freeing
> of the entry after all refs are released.
>
> [0]
https://github.com/postgres/postgres/blob/f2bae51dfd5b2edc460c86071c577a45a1acbfd7/src/include/utils/pgstat_internal.h#L98-L99

The sentence looks correct to me?  The refcount cannot be incremented
if an entry is marked for drop, as far as I recall.
--
Michael

Вложения

Re: Add support for entry counting in pgstats

От
Keisuke Kuroda
Дата:
Testing via the Injection Point has been successfully completed.

> The option is named track_counts.
Regarding the option name track_counts in PgStat_KindInfo.
In my personal opinion, I was just wondering that it shares
the same name as the GUC track_counts(pgstat_track_counts in the source code).
If we want to make it clearer, renaming it to track_entry_counts
could be one option.
(That said, I feel the GUC track_counts option has a mismatch
between its role and its name...)

Best Regards,
-- 
Keisuke Kuroda
NTT DOCOMO SOLUTIONS, Inc.



Re: Add support for entry counting in pgstats

От
Michael Paquier
Дата:
On Wed, Sep 24, 2025 at 11:37:25AM +0900, Keisuke Kuroda wrote:
> Regarding the option name track_counts in PgStat_KindInfo.
> In my personal opinion, I was just wondering that it shares
> the same name as the GUC track_counts(pgstat_track_counts in the source code).
> If we want to make it clearer, renaming it to track_entry_counts
> could be one option.

Yes, that's a good point and I have missed the GUC part.  What you are
suggesting is cleaner overall with the flag added to the pgstats kind
info.

> (That said, I feel the GUC track_counts option has a mismatch
> between its role and its name...)

I doubt that we will rename this GUC at any point in the future.

Updated patch attached with the new name, as you are suggesting.
--
Michael

Вложения

Re: Add support for entry counting in pgstats

От
Sami Imseih
Дата:
> On Tue, Sep 23, 2025 at 01:39:00PM -0500, Sami Imseih wrote:
> > The refcount reaches 0 when all backends release their references to the
> > stat, so if something like pg_stat_statements relies on a count for
> > deallocation purposes (to stay within the max), and that value only
> > decrements when all references are released, then it could end up
> > constantly triggering deallocation attempts. So, I am wondering if we
> > actually need another value that tracks "live" entries, or those that
> > have not yet been marked for drop. This means the live entries count
> > is decremented as soon as the entry is set to ->dropped.
>
> Couldn't that mean a potential bloat in terms of memory in the dshash
> if we have a large cycle of objects still held around but marked to be
> gone?  That sounds risky to me as it could go out of control.

I spent a bit of time testing this with a pg_stat_statements like extension
using a custom stats kind, and while I think there is value for both "live"
( ! ->dropped) counter and an exact dshash counter ( current proposal),
I rather go with the latter at least initially, for the sake of not having 2
atomic counters. Both will allow an extension to trigger some type of a
cleanup strategy, and in general, both should be very close in value.
That's at least my observation.

I do think however that the placement of the decrement is wrong, and that
it should go inside pgstat_free_entry, since pgstat_free_entry can be called
in multiple code paths. In my high churn workload, v2 ended up increasing way
beyond the actual size of the dshash. See the attached for what
I did to fix.

>> Regarding the option name track_counts in PgStat_KindInfo.
>> In my personal opinion, I was just wondering that it shares
>> the same name as the GUC track_counts(pgstat_track_counts in the source code).
>> If we want to make it clearer, renaming it to track_entry_counts
>> could be one option.

> Yes, that's a good point and I have missed the GUC part.  What you are
> suggesting is cleaner overall with the flag added to the pgstats kind
> info.

IMO, "entry_counts" does not sound correct. We are not tracking more
than one count. What about track_num_entries ?

--
Sami

Вложения

Re: Add support for entry counting in pgstats

От
Michael Paquier
Дата:
On Thu, Sep 25, 2025 at 07:47:48PM -0500, Sami Imseih wrote:
> I spent a bit of time testing this with a pg_stat_statements like extension
> using a custom stats kind, and while I think there is value for both "live"
> ( ! ->dropped) counter and an exact dshash counter ( current proposal),
> I rather go with the latter at least initially, for the sake of not having 2
> atomic counters. Both will allow an extension to trigger some type of a
> cleanup strategy, and in general, both should be very close in value.
> That's at least my observation.

Thanks for the feedback.  I'm feeling encouraged by this opinion about
the "hard" counter, so I'd like to move on with it.

> I do think however that the placement of the decrement is wrong, and that
> it should go inside pgstat_free_entry, since pgstat_free_entry can be called
> in multiple code paths. In my high churn workload, v2 ended up increasing way
> beyond the actual size of the dshash. See the attached for what
> I did to fix.

Yes, you are right and the patch is wrong.  I've done the following
with my previous patch with injection_points loaded and its stats
activated, confirming your argument:
1) session with psql running in a tight loop and these queries:
select injection_points_attach('popo', 'notice');
select injection_points_run('popo');
select injection_points_detach('popo');
2) session that fetches the stats entry
select injection_points_stats_numcalls('popo');
\watch 0
3) session that checks the number of entries, should be 0 or 1:
select injection_points_stats_count();
\watch 1

And the numbers reported by the third session increase over time,
which is no good, once my patch is used.  Once we move the
decrementation to pgstat_free_entry() as you propose, the counter is
under control and capped.

> IMO, "entry_counts" does not sound correct. We are not tracking more
> than one count. What about track_num_entries ?

Hmm.  track_entry_counts resonates with the field name entry_counts,
and it's true that using the plural form is confusing.  How about
track_entry_count in PgStat_KindInfo instead?
--
Michael

Вложения

Re: Add support for entry counting in pgstats

От
Sami Imseih
Дата:
Thanks for v3. The only remaining comment I have is:

This comment seems unnecessary, since refcount is not checked inside
pgstat_free_entry, but earlier.

+       /*
+        * Entry has been dropped with refcount at 0, hence decrement the
+        * entry counter.
+        */

I would just say this,

/* Decrement entry count, if required. */

--
Sami



Re: Add support for entry counting in pgstats

От
Michael Paquier
Дата:
On Fri, Sep 26, 2025 at 12:09:45PM -0500, Sami Imseih wrote:
> Thanks for v3. The only remaining comment I have is:

Thanks for the extra lookup.  I have fixed this one, incorporated the
feedback from Chao, and applied the result after more tests with
pgbench to check the state of the counter.

With more concurrency and an instance set up like that in
postgresql.conf:
shared_preload_libraries = 'injection_points'
injection_points.stats = on

And of course that:
psql -c 'create extension injection_points'

Then with the following flow, checking that the count never gets
higher than the number of clients:
1) pgbench -n -T 300 -f create_inj.sql -c 10
$ cat create_inj.sql
\set id random(1,100000)
select injection_points_attach('popo:id', 'notice');
select injection_points_run('popo:id');
select injection_points_detach('popo:id');
2) pgbench -n -T 300 -f select_inj.sql -c 10
$ cat select_inj.sql
\set id random(1,100000)
select injection_points_stats_numcalls('popo:id');
3) psql session:
select injection_points_stats_count();
\watch 1

And I have found, cough, a rather embarrassing bug on the way,
unrelated to the proposal of this thread.  It's in the module
injection_points for its fixed stats, so nothing ground-breaking,
still embarrassing.   Will post a fix on a new thread shortly..
--
Michael

Вложения