Discussion: [Proposal] Adding callback support for custom statistics kinds

[Proposal] Adding callback support for custom statistics kinds

From

Sami Imseih

Date:

October 22, 23:24:11

Hi,

I'd like to propose $SUBJECT to serialize additional per-entry data beyond
the standard statistics entries. Currently, custom statistics kinds can store
their standard entry data in the main "pgstat.stat" file, but there is no
mechanism for extensions to persist extra data stored in the entry. A common
use case is extensions that register a custom kind and, besides standard
counters, need to track variable-length data stored in a dsa_pointer.

This proposal adds optional "to_serialized_extra" and
"from_serialized_extra" callbacks to "PgStat_KindInfo" that allow custom kinds
to write and read extra data in a separate file (pgstat.<kind>.stat). The
callbacks give extensions direct access to the file pointer so they can read
and write data in any format, while the core "pgstat" infrastructure manages
opening, closing, renaming, and cleanup, just as it does with "pgstat.stat".
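
As a rough illustration of what such a callback pair might do with the file pointer it is handed, here is a minimal standalone sketch. It does not use the PostgreSQL headers: `demo_write_extra` and `demo_read_extra` are hypothetical names, and the length-prefixed layout is only one assumed format among many an extension could choose.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical stand-in for a "to_serialized_extra" callback: write one
 * variable-length string as a 4-byte length followed by its bytes.
 */
bool
demo_write_extra(FILE *fp, const char *text)
{
    uint32_t    len = (uint32_t) strlen(text);

    if (fwrite(&len, sizeof(len), 1, fp) != 1)
        return false;
    /* Nothing more to write for an empty string. */
    if (len > 0 && fwrite(text, 1, len, fp) != len)
        return false;
    return true;
}

/*
 * Hypothetical stand-in for a "from_serialized_extra" callback: read one
 * string back.  Returns a malloc'd, NUL-terminated copy, or NULL on a
 * short read or allocation failure.
 */
char *
demo_read_extra(FILE *fp)
{
    uint32_t    len;
    char       *buf;

    if (fread(&len, sizeof(len), 1, fp) != 1)
        return NULL;
    buf = malloc((size_t) len + 1);
    if (buf == NULL)
        return NULL;
    if (len > 0 && fread(buf, 1, len, fp) != len)
    {
        free(buf);
        return NULL;
    }
    buf[len] = '\0';
    return buf;
}
```

In this sketch the caller owns the FILE pointer, which mirrors the idea that core pgstat code opens and closes the file while the extension only decides the format.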

A concrete use case is pg_stat_statements. If it were to use custom
stats kinds to track statement counters, it could also track query text
stored in DSA. The callbacks allow saving the query text referenced by the
dsa_pointer and restoring it after a clean shutdown. Since DSA
(and more specifically DSM) cannot be attached by the postmaster, an
extension cannot use "on_shmem_exit" or "shmem_startup_hook"
to serialize or restore this data. This is why pgstat handles
serialization during checkpointer shutdown and startup, allowing a single
backend to manage it safely.

I considered adding hooks to the existing pgstat code paths
(pgstat_before_server_shutdown, pgstat_discard_stats, and
pgstat_restore_stats), but that felt too unrestricted. Using per-kind
callbacks provides more control.

There are already "to_serialized_name" and "from_serialized_name"
callbacks used to store and read entries by "name" instead of
"PgStat_HashKey", currently used by replication slot stats. Those
remain unchanged, as they serve a separate purpose.

Other design points:

1. Filenames use "pgstat.<kind>.stat" based on the numeric kind ID.
This avoids requiring extensions to provide names and prevents issues
with spaces or special characters.

2. Both callbacks must be registered together. Serializing without
deserializing would leave orphaned files behind, and I cannot think of a
reason to allow this.

3. "write_chunk", "read_chunk", "write_chunk_s", and
"read_chunk_s" are renamed to "pgstat_write_chunk", etc., and
moved to "pgstat_internal.h" so extensions can use them without
re-implementing these functions.

4. These callbacks are valid only for custom, variable-numbered statistics
kinds. Custom fixed kinds may not benefit, but could be considered in the
future.
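
The design points above can be sketched in a few lines of standalone C. This is only an illustration under stated assumptions: `DemoKindInfo` is a hypothetical, trimmed-down stand-in for `PgStat_KindInfo`, the callback signatures are simplified, and the real patch would also prefix the file name with the stats directory.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, trimmed-down stand-in for PgStat_KindInfo. */
typedef struct DemoKindInfo
{
    bool        fixed_amount;   /* true for fixed-numbered kinds */
    void        (*to_serialized_extra) (FILE *fp);      /* assumed signature */
    void        (*from_serialized_extra) (FILE *fp);    /* assumed signature */
} DemoKindInfo;

/* Placeholder callback so a registration can be exercised. */
void
demo_noop_cb(FILE *fp)
{
    (void) fp;
}

/*
 * Points 2 and 4: the callbacks must be set together or not at all, and
 * are accepted only for variable-numbered kinds.
 */
bool
demo_extra_callbacks_valid(const DemoKindInfo *info)
{
    bool        has_to = (info->to_serialized_extra != NULL);
    bool        has_from = (info->from_serialized_extra != NULL);

    if (has_to != has_from)
        return false;
    if (has_to && info->fixed_amount)
        return false;
    return true;
}

/* Point 1: derive the file name from the numeric kind ID only. */
int
demo_extra_filename(char *buf, size_t buflen, int kind)
{
    return snprintf(buf, buflen, "pgstat.%d.stat", kind);
}
```

Rejecting a half-registered pair at registration time is what keeps orphaned files from ever being written in this sketch.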

Attached 0001 is the proposed change, still in POC form. The second patch
contains tests in "injection_points" to demonstrate this proposal, and is not
necessarily intended for commit.

Looking forward to your feedback!


--

Sami Imseih
Amazon Web Services (AWS)

Attachments

Re: [Proposal] Adding callback support for custom statistics kinds

From

Michael Paquier

Date:

October 23, 06:33:02

On Wed, Oct 22, 2025 at 03:24:11PM -0500, Sami Imseih wrote:
> I'd like to propose $SUBJECT to serialize additional per-entry data beyond
> the standard statistics entries. Currently, custom statistics kinds can store
> their standard entry data in the main "pgstat.stat" file, but there is no
> mechanism for extensions to persist extra data stored in the entry. A common
> use case is extensions that register a custom kind and, besides
> standard counters,
> need to track variable-length data stored in a dsa_pointer.

Thanks for sending a proposal in this direction.

> A concrete use case is pg_stat_statements. If it were to use custom
> stats kinds to track statement counters, it could also track query text
> stored in DSA. The callbacks allow saving the query text referenced by the
> dsa_pointer and restoring it after a clean shutdown. Since DSA
> (and more specifically DSM) cannot be attached by the postmaster, an
> extension cannot use "on_shmem_exit" or "shmem_startup_hook"
> to serialize or restore this data. This is why pgstat handles
> serialization during checkpointer shutdown and startup, allowing a single
> backend to manage it safely.

Agreed that it would be better to split the query text in a file of
its own and not bloat the "main" pgstats file with this data.  A real
risk is that many PGSS entries with a bunch of queries would cause the
file to be just full of the PGSS contents.

> I considered adding hooks to the existing pgstat code paths
> (pgstat_before_server_shutdown, pgstat_discard_stats, and
> pgstat_restore_stats), but that felt too unrestricted. Using per-kind
> callbacks provides more control.

Per-kind callbacks to control all that makes sense here.

> There are already "to_serialized_name" and "from_serialized_name"
> callbacks used to store and read entries by "name" instead of
> "PgStat_HashKey", currently used by replication slot stats. Those
> remain unchanged, as they serve a separate purpose.
>
> Other design points:
>
> 1. Filenames use "pgstat.<kind>.stat" based on the numeric kind ID.
> This avoids requiring extensions to provide names and prevents issues
> with spaces or special characters.

Hmm.  Is that really what we want here?  This pretty much says that a
single custom kind would never be able to use multiple files, ever.

> 2. Both callbacks must be registered together. Serializing without
> deserializing would leave orphaned files behind, and I cannot think of a
> reason to allow this.

Hmm.  Okay.

> 3. "write_chunk", "read_chunk", "write_chunk_s", and
> "read_chunk_s" are renamed to "pgstat_write_chunk", etc., and
> moved to "pgstat_internal.h" so extensions can use them without
> re-implementing these functions.

Exposing the write and read chunk APIs and renaming them sounds good
here, designed as they are now with a FILE* defined by the caller.
It's good to share these for consistency across custom and built-in
stats kinds.

> 4. These callbacks are valid only for custom, variable-numbered statistics
> kinds. Custom fixed kinds may not benefit, but could be considered in the
> future.

Pushing custom data for fixed-sized stats may be interesting, though
like you I am not sure what a good use-case would look like.  So
discarding this case for now sounds fine to me.

> Attached 0001 is the proposed change, still in POC form.

Hmm.  I would like to propose something a bit more flexible,
refactoring and reusing some of the existing callbacks, along the
following lines:
- Rather than introducing a second callback able to do more
serialization work, let's expand a bit the responsibility of
to_serialized_name and from_serialized_name to be able to work in a
more extended way, renaming them to "to/from_serialized_entry".  They
are currently limited to returning a NameData, with pgstat.c enforcing
the data written to the pgstats file to be of NAMEDATALEN.  The idea
would be to let the callbacks push some custom data where they want.
- The to_serialized_name path of pgstat_write_statsfile() would then
be changed as follows:
-- Push a PGSTAT_FILE_ENTRY_NAME.
-- Write the key with write_chunk_s.
-- Call the callback to push some custom per-entry data.
-- Finish with the main chunk of data, of size pgstat_get_entry_len().
- The fd or FILE* of the "main" pgstats file should be added as
argument of both routines (not mandatory, but we are likely going to
need that if we want to add more "custom" data in the main pgstats
file before writing or reading a chunk).  For example, for a PGSS text
file, we would likely write two fields to the main data file: an
offset and a length to be able to retrieve a query string, from a
secondary file.
- FDs where the data is written while we are in the to/from serialize
callbacks can be handled within the code paths specific to the stats kind.
The first time a serialized callback of a stats kind is called, the
extra file(s) is(are) opened.  This may come at the cost of one new
callback: at the end of the reads and writes of the stats data, we
would need an extra loop that's able to perform cleanup actions, which
would be here to make sure that the fds opened for the extra files are
closed when we are done.  The close of each file is equivalent to the
pgstat_close_file() done in the patch, except that we'd loop over a
callback that would do the cleanup job once we are done reading or
writing a file.  One step that can be customized in this new "end"
callback is whether a stats kind decides to unlink() a previous file, as
we do for the main pgstats file, or keeps one or more files around.
That would be up to the extension developer.  We should be able to
reuse or rework reset_all_cb() with a status given to it, depending on
whether we are dealing with a failure or a success path.  Currently,
reset_all_cb() is only used in a failure path; the idea would be to
extend it for the success case.
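
To make the offset-and-length idea more concrete, here is a minimal standalone sketch of how a stats kind could lazily open a secondary text file, append a query string to it, and record where it landed in the main file. All names (`demo_to_serialized_entry`, `demo_from_serialized_entry`, `demo_end_serialization`) and the on-disk layout are assumptions made for illustration; none of this is taken from the patch.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* State kept by the (hypothetical) stats kind across callback calls. */
static FILE *demo_text_file = NULL;

/*
 * Write side: append the query text to the secondary file, opened on
 * first use, and store (offset, length) in the main stats file.
 */
bool
demo_to_serialized_entry(FILE *main_file, const char *text_path,
                         const char *query)
{
    long        offset;
    size_t      len = strlen(query);

    if (demo_text_file == NULL)
    {
        demo_text_file = fopen(text_path, "wb");
        if (demo_text_file == NULL)
            return false;
    }
    offset = ftell(demo_text_file);
    if (offset < 0)
        return false;
    if (len > 0 && fwrite(query, 1, len, demo_text_file) != len)
        return false;
    if (fwrite(&offset, sizeof(offset), 1, main_file) != 1)
        return false;
    return fwrite(&len, sizeof(len), 1, main_file) == 1;
}

/*
 * Read side: fetch (offset, length) from the main file, then pull the
 * text out of the secondary file into buf.
 */
bool
demo_from_serialized_entry(FILE *main_file, FILE *text_file,
                           char *buf, size_t buflen)
{
    long        offset;
    size_t      len;

    if (fread(&offset, sizeof(offset), 1, main_file) != 1)
        return false;
    if (fread(&len, sizeof(len), 1, main_file) != 1)
        return false;
    if (len >= buflen || fseek(text_file, offset, SEEK_SET) != 0)
        return false;
    if (len > 0 && fread(buf, 1, len, text_file) != len)
        return false;
    buf[len] = '\0';
    return true;
}

/* "End" step: close whatever the stats kind opened along the way. */
void
demo_end_serialization(void)
{
    if (demo_text_file != NULL)
    {
        fclose(demo_text_file);
        demo_text_file = NULL;
    }
}
```

Keeping the file handle in the stats kind's own state is what makes the separate "end" step necessary: nothing else knows the file was opened.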

> The second patch
> contains  tests in "injection_points" to demonstrate this proposal, and is not
> necessarily intended for commit.

Having coverage for these kinds of APIs is always good, IMO.  We need
coverage for extension code.
--
Michael

Attachments

signature.asc

Re: [Proposal] Adding callback support for custom statistics kinds

From

Sami Imseih

Date:

October 24, 00:35:58

Thanks for the feedback!

> > Other design points:
> >
> > 1. Filenames use "pgstat.<kind>.stat" based on the numeric kind ID.
> > This avoids requiring extensions to provide names and prevents issues
> > with spaces or special characters.
>
> Hmm.  Is that really what we want here?  This pretty says that one
> single custom kind would never be able use multiple files, ever.

Perhaps if someone wants to have separate files for each different
type of data, we should be able to support multiple files. I think we
can add an option for the number of files, and they can then be named
"pgstat.<kind>.1.stat", "pgstat.<kind>.2.stat", etc. I'd rather avoid
having the extension provide a set of file names. So as arguments to the
callback, besides the main file pointer (as you mention below), we also
provide the list of custom file pointers.

What do you think?

> Hmm.  I would like to propose something a bit more flexible,
> refactoring and reusing some of the existing callbacks, among the
> following lines:
> - Rather than introducing a second callback able to do more
> serialization work, let's expand a bit the responsibility of
> to_serialized_name and from_serialized_name to be able to work in a
> more extended way, renaming them to "to/from_serialized_entry", which

Sure, we can go that route.

> - The fd or FILE* of the "main" pgstats file should be added as
> argument of both routines (not mandatory, but we are likely going to
> need that if we want to add more "custom" data in the main pgstats
> file before writing or reading a chunk).  For example, for a PGSS text
> file, we would likely write two fields to the main data file: an
> offset and a length to be able to retrieve a query string, from a
> secondary file.

Yeah, that could be a good idea for pg_stat_statements, if we don't want to
store the key alongside the query text. Makes more sense.

> - FDs where the data is written while we are in the to/from serialize
> can be handled within the code paths specific to the stats kind code.
> The first time a serialized callback of a stats kind is called, the
> extra file(s) is(are) opened.  This may come at the cost of one new
> callback: at the end of the read and writes of the stats data, we
> would need an extra look that's able to perform cleanup actions, which
> would be here to make sure that the fds opened for the extra files are
> closed when we are done.  The close of each file is equivalent to the
> pgstat_close_file() done in the patch, except that we'd loop over a
> callback that would do the cleanup job once we are done reading or
> writing a file.  One step that can be customized in this new "end"
> callback is if a stats kind may decide to unlink() a previous file, as
> we do for the main pgstats file, or keep one or more files around.
> That would be up to the extension developer.  We should be able to
> reuse or rework reset_all_cb() with a status given to it, depending on
> if we are dealing with a failure or a success path.  Currently,
> reset_all_cb() is only used in a failure path, the idea would be to
> extend it for the success case.

I will provide a patch with the recommendations.

--
Sami

Re: [Proposal] Adding callback support for custom statistics kinds

From

Michael Paquier

Date:

October 24, 02:33:17

On Thu, Oct 23, 2025 at 04:35:58PM -0500, Sami Imseih wrote:
> Perhaps if someone wants to have separate files for each different
> types of data,
> we should be able to support multiple files. I think we can add an
> option for the
> number of files and they can then be named "pgstat.<kind>.1.stat",
> pgstat.<kind>.2.stat",
> etc. I rather avoid having the extension provide a set of files names.
> So as arguments to the callback, besides the main file pointer ( as
> you mention below),
> we also provide the list of custom file pointers.
>
> what do you think?

My worry here is the lack of flexibility regarding stats that could be
split depending on the objects whose data needs to be flushed.  For
example, stats split across multiple databases (like our good-old
pre-v14 pgstats, but on a per-kind basis).  So I don't think that we
can really assume that the list of file names should be fixed when we
begin the read/write process of the main pgstats file.
--
Michael

Attachments

signature.asc

Re: [Proposal] Adding callback support for custom statistics kinds

From

Sami Imseih

Date:

October 24, 03:57:38

> On Thu, Oct 23, 2025 at 04:35:58PM -0500, Sami Imseih wrote:
> > Perhaps if someone wants to have separate files for each different
> > types of data,
> > we should be able to support multiple files. I think we can add an
> > option for the
> > number of files and they can then be named "pgstat.<kind>.1.stat",
> > pgstat.<kind>.2.stat",
> > etc. I rather avoid having the extension provide a set of files names.
> > So as arguments to the callback, besides the main file pointer ( as
> > you mention below),
> > we also provide the list of custom file pointers.
> >
> > what do you think?
>
> My worry here is the lack of flexibility regarding stats that could be
> split depending on the objects whose data needs to be flushed.  For
> example, stats split across multiple databases (like our good-old
> pre-v14 pgstats, but on a per-kind basis).  So I don't think that we
> can really assume that the list of file names should be fixed when we
> begin the read/write process of the main pgstats file.

I was trying to avoid an extra field in PgStat_KindInfo if possible, but
it's worthwhile to provide more flexibility to an extension. I will go
with this.

--
Sami Imseih
Amazon Web Services (AWS)

Re: [Proposal] Adding callback support for custom statistics kinds

From

Michael Paquier

Date:

October 24, 08:54:22

On Thu, Oct 23, 2025 at 07:57:38PM -0500, Sami Imseih wrote:
> I was trying to avoid an extra field in PgStat_KindInfo if possible, but
> it's worthwhile to provide more flexibility to an extension. I will go
> with this.

Yes, I don't think that we will be able to avoid some refactoring of
the existing callbacks.  The introduction of a new one may not be
completely necessary, though, especially if we reuse the reset
callback to be called when the stats read and write finish to close
any fds we may have opened when processing.

Maintaining the state of the files opened within each stat kind code
across multiple calls of the new "serialized" callback feels a bit
more natural and more flexible, at least that's my take on the matter.
--
Michael

Attachments

signature.asc

Re: [Proposal] Adding callback support for custom statistics kinds

From

Sami Imseih

Date:

November 10, 22:56:23

> > Hmm.  I would like to propose something a bit more flexible,
> > refactoring and reusing some of the existing callbacks, among the
> > following lines:
> > - Rather than introducing a second callback able to do more
> > serialization work, let's expand a bit the responsibility of
> > to_serialized_name and from_serialized_name to be able to work in a
> > more extended way, renaming them to "to/from_serialized_entry", which
>
> Sure, we can go that route.

I started reworking the patch, but then I realized that I don't like this
approach of using the same callback to support serializing NameData and
serializing extra data. In the existing "to_serialized_name" callback,
NameData is serialized instead of the hash key, meaning that
"from_serialized_name" must be called before we create an entry. The
callback translates the NameData to an objid, as is the case with
replication slots, and the key is then used to create the entry.

However, in the case of serializing extra data, we want to have already
created the entry by the time we call the callback. One example is populating
non-key fields of an entry with a dsa_pointer after reading some serialized
data into DSA.

If we do want to support a single callback, we would need extra metadata in
the Kind registration to let the extension tell us what the callback is used
for and to either trigger the callback before or after entry creation. I am
not very thrilled about doing something like this, as I see 2 very different
use-cases here.
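
The ordering difference between the two use cases can be shown with a small standalone sketch of a restore step. Everything here is hypothetical (`DemoKey`, `DemoEntry`, `DemoKind`, and the two sample callbacks); it only models that the name-to-key translation has to run before the entry exists, while the extra-data callback needs the entry to be there already.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef struct DemoKey
{
    int         kind;
    unsigned long objid;
} DemoKey;

typedef struct DemoEntry
{
    DemoKey     key;
    bool        created;
    const char *extra;          /* non-key field filled after creation */
} DemoEntry;

typedef struct DemoKind
{
    /* Runs BEFORE the entry exists: translates a name into a key. */
    bool        (*from_serialized_name) (const char *name, DemoKey *key);
    /* Runs AFTER the entry exists: fills non-key fields. */
    void        (*from_serialized_extra) (DemoEntry *entry);
} DemoKind;

/* Sample name callback: only "slot_a" is known, and maps to objid 42. */
bool
demo_name_to_key(const char *name, DemoKey *key)
{
    if (strcmp(name, "slot_a") != 0)
        return false;
    key->objid = 42;
    return true;
}

/* Sample extra callback: relies on the entry having been created. */
void
demo_fill_extra(DemoEntry *entry)
{
    entry->extra = entry->created ? "restored text" : NULL;
}

/* Restore one entry, calling each callback at its required point. */
bool
demo_restore_entry(const DemoKind *kind, const char *name,
                   DemoKey hashkey, DemoEntry *entry)
{
    DemoKey     key = hashkey;

    if (kind->from_serialized_name != NULL &&
        !kind->from_serialized_name(name, &key))
        return false;           /* key cannot be resolved, skip entry */

    entry->key = key;           /* "create" the entry from the key */
    entry->created = true;
    entry->extra = NULL;

    if (kind->from_serialized_extra != NULL)
        kind->from_serialized_extra(entry);
    return true;
}
```

With two separate callback slots, no extra metadata is needed to tell the core code when each one should fire.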

What do you think?

--
Sami Imseih
Amazon Web Services (AWS)

Re: [Proposal] Adding callback support for custom statistics kinds

From

Michael Paquier

Date:

November 11, 02:21:26

On Mon, Nov 10, 2025 at 01:56:23PM -0600, Sami Imseih wrote:
> I started reworking the patch, but then I realized that I don't like this
> approach of using the same callback to support serializing NameData and
> serializing extra data. In the existing "to_serialized_name" callback
> , NameData is serialized instead of the hash key, meaning that the
> "from_serialized_name" must be called before we create an entry. The
> callback translates the NameData to an objid as is the case with replication
> slots, and the key is then used to create the entry.

Thanks for looking at that.

> However, in the case of serializing extra data, we want to have already
> created the entry by the time we call the callback. For example populating
> non-key fields of an entry with a dsa_pointer after reading some serialized
> data into dsa.
>
> If we do want to support a single callback, we would need extra metadata in
> the Kind registration to let the extension tell us what the callback is used
> for and to either trigger the callback before or after entry creation. I am
> not very thrilled about doing something like this, as I see 2 very different
> use-cases here.

Ah, I see your point.  By keeping two callbacks, one of them translating a
key to/from a different field (NameData currently, but it could be
something else with a different size), we would for example be able to
keep the checks for duplicated entries very simple when reading the
file.  Agreed that it would be good to keep the key lookups as stable
as we can.

So, what you are suggesting is a second callback, called once we have called
read_chunk() and write_chunk() for a PGSTAT_FILE_ENTRY_HASH or a
PGSTAT_FILE_ENTRY_NAME, that lets a stats kind write the data it wants in
the main file and/or one or more extra files?  I'd be fine with
that, yes, and that should work with the PGSS case in mind.
--
Michael

Attachments

signature.asc

Re: [Proposal] Adding callback support for custom statistics kinds

From

Sami Imseih

Date:

November 19, 23:10:42

Sorry for the delay here.

v1 is the first attempt to address the feedback from the POC.

1/ A user is now able to register as many extra files as they
wish, and the files will be named pgstat.<kind_id>.<file_no>.stat,
where file_no starts at 0 and goes up to the number of files specified
by the user with .num_serialized_extra_files.

2/ The callbacks now provide both the core stat file as a FILE
pointer and an array of FILE pointers for the extra files.
In the write callback, the extra file pointers are accessed
like extra_files[0], extra_files[1], etc., and the same goes for
the read callback.

3/ The patch centralizes the creation and cleanup of the files
with 2 new routines pgstat_allocate_files and pgstat_cleanup_files,
which both operate on a new local struct which tracks the file
names and descriptors in the read and write stats routines.

```
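typedef struct PgStat_SerializeFiles
{
    char      **tmpfiles;       /* temporary file names */
    char      **statfiles;      /* permanent file names */
    FILE      **fd;             /* open file pointers, one per file */
    int         num_files;      /* number of files tracked */
} PgStat_SerializeFiles;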
```

Plug-ins are not made aware of this struct because they don't need
to be. The callbacks are provided the FILE pointers they need to care
about for their kind only.

4/ In terms of testing, patch 0002, I did not want to invent a new module
for custom kinds, so I piggybacked off of injection_points as I did in the
POC, but I added to the existing recovery tests, because essentially that
is what we care about. Does the data persist after a clean shutdown? Do the
.tmp files get removed properly? etc. So I added tests in
recovery/t/029_stats_restart.pl for this.
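
For illustration, the numbered naming scheme from point 1/ could look roughly like the standalone sketch below. The helper names `demo_build_extra_names` and `demo_free_names` are hypothetical, plain `malloc` stands in for the backend's `psprintf`, and the sketch assumes file numbers run from 0 to `num_files - 1`.

```c
#include <stdio.h>
#include <stdlib.h>

/* Free the first n names and the array itself. */
void
demo_free_names(char **names, int n)
{
    if (names == NULL)
        return;
    for (int i = 0; i < n; i++)
        free(names[i]);
    free(names);
}

/*
 * Build "<dir>/pgstat.<kind>.<file_no>.stat" for each extra file of a
 * kind.  Returns NULL if num_files is not positive or on allocation
 * failure.
 */
char **
demo_build_extra_names(const char *dir, int kind, int num_files)
{
    char      **names;

    if (num_files <= 0)
        return NULL;
    names = calloc((size_t) num_files, sizeof(char *));
    if (names == NULL)
        return NULL;

    for (int i = 0; i < num_files; i++)
    {
        /* First pass measures the string, second pass writes it. */
        int         len = snprintf(NULL, 0, "%s/pgstat.%d.%d.stat",
                                   dir, kind, i);

        names[i] = malloc((size_t) len + 1);
        if (names[i] == NULL)
        {
            demo_free_names(names, i);
            return NULL;
        }
        snprintf(names[i], (size_t) len + 1, "%s/pgstat.%d.%d.stat",
                 dir, kind, i);
    }
    return names;
}
```

Because the names depend only on the kind ID and a counter, the set of files is fixed before any entry is written, which is the property questioned later in the thread.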

--
Sami Imseih
Amazon Web Services (AWS)

Attachments

Re: [Proposal] Adding callback support for custom statistics kinds

From

Sami Imseih

Date:

November 20, 05:10:43

It just occurred to me that the documentation [0] should be
updated to describe the callbacks. I will do that in the next
revision.

[0] https://www.postgresql.org/docs/current/xfunc-c.html#XFUNC-ADDIN-CUSTOM-CUMULATIVE-STATISTICS

--
Sami Imseih
Amazon Web Services (AWS)

Re: [Proposal] Adding callback support for custom statistics kinds

From

Michael Paquier

Date:

November 24, 03:10:29

On Wed, Nov 19, 2025 at 08:10:43PM -0600, Sami Imseih wrote:
> It just occurred to me that the documentation [0] should be
> updated to describe the callbacks. I will do that in the next
> revision.
>
> [0] https://www.postgresql.org/docs/current/xfunc-c.html#XFUNC-ADDIN-CUSTOM-CUMULATIVE-STATISTICS

Hmm.  Based on what I can read from the patch, you are still enforcing
file name patterns in the backend, as of:
+          extra->statfiles[i] = psprintf("%s/pgstat.%d.%d.stat",
+                        PGSTAT_STAT_PERMANENT_DIRECTORY, kind, i);

My take (also mentioned upthread) is that this design should go the
other way around, where modules have the possibility to define their
own file names, and some of them could be generated on-the-fly when
writing the files, for example for a per-database file split, or based
on the object ID itself.

The important part for variable-numbered stats is that the keys of the
entries have to be in the main pgstats file.  Then, the extra data is
loaded back based on the data in the entry key, based on a file name
that only a custom stats kind knows about (fd and file name).  It
means that the custom stats kind needs to track the files it has to
clean up by itself in this scheme.  We could pass back to the startup
process some fds that it cleans up, but it feels simpler here to let
the custom code do what they want, instead, rather than having an
array that tracks the file names and/or their fds.
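
A minimal standalone sketch of this scheme might look as follows: the stats kind derives a file name on the fly from part of the entry key (here a database OID) and keeps its own list of names so that it can clean up later. The naming pattern, the fixed-size table, and the function names are all assumptions made for the example.

```c
#include <stdio.h>
#include <string.h>

#define DEMO_MAX_FILES 16
#define DEMO_PATH_LEN  64

/* File names this (hypothetical) stats kind has handed out so far. */
static char demo_tracked[DEMO_MAX_FILES][DEMO_PATH_LEN];
static int  demo_ntracked = 0;

/*
 * Return the per-database file name for dboid, remembering it for later
 * cleanup.  Returns NULL if the tracking table is full.
 */
const char *
demo_path_for_database(unsigned int dboid)
{
    char        path[DEMO_PATH_LEN];

    snprintf(path, sizeof(path), "demo_kind.%u.stat", dboid);

    /* Reuse the slot if this database was already seen. */
    for (int i = 0; i < demo_ntracked; i++)
    {
        if (strcmp(demo_tracked[i], path) == 0)
            return demo_tracked[i];
    }
    if (demo_ntracked >= DEMO_MAX_FILES)
        return NULL;
    strcpy(demo_tracked[demo_ntracked], path);
    return demo_tracked[demo_ntracked++];
}

/* Remove every tracked file; returns how many names were tracked. */
int
demo_cleanup_tracked(void)
{
    int         n = demo_ntracked;

    for (int i = 0; i < n; i++)
        remove(demo_tracked[i]);
    demo_ntracked = 0;
    return n;
}
```

The core code never sees these names in this sketch, so the cleanup responsibility sits entirely with the stats kind.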
--
Michael

Attachments

signature.asc

Re: [Proposal] Adding callback support for custom statistics kinds

From

Sami Imseih

Date:

November 24, 07:43:17

> > It just occurred to me that the documentation [0] should be
> > updated to describe the callbacks. I will do that in the next
> > revision.
> >
> > [0] https://www.postgresql.org/docs/current/xfunc-c.html#XFUNC-ADDIN-CUSTOM-CUMULATIVE-STATISTICS
>
> Hmm.  Based on what I can read from the patch, you are still enforcing
> file name patterns in the backend, as of:
> +          extra->statfiles[i] = psprintf("%s/pgstat.%d.%d.stat",
> +                        PGSTAT_STAT_PERMANENT_DIRECTORY, kind, i);
>
> My take (also mentioned upthread) is that this design should go the
> other way around, where modules have the possibility to define their
> own file names, and some of them could be generated on-the-fly when
> writing the files, for example for a per-file database split, or the
> object ID itself.

The way I thought about it is that an extension developer can just provide the
number of files they need, and they are then given a list of
file pointers. They can then manage what each file is
used for. They also don't need to worry about naming the files; all they
need to do is track what each file in the list does.

> The important part for variable-numbered stats is that the keys of the
> entries have to be in the main pgstats file.  Then, the extra data is
> loaded back based on the data in the entry key, based on a file name
> that only a custom stats kind knows about (fd and file name).  It
> means that the custom stats kind needs to track the files it has to
> clean up by itself in this scheme.  We could pass back to the startup
> process some fds that it cleans up, but it feels simpler here to let
> the custom code do what they want, instead, rather than having an
> array that tracks the file names and/or their fds.

Yeah, I was leaning towards putting more responsibility on pgstat to
manage these extra files, but you are suggesting that we just let the
extension manage the creation and cleanup of these files as well.

After re-reading your earlier suggestions, this sounds like a third
callback that is used for file cleanup, and this callback could be
the existing reset_all_cb. Also, in addition to reset_all_cb being called
during pgstat_reset_after_failure, it can be called during the success
case, i.e., a new pgstat_reset_after_success. reset_all_cb would also
carry a status argument so the extension knows what to do
in the case of success or failure.

This also means we need to update all existing callbacks to
do their work only in the failed status.

Is that correct?

--
Sami

Re: [Proposal] Adding callback support for custom statistics kinds

From

Sami Imseih

Date:

November 25, 03:18:26

> After re-reading your earlier suggestions, this sounds like a third
> callback that is used for file cleanup, and this callback could be
> the existing reset_all_cb. Also, instead of reset_all_cb being called
> during pgstat_reset_after_failure, it can be called during the success
> case, i.e, a new pgstat_reset_after_success. reset_all_cb also
> carries a status argument so the extension knows what to do
> in the case of success or failure.

> This also means we need to also update all existing callbacks to
> do work in the failed status.

On second thought, I am not too thrilled with extending reset_all_cb
to take responsibility for file cleanup, etc. I think it should remain
used to reset stats only.

I think the best way forward will be to introduce a callback to be used by
custom kinds only. This callback will be responsible for cleaning up files
and related resources at the end of the write stats, read stats, and discard
stats paths. The callback will pass the extension a status
(READ, WRITE, DISCARD) and the extension will know how to clean up the
resources it created depending on the situation.

So, unlike my original proposal, this puts more responsibility on the
extension to track and clean up its files, but this seems like the best
approach to take here.
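
As a rough standalone sketch of such a cleanup callback, assume the extension writes to a temporary file and publishes it by renaming on a successful write. The enum values, file names, and function names below are hypothetical, and the choice to remove the file after a successful read simply mirrors what is done for the main pgstats file; an extension could decide otherwise.

```c
#include <stdbool.h>
#include <stdio.h>

#define DEMO_TMP_PATH   "demo_extra.tmp"
#define DEMO_FINAL_PATH "demo_extra.stat"

/* Hypothetical status values passed to the cleanup callback. */
typedef enum DemoStatsFileOp
{
    DEMO_STATS_WRITE,
    DEMO_STATS_READ,
    DEMO_STATS_DISCARD
} DemoStatsFileOp;

/* File handle owned and tracked by the extension itself. */
static FILE *demo_fp = NULL;

/* Start a write: open the temporary file and store some payload. */
bool
demo_begin_write(const char *payload)
{
    demo_fp = fopen(DEMO_TMP_PATH, "wb");
    if (demo_fp == NULL)
        return false;
    return fputs(payload, demo_fp) >= 0;
}

/* Cleanup callback: close the handle, then act on the status. */
void
demo_end_extra_stats(DemoStatsFileOp op)
{
    if (demo_fp != NULL)
    {
        fclose(demo_fp);
        demo_fp = NULL;
    }

    switch (op)
    {
        case DEMO_STATS_WRITE:
            /* Successful write: publish the temporary file. */
            rename(DEMO_TMP_PATH, DEMO_FINAL_PATH);
            break;
        case DEMO_STATS_READ:
        case DEMO_STATS_DISCARD:
            /* Data consumed or unusable: leave nothing behind. */
            remove(DEMO_FINAL_PATH);
            remove(DEMO_TMP_PATH);
            break;
    }
}
```

A single entry point with a status argument keeps the core code unaware of how many files the extension opened or what it chooses to keep.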

Also, I am now leaning towards creating a separate test module rather than
trying to do too much unrelated testing in injection points. It is definitely
convenient to use injection points, but I think we can do better testing with
a separate module. This module can also serve as an example for extension
developers.

What do you think?

--
Sami Imseih
Amazon Web Services (AWS)

Re: [Proposal] Adding callback support for custom statistics kinds

From

Michael Paquier

Date:

November 25, 05:03:22

On Mon, Nov 24, 2025 at 06:18:26PM -0600, Sami Imseih wrote:
> After second thought, I am not too thrilled with extending reset_all_cb
> to take responsibility for file cleanup, etc. I think it should just remain
> used to reset stats only.
>
> I think the best way forward will be to introduce a callback to be used by
> custom kinds only. This callback will be responsible for cleaning up files
> and related resources at the end of the write stats, read stats, and discard
> stats paths. The callback will provide back to the extension a status
> (READ, WRITE, DISCARD) and the extension will know how to clean up the
> resources it created depending on the situation.

I guess that READ and WRITE are the cases that happen on success of
these respective operations.  DISCARD is the failure case when one of
these fails.

> So, unlike my original proposal, this puts more responsibility on the
> extension to track and clean up its files, but this seems like the best
> approach to take here.

That may be something we should do anyway.  It means that the modules
are responsible for tracking the file(s) they open; still, they
could also decide on operations different from what the backend does for
the main pgstats file, optionally, depending on the state of the reads and
writes (aka success or failure of these).

> Also, I am now leaning towards creating a separate test module rather than
> trying to do too much unrelated testing in injection points. It is definitely
> convenient to use injection points, but I think we can do better testing with
> a separate module. This module can also serve as an example for extension
> developers.

You are right that it may be cleaner this way.  Do you think that it
could make sense to move some of the existing "template" code of
injection_points there?

One part of the proposed patch that felt independent to me was the
renaming and publishing of the two write/read routines for the stats
files, so I have extracted that from your first patch to reduce the
blast radius, and applied it, as it can also be useful on its own.
--
Michael

Attachments

signature.asc

Re: [Proposal] Adding callback support for custom statistics kinds

From

Sami Imseih

Date:

December 2, 23:58:32

> > Also, I am now leaning towards creating a separate test module rather than
> > trying to do too much unrelated testing in injection points. It is definitely
> > convenient to use injection points, but I think we can do better testing with
> > a separate module. This module can also serve as an example for extension
> > developers.
>
> You are right that it may be cleaner this way.  Do you think that it
> could make sense to move some of the existing "template" code of
> injection_points there?

By "template" code, do you mean something like this?

include/utils/custom_statkinds.h
backend/utils/misc/custom_statkinds.c

Where the template code here is PgStat_kind definition, callbacks, etc. for
injection_points or the new test module that is using a custom kind.

A few benefits I see in this are that we can point extension developers to
this as an example in [0], and that we are also maintaining the kind IDs in
a single place. These may not be strong points, but they may be worthwhile.

v2, attached, is something that may be closer to what we've been discussing.

v2-0001 contains much simplified changes to pgstat.c that simply invoke the
callbacks; all the work is on the extension to implement what it needs to do.
This includes a callback at the end of WRITE, READ, and DISCARD, with a flag
passed to the callback so it can perform the necessary clean-up actions.

v2-0002 implements a new test module that tests mainly that the recovery,
clean and crash, are working as expected.

I created a new tap test for this which performs a test similar to what is
done in recovery/029_stats_restart.pl. I could merge the new test there, but
I am reluctant to add a dependency on a new module to recovery. What
do you think?

[0] https://www.postgresql.org/docs/current/xfunc-c.html#XFUNC-ADDIN-CUSTOM-CUMULATIVE-STATISTICS


--
Sami Imseih
Amazon Web Services (AWS)

Attachments

Re: [Proposal] Adding callback support for custom statistics kinds

From

Michael Paquier

Date:

December 3, 03:01:14

On Tue, Dec 02, 2025 at 02:58:32PM -0600, Sami Imseih wrote:
> By "template" code, do you mean Something like?
>
> include/utils/custom_statkinds.h
> backend/utils/misc/custom_statkinds.c
>
> Where the template code here is PgStat_kind definition, callbacks, etc. for
> injection_points or the new test module that is using a custom kind.

I am not completely sure that we need a separate C file for a portion
of the code related to custom stats kinds.  At least, I do not see
which part of pgstat.c, pgstat_internal.h and pgstat_shmem.c could
be extracted so that a custom_statkinds.c would have value when taken
independently.  A test module makes the most sense for such templates
IMO, as they can be compiled and checked directly.

> v2-0001 are much simplified changes to pgstat.c that simply invoke the callbacks
> and all the work is on the extension to implement what it needs to do.
> This includes
> a callback at the end of WRITE, READ, DISCARD with a flag passed to the caller
> so they can perform the necessary clean-up actions.

+    void        (*to_serialized_extra_stats) (const PgStat_HashKey *key,
+                                              const PgStatShared_Common *header, FILE *statfile);
+    void        (*from_serialized_extra_stats) (const PgStat_HashKey *key,
+                                                const PgStatShared_Common *header, FILE *statfile);
+    void        (*end_extra_stats) (PgStat_StatsFileOp status);
[...]
+typedef struct PgStatShared_CustomEntry
+{
+    PgStatShared_Common header;
+    PgStat_StatCustomEntry stats;
+    char        name[NAMEDATALEN];
+    dsa_pointer description;
+}            PgStatShared_CustomEntry;

I'm cool with this design, including your point about using a DSA
pointer in a stats entry and manipulating this data through the
serialization callbacks.  Your module does not use the FILE* passed to
the to/from extra serialization callbacks, which points to the main
stats file.  It seems important to document in pgstat_internal.h that
this points to the "main" pgstats file manipulated by the startup
process when loading or by the checkpointer when flushing.

Perhaps the callback in the module for end_extra_stats should use a
switch based on PgStat_StatsFileOp.  Minor point.

+/* File handle for statistics serialization */
+static FILE *fd = NULL;

Using a fd tracked directly by the module sounds good to me, yes.
That gives modules the flexibility to decide which extra files they
need to know about.  File name patterns can then be decided based on
the stats entry keys that need to be written, with files opened only
when actually required.
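
A minimal sketch of how a module-owned file handle and a switch-based end_extra_stats callback could fit together. Everything here is an assumption made for illustration: the enum is a stand-in for PgStat_StatsFileOp (only STATS_READ appears in the patch excerpts), the file names are made up, and plain stdio calls replace the backend's file APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Stand-in for PgStat_StatsFileOp; the value names are assumed. */
typedef enum
{
    STATS_WRITE,
    STATS_READ,
    STATS_DISCARD
} StatsFileOp;

/* Hypothetical names for the module's extra file. */
#define EXTRA_TMP_FILE  "test_extra_stats.tmp"
#define EXTRA_PERM_FILE "test_extra_stats.stat"

/* File handle owned by the module; opened by the to/from callbacks. */
static FILE *extra_fd = NULL;

/* Runs once after pgstats has finished a write, read or discard pass. */
void
module_end_extra_stats(StatsFileOp op)
{
    bool        ok = true;

    /* Close the handle, remembering any pending write error. */
    if (extra_fd != NULL)
    {
        ok = (ferror(extra_fd) == 0);
        if (fclose(extra_fd) != 0)
            ok = false;
        extra_fd = NULL;
    }

    switch (op)
    {
        case STATS_WRITE:
            /* Publish the new file only if it was written completely. */
            if (!ok || rename(EXTRA_TMP_FILE, EXTRA_PERM_FILE) != 0)
                remove(EXTRA_TMP_FILE);
            break;
        case STATS_READ:
            /* The data has been loaded (or rejected); drop the file. */
            remove(EXTRA_PERM_FILE);
            break;
        case STATS_DISCARD:
            /* Nothing on disk can be trusted; remove everything. */
            remove(EXTRA_TMP_FILE);
            remove(EXTRA_PERM_FILE);
            break;
        default:
            assert(0);          /* unknown operation */
            break;
    }
}
```

Closing the handle before the switch keeps the three cases short, and each case only decides what happens to the files on disk.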

> v2-0002 implements a new test module that tests mainly that the recovery,
> clean and crash, are working as expected.

That looks like a good direction to me.  The only differences I can
see with the stats module in injection_points for variable-sized stats
is that this new module does not check pgstat_drop_entry() and
pgstat_fetch_entry() when working on a custom stats kind.  If we had
SQL interfaces calling these two, we could just remove
injection_stats.c entirely, moving everything to this new test module.
I should have invented a new module from the start, perhaps, but well,
that was good enough to check the basic APIs when working on the
custom APIs.  Removing this duplication would be my own business with
your module in the tree, no need for you to worry about that.  That
would also remove the tweak you have used regarding the duplicated
kind ID.

Perhaps we should do the same for the fixed-sized kind at the end, and
instead of using one .so for both of them, we could just create a
separate .so with multiple entries in MODULES?  What do you think?
What you have here is better than what's in the tree in terms of
module separation for HEAD.

> I created a new tap test for this which performs a test similar to what is
> done in recovery/029_stats_restart.pl. I could merge the new test there, but
> I am reluctant to add a dependency on a new module to recovery. What
> do you think?

Adding an extra item to recovery's EXTRA_INSTALL would be OK for me,
but it seems cleaner to me to keep the tests related to custom stats
in their own area like your patch 0002 is doing with its new test
module test_custom_statkind.  And 029_stats_restart.pl is already
covering a lot of ground.

+     if (pgstat_is_kind_custom(key.kind) && kind_info->from_serialized_extra_stats)
+         kind_info->from_serialized_extra_stats(&key, header, fpin);
[...]
+     if (pgstat_is_kind_custom(ps->key.kind) && kind_info->to_serialized_extra_stats)
+         kind_info->to_serialized_extra_stats(&ps->key, shstats, fpout);

These restrictions based on custom kinds do not seem mandatory.
Why not allow built-in kinds the same set of operations?

+    /* Read and verify the hash key */
+    if (!pgstat_read_chunk(fd, (void *) key, sizeof(PgStat_HashKey)))
+        return;
[...]
+    /* Write the hash key to identify this entry */
+    pgstat_write_chunk(fd, (void *) key, sizeof(PgStat_HashKey));

I am puzzled by this part of 0002.  Why are you overwriting the key
once after loading it from the main pgstats file?  Writing the key to
cross-check that the data matches with what is in the main file is OK,
and this should be ensured because of the ordering of the data.  I
would have done it in a slightly different way, I guess, with the data
stored on disk in the main pgstats file including an offset to know
where to search in the secondary file.  That's what we would do for
PGSS as well, I guess, with the secondary file including data
structured as a set of:
- Entry key, cross-checked with the data read from the main file,
based on the offset stored in the main file.
- Length of extra data.
- The extra data contents.
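
The layout described above could be sketched as follows. This is only an illustration under stated assumptions: EntryKey is a simplified stand-in for PgStat_HashKey, the function names are hypothetical, and plain stdio is used instead of pgstat_write_chunk()/pgstat_read_chunk(). The offset returned by the writer is what would be stored with the entry in the main pgstats file.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Simplified stand-in for PgStat_HashKey. */
typedef struct
{
    uint32_t    kind;
    uint32_t    dboid;
    uint64_t    objid;
} EntryKey;

/*
 * Append one record (key, length, payload) to the secondary file and
 * return its starting offset, to be stored in the main file.
 * Returns -1 on failure.
 */
long
extra_write_record(FILE *extra, const EntryKey *key,
                   const char *data, uint32_t len)
{
    long        offset;

    assert(extra != NULL && key != NULL && data != NULL);

    if (fseek(extra, 0, SEEK_END) != 0)
        return -1;
    offset = ftell(extra);
    if (offset < 0)
        return -1;

    if (fwrite(key, sizeof(*key), 1, extra) != 1 ||
        fwrite(&len, sizeof(len), 1, extra) != 1 ||
        (len > 0 && fwrite(data, 1, len, extra) != len))
        return -1;

    return offset;
}

/*
 * Read back the record at "offset", checking that its key matches the
 * one loaded from the main file.  Payloads of bufsize bytes or more are
 * rejected; the result is NUL-terminated.
 */
bool
extra_read_record(FILE *extra, long offset, const EntryKey *expected,
                  char *buf, uint32_t bufsize)
{
    EntryKey    key;
    uint32_t    len;

    assert(extra != NULL && expected != NULL && buf != NULL && bufsize > 0);

    if (fseek(extra, offset, SEEK_SET) != 0)
        return false;
    if (fread(&key, sizeof(key), 1, extra) != 1 ||
        memcmp(&key, expected, sizeof(key)) != 0)
        return false;           /* offset does not point at this entry */
    if (fread(&len, sizeof(len), 1, extra) != 1 || len >= bufsize)
        return false;
    if (len > 0 && fread(buf, 1, len, extra) != len)
        return false;
    buf[len] = '\0';
    return true;
}
```

Because each record is located by its offset, entries can be read back in any order, and the key check catches a main file and a secondary file that have gone out of sync.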

As a whole, I find this patch pretty cool, particularly the point
about extending stats entries with DSAs, something that would be
essential for moving PGSS to pgstats, because we don't want the
query strings to bloat the main pgstats file.  Nice.
--
Michael

Attachments

signature.asc

Re: [Proposal] Adding callback support for custom statistics kinds

From

Chao Li

Date:

December 3, 08:41:44

> On Dec 3, 2025, at 04:58, Sami Imseih <samimseih@gmail.com> wrote:
>
>>> Also, I am now leaning towards creating a separate test module rather than
>>> trying to do too much unrelated testing in injection points. It is definitely
>>> convenient to use injection points, but I think we can do better testing with
>>> a separate module. This module can also serve as an example for extension
>>> developers.
>>
>> You are right that it may be cleaner this way.  Do you think that it
>> could make sense to move some of the existing "template" code of
>> injection_points there?
>
> By "template" code, do you mean Something like?
>
> include/utils/custom_statkinds.h
> backend/utils/misc/custom_statkinds.c
>
> Where the template code here is PgStat_kind definition, callbacks, etc. for
> injection_points or the new test module that is using a custom kind.
>
> A few benefits I see for this is we can point extension developers to
> this as an example in [0] and we are also maintaining the kind ids in
> a single place. These may not be strong points, but may be worth while.
>
> v2 attached is something that may be closer to what we've been discussing
>
> v2-0001 are much simplified changes to pgstat.c that simply invoke the callbacks
> and all the work is on the extension to implement what it needs to do.
> This includes
> a callback at the end of WRITE, READ, DISCARD with a flag passed to the caller
> so they can perform the necessary clean-up actions.
>
> v2-0002 implements a new test module that tests mainly that the recovery,
> clean and crash, are working as expected.
>
> I created a new tap test for this which performs a test similar to what is
> done in recovery/029_stats_restart.pl. I could merge the new test there, but
> I am reluctant to add a dependency on a new module to recovery. What
> do you think?
>
> [0] https://www.postgresql.org/docs/current/xfunc-c.html#XFUNC-ADDIN-CUSTOM-CUMULATIVE-STATISTICS
>
>
> --
> Sami Imseih
> Amazon Web Services (AWS)
> <v2-0002-Tests-for-custom-stat-kinds.patch><v2-0001-pgstat-support-custom-serialization-files-and-cal.patch>

Thanks for the patch, I do think the feature will be useful. After reading the patch, I have a concern about the design:

This patch provides callbacks that require (and also allow) custom extensions to write stat files on their own behalf,
which I think is unsafe. The problems that come to mind include:

* An extension can write anywhere on storage. What if it writes to /tmp and the files are deleted by
another process, or accidentally by a user?
* pgstat follows a pattern when writing files: write to a tmp file first, then durable_rename(). How do we ensure that
extensions follow the same pattern? Without it, how do we ensure the reliability of the stat files?
* In the current path, pgstat performs its own write, then calls the callbacks. What if a callback fails? Will that
leave pgstat in a stale state?
* As extensions own file creation and deletion, in some cases stale files might be left on storage. Who will be
responsible for cleaning them up?

Given that the goal of the feature is to allow extensions to serialize custom data, the callbacks should just return
serialized/deserialized data, maybe together with some metadata, and pgstat should be responsible for writing the data.
In other words, IMO, pgstat should always own the stat files.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/

Re: [Proposal] Adding callback support for custom statistics kinds

From

Michael Paquier

Date:

December 3, 08:54:52

On Wed, Dec 03, 2025 at 01:41:44PM +0800, Chao Li wrote:
> Thanks for the patch, I do think the feature will be useful. After reading the patch, I got a concern on the design:
>
> This patch provides callbacks that requests (also allows) custom
> extensions to write stat files on their own behalf, which I think
> it’s unsafe. The problems coming out to my head includes:
>
> * An extension can write to any where on the storage, that what
>   about it writes to /tmp and the files are deleted by other process
>   or by a user manually incidentally?

I mean, just don't do that.  It's up to the extension developer to
decide what is safe or not, within the scope of the data folder.

> * pgstat has a pattern of writing files like: writing to tmp file
>   first, then durable_rename(), how to ensure extensions to do the
>   same pattern? Without this pattern, how to ensure reliability of
>   stat files?

Extension code would be responsible for ensuring that.
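
For illustration, the write-to-temporary-then-rename pattern could look like the sketch below on the extension side. It uses plain POSIX calls so that it stands alone; the function name is hypothetical. Inside the backend, durable_rename() additionally fsyncs the containing directory, which this sketch omits.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Write "len" bytes to tmp_path, flush them to stable storage, then
 * atomically move the file to final_path.  Readers therefore see either
 * the old file or the complete new one, never a partial write.
 */
bool
write_file_atomically(const char *tmp_path, const char *final_path,
                      const void *data, size_t len)
{
    FILE       *fp;
    bool        ok;

    assert(tmp_path != NULL && final_path != NULL);
    assert(data != NULL || len == 0);

    fp = fopen(tmp_path, "wb");
    if (fp == NULL)
        return false;

    ok = (len == 0 || fwrite(data, 1, len, fp) == len);
    ok = ok && fflush(fp) == 0;         /* stdio buffer -> kernel */
    ok = ok && fsync(fileno(fp)) == 0;  /* kernel -> disk */
    if (fclose(fp) != 0)
        ok = false;

    if (!ok || rename(tmp_path, final_path) != 0)
    {
        remove(tmp_path);               /* do not leave debris behind */
        return false;
    }
    return true;
}
```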

> * In the current path, pgstat performs its own write, then call
>   callbacks. What about if a callback fails? Will that leave pgstat
>   in a stale state?

For the write state, end_extra_stats() would take care of that.  It
depends on what kind of errors you would need to deal with, but as
proposed the patch would offer the same level of protection for the
writes of the stats, where we'd check for an error on the fd saved by
an extension for an extra file.

I think that you have a fair point about the stats read path though,
shouldn't we make the callback from_serialized_extra_stats() return a
state to be able to trigger a full-scale cleanup, at least?

> * As extensions own file creation and deletion, in some case, staled
>   file might be left on storage, who will be responsible for
>   cleaning up them?

The extension should be able to handle that, I guess.

> Given the goal of the feature is to allow extensions to serialize
> custom data, the callback should just return serialized/deserialized
> data, maybe together with some metadata, then pgstat should be
> responsible for writing the data. In other words, IMO, pgstat should
> always own stat files.

That's where my view of the matter differs, actually, pushing down the
responsibility into the extension code itself.  A key argument,
mentioned upthread, is that the file paths could depend on the stats
entry *keys*, which may not be known in advance when beginning the
flush of the stats.  Think about per-database stats files, or
per-object stats files, for example, an option that matters if we
do not want to bloat the main pgstats file.
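
As a small illustration of a path derived from an entry key, a module could build per-database file names along these lines. EntryKey is a simplified stand-in for PgStat_HashKey, and both the function name and the "pg_stat/extra.<kind>.<dboid>.stat" pattern are made up for this sketch.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Simplified stand-in for PgStat_HashKey. */
typedef struct
{
    uint32_t    kind;
    uint32_t    dboid;
    uint64_t    objid;
} EntryKey;

/*
 * Build the name of the extra file holding data for the database of
 * "key".  The file name pattern is hypothetical.  Returns false if
 * "path" is too small to hold the result.
 */
bool
extra_stats_path(const EntryKey *key, char *path, size_t pathlen)
{
    int         n;

    assert(key != NULL && path != NULL);

    n = snprintf(path, pathlen, "pg_stat/extra.%" PRIu32 ".%" PRIu32 ".stat",
                 key->kind, key->dboid);
    return n > 0 && (size_t) n < pathlen;
}
```

Since the name is only known once a key is in hand, the file would be opened from inside the per-entry callback rather than before the flush starts.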
--
Michael

Attachments

signature.asc

Re: [Proposal] Adding callback support for custom statistics kinds

From

Chao Li

Date:

December 3, 09:16:54


> On Dec 3, 2025, at 13:54, Michael Paquier <michael@paquier.xyz> wrote:
>
> On Wed, Dec 03, 2025 at 01:41:44PM +0800, Chao Li wrote:
>> Thanks for the patch, I do think the feature will be useful. After reading the patch, I got a concern on the design:
>>
>> This patch provides callbacks that requests (also allows) custom
>> extensions to write stat files on their own behalf, which I think
>> it’s unsafe. The problems coming out to my head includes:
>>
>> * An extension can write to any where on the storage, that what
>>   about it writes to /tmp and the files are deleted by other process
>>   or by a user manually incidentally?
>
> I mean, just don't do that.  It's up to the extension developer to
> decide what is safe or not, within the scope of the data folder.
>
>> * pgstat has a pattern of writing files like: writing to tmp file
>>   first, then durable_rename(), how to ensure extensions to do the
>>   same pattern? Without this pattern, how to ensure reliability of
>>   stat files?
>
> Extension code would be responsible for ensuring that.
>
>> * In the current path, pgstat performs its own write, then call
>>   callbacks. What about if a callback fails? Will that leave pgstat
>>   in a stale state?
>
> For the write state, end_extra_stats() would take care of that.  It
> depends on what kind of errors you would need to deal with, but as
> proposed the patch would offer the same level of protection for the
> writes of the stats, where we'd check for an error on the fd saved by
> an extension for an extra file.
>
> I think that you have a fair point about the stats read path though,
> shouldn't we make the callback from_serialized_extra_stats() return a
> state to be able to trigger a full-scale cleanup, at least?
>
>> * As extensions own file creation and deletion, in some case, staled
>>   file might be left on storage, who will be responsible for
>>   cleaning up them?
>
> The extension should be able to handle that, I guess.

Yes, of course they can, but that’s out of pgstat’s control. How can we ensure that?

>
>> Given the goal of the feature is to allow extensions to serialize
>> custom data, the callback should just return serialized/deserialized
>> data, maybe together with some metadata, then pgstat should be
>> responsible for writing the data. In other words, IMO, pgstat should
>> always own stat files.
>
> That's where my view of the matter differs, actually, pushing down the
> responsibility into the extension code itself.  A key argument,
> mentioned upthread, is that the file paths could depend on the stats
> entry *keys*, which may not be known in advance when beginning the
> flush of the stats.  Think about per-database file stats, or just
> some per-object file stats, for example, which is an option that would
> matter so as we do not bloat the main pgstats file.
> --
> Michael

If we push down the responsibility into the extension code, then all extensions that want to use the callbacks have
to handle the same complexities of dealing with stat files, which sounds like a lot of duplicated effort.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/

Re: [Proposal] Adding callback support for custom statistics kinds

From

Sami Imseih

Date:

December 5, 02:00:10

> If we push down the responsibility into the extension code, then all extensions
> that want to enjoy the callbacks have to handle the same complexities of dealing
> with stat files, which sounds big duplicate efforts.

Thanks for the input! Yes, this is a trade-off between putting
responsibility on the extension vs core. The initial thought I had was
exactly like yours, but it will be easier to get something pushed if we
make the core changes as minimal as possible. If there are enough
complaints in the future, this can be revisited. In particular, if a
common pattern for file cleanup emerges, it could be turned into a core
utility.

> That looks like a good direction to me.  The only differences I can
> see with the stats module in injection_points for variable-sized stats
> is that this new module does not check pgstat_drop_entry() and
> pgstat_fetch_entry() when working on a custom stats kind.  If we had
> SQL interfaces calling these two, we could just remove
> injection_stats.c entirely, moving everything to this new test module.

> I should have invented a new module from the start, perhaps, but well,
> that was good enough to check the basic APIs when working on the
> custom APIs.  Removing this duplication would be my own business with
> your module in the tree, no need for you to worry about that.  That
> would also remove the tweak you have used regarding the duplicated
> kind ID.

I plan on addressing the other comments.

However, as discussed off-list, I do think moving the custom kind tests from
injection points to the new test module is a prerequisite. I would rather
not have us push a new test module that duplicates the work of the injection
stats tests. I worked on this refactoring today and plan to have a patch
ready for review by tomorrow.

--
Sami Imseih
Amazon Web Services (AWS)

Re: [Proposal] Adding callback support for custom statistics kinds

From

Michael Paquier

Date:

December 5, 02:14:58

On Thu, Dec 04, 2025 at 05:00:10PM -0600, Sami Imseih wrote:
> Thanks for the input! Yes, this is a trade-off between putting
> responsibility on the
> extension vs core. The initial thought I had was exactly like yours, but it will
> be easier to get something pushed if we make the core changes as minimal as
> possible. If there are enough complaints in the future, this can be revisited.
> Particularly if there is a common patterns for file cleanup, this
> could be turned
> into a core utility.

Another way to shape it would be to have an in-core routine that
provides a default logic for the actions to take depending on the
write, read or discard state, with the state and a FILE* as arguments.
The main pgstats file would call that, modules may decide to use it.

> However, as discussed off-list, I do think moving the custom kind tests from
> injection points to the new test module is a prerequisite. I rather
> not have us push a new test module that is doing duplicate work as the injection
> stats tests.
> I worked on this refactoring today and plan to have a patch ready for review
> by tomorrow.

Cool, thanks!
--
Michael

Attachments

signature.asc

Re: [Proposal] Adding callback support for custom statistics kinds

From

Sami Imseih

Date:

December 6, 04:27:50

> > However, as discussed off-list, I do think moving the custom kind tests from
> > injection points to the new test module is a prerequisite. I rather
> > not have us push a new test module that is doing duplicate work as the injection
> > stats tests.
> > I worked on this refactoring today and plan to have a patch ready for review
> > by tomorrow.
>
> Cool, thanks!

Attached is the new test module that replaces the custom statistics
tests currently in the injection points tests. Under test_custom_stats, there
are two separate modules: one for variable-amount stats and one for
fixed-amount stats. With this, we can completely remove the
stats-related tests and supporting code under
src/test/modules/injection_points/.

A few notes on the tests:

1. Variable stats: pgstat_drop_entry() and pgstat_fetch_entry() are
exercised here, addressing an earlier point raised in the thread.

2. Fixed-amount stats: I added specific tests for reset behavior, both
during crash recovery and during manual resets.

3. In test_custom_fixed_stats.c, you will see this comment:
```
/* see explanation above PgStatShared_Archiver for the reset protocol */
LWLockAcquire(&stats_shmem->lock, LW_EXCLUSIVE);
```
This is intentional, as the reset protocol is documented at the
referenced location [0]. I wanted to call that out for the patch review.

Once this gets pushed, it will simplify the remaining work on the
serialization callbacks.

[0] https://github.com/postgres/postgres/blob/master/src/include/utils/pgstat_internal.h#L362-L382

--
Sami Imseih
Amazon Web Services (AWS)

Attachments

v3-0001-Move-custom-stats-tests-from-injection_points-to-.patch

Re: [Proposal] Adding callback support for custom statistics kinds

From

Sami Imseih

Date:

December 7, 20:42:38

I took a look at the earlier comments.

> +    /* Read and verify the hash key */
> +    if (!pgstat_read_chunk(fd, (void *) key, sizeof(PgStat_HashKey)))
> +        return;
> [...]
> +    /* Write the hash key to identify this entry */
> +    pgstat_write_chunk(fd, (void *) key, sizeof(PgStat_HashKey));

> I am puzzled by this part of 0002.  Why are you overwriting the key
> once after loading it from the main pgstats file?

Yes, this is not necessary. I removed it.

> I would have done it in a slightly different way, I guess, with the data
> stored on disk in the main pgstats file including an offset to know
> where to search in the secondary file.

That's a much better approach, and the pattern we would want to use
going forward. It does not require that we read the entries
back in the same order as they were written, so it's much more flexible.
I have now done this in the test module.

> Perhaps the callback in the module for end_extra_stats should use a
> switch based on PgStat_StatsFileOp.  Minor point.

Agree. Done.

>> * In the current path, pgstat performs its own write, then call
>>   callbacks. What about if a callback fails? Will that leave pgstat
>>   in a stale state?

> For the write state, end_extra_stats() would take care of that.  It
> depends on what kind of errors you would need to deal with, but as
> proposed the patch would offer the same level of protection for the
> writes of the stats, where we'd check for an error on the fd saved by
> an extension for an extra file.

> I think that you have a fair point about the stats read path though,
> shouldn't we make the callback from_serialized_extra_stats() return a
> state to be able to trigger a full-scale cleanup, at least?

In this case, even if the callback does not return a state, the cleanup
will eventually occur at the end of the read; see:

```
done:
    /* First, cleanup the main stats file, PGSTAT_STAT_PERMANENT_FILENAME */
    FreeFile(fpin);

    elog(DEBUG2, "removing permanent stats file \"%s\"", statfile);
    unlink(statfile);

    /* Let each stats kind run its cleanup callback, if it provides one */
    for (PgStat_Kind kind = PGSTAT_KIND_MIN; kind <= PGSTAT_KIND_MAX; kind++)
    {
        const PgStat_KindInfo *kind_info = pgstat_get_kind_info(kind);

        if (kind_info && kind_info->end_extra_stats)
            kind_info->end_extra_stats(STATS_READ);
    }
```

However, this could also mean that some entries are read back correctly and
others are not, so maybe it's not such a good idea. So I did what was suggested
and made the callback return a bool, which is used to raise an error and
trigger the cleanup code.
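
To make the resulting control flow concrete, here is a self-contained simulation of a read loop whose per-entry callback returns a bool. It is not the pgstat.c code: KindInfo is a reduced stand-in for PgStat_KindInfo, the callback signatures are simplified, and the demo callbacks exist only for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_KINDS 3

/* Reduced stand-in for PgStat_KindInfo: only the callbacks used here. */
typedef struct
{
    bool        (*from_serialized_extra_stats) (int entry);
    void        (*end_extra_stats) (void);
} KindInfo;

/* Demo callbacks: demo_read_bad rejects entry number 1. */
int         cleanup_calls = 0;
bool        demo_read_ok(int entry) { (void) entry; return true; }
bool        demo_read_bad(int entry) { return entry != 1; }
void        demo_end(void) { cleanup_calls++; }

/*
 * Load "nentries" entries, entry i belonging to kinds[entry_kind[i]].
 * Returns the number of entries kept: nentries on success, or 0 if any
 * callback reported a failure, in which case all stats are reset.
 */
int
read_stats(const KindInfo *kinds, const int *entry_kind, int nentries)
{
    int         loaded = 0;

    assert(kinds != NULL && entry_kind != NULL);

    for (int i = 0; i < nentries; i++)
    {
        const KindInfo *info = &kinds[entry_kind[i]];

        if (info->from_serialized_extra_stats != NULL &&
            !info->from_serialized_extra_stats(i))
            goto error;
        loaded++;
    }
    goto done;

error:
    /* Mirrors the reset-after-failure step: drop everything loaded so far. */
    loaded = 0;

done:
    /* Every kind gets its cleanup call, on success and on failure alike. */
    for (int k = 0; k < NUM_KINDS; k++)
        if (kinds[k].end_extra_stats != NULL)
            kinds[k].end_extra_stats();

    return loaded;
}
```

The point of the bool is visible in the failure case: no partially loaded set of entries survives, yet every kind still gets its cleanup call.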

> +     if (pgstat_is_kind_custom(key.kind) && kind_info->from_serialized_extra_stats)
> +         kind_info->from_serialized_extra_stats(&key, header, fpin);
> [...]
> +     if (pgstat_is_kind_custom(ps->key.kind) && kind_info->to_serialized_extra_stats)
> +         kind_info->to_serialized_extra_stats(&ps->key, shstats, fpout);

> These restrictions based on custom kinds do not seem mandatory.
> Why not allowing built-in kinds the same set of operations?

No good reason not to. In fact, maybe a follow-up will be to move the
replslot to this infrastructure and remove reliance on PGSTAT_FILE_ENTRY_NAME.

Attached is the v4 patch set, which includes:

0001 - This just moves the tests out of injection points into a new
test module. It is similar to v3 [0].

0002 - The code changes to implement the callbacks and the necessary
tests in the new test module.

[0]
https://www.postgresql.org/message-id/CAA5RZ0sG2RUKg%3DOLY%2B6-e4q%3DX9rsLfK3pKn03d%3DRZQppEDR%3DBg%40mail.gmail.com

--
Sami Imseih
Amazon Web Services (AWS)

Attachments

Re: [Proposal] Adding callback support for custom statistics kinds

From

Michael Paquier

Date:

December 8, 09:35:23

On Fri, Dec 05, 2025 at 07:27:50PM -0600, Sami Imseih wrote:
> Attached is the new test module that replaces the custom statistics
> tests currently in the injection points tests. Under test_custom_stats, there
> are two separate modules: one for variable-amount stats and one for
> fixed-amount stats. With this, we can completely remove the
> stats-related tests and supporting code under
> src/test/modules/injection_points/.

Yes, thanks.  Structurally, this is better and more flexible than what
we had originally, and I have noticed that you have copied the
original files while adding more comments and renaming things a bit:
the structure of the functions was exactly the same.  Anyway, I have
worked on that for a good portion of the day, splitting the module
drop and the new module into two commits, and applied the result after
tweaking quite a few things in terms of names and comments (no
pgstat_*, a bit more "Var" and "Fixed", etc.), applying a much more
consistent set of names across the board for the functions and the
structures.  This cleanup part is out of the way now, which should
ease the introduction of the next pieces you are proposing.

The tests for the reset of fixed-sized stats were a nice addition,
indeed.  If you have more areas that you think could be improved,
ideas are of course welcome.
--
Michael

Attachments

signature.asc

Re: [Proposal] Adding callback support for custom statistics kinds

From

Sami Imseih

Date:

December 8, 22:09:14

> Yes, thanks.  Structurally, this is better and more flexible than what
> we had originally, and I have noticed that you have copied the
> original files while adding more comments and renaming a bit things:
> the structure of the functions was exactly the same.  Anyway, I have
> worked on that for a good portion of the day, splitting the module
> drop and the new module into two commits, and applied the result after
> tweaking quite a few things in terms of names and comments (no
> pgstat_*, a bit more "Var" and "Fixed", etc.), applying a much more
> consistent set of names across the board for the functions and the
> structures.  This cleanup part is moved out of the way now, so that
> you ease the introduction of the next pieces you are proposing.

Thanks for getting these committed!

I rebased the custom callbacks patch in v5.

One very minor thing from the earlier commits that I corrected here is
the test for entry 2 after a clean restart.

-is($result, "entry1|2", "variable-sized stats persist after clean restart");
+is($result, "entry1|2|Test entry 1", "variable-sized stats persist after clean restart");
+
+$result = $node->safe_psql('postgres', q(select * from test_custom_stats_var_report('entry2')));
+is($result, "entry2|3|Test entry 2", "variable-sized stats persist after clean restart");
+

--
Sami Imseih
Amazon Web Services (AWS)

Attachments

v5-0001-Allow-cumulative-statistics-to-serialize-auxiliar.patch

Re: [Proposal] Adding callback support for custom statistics kinds

From

Chao Li

Date:

December 9, 06:38:53


> On Dec 9, 2025, at 03:09, Sami Imseih <samimseih@gmail.com> wrote:
>
>> Yes, thanks.  Structurally, this is better and more flexible than what
>> we had originally, and I have noticed that you have copied the
>> original files while adding more comments and renaming a bit things:
>> the structure of the functions was exactly the same.  Anyway, I have
>> worked on that for a good portion of the day, splitting the module
>> drop and the new module into two commits, and applied the result after
>> tweaking quite a few things in terms of names and comments (no
>> pgstat_*, a bit more "Var" and "Fixed", etc.), applying a much more
>> consistent set of names across the board for the functions and the
>> structures.  This cleanup part is moved out of the way now, so that
>> you ease the introduction of the next pieces you are proposing.
>
> Thanks for getting these committed!
>
> I rebased the custom callbacks patch in v5.
>
> One very minor thing from the earlier commits that I corrected here is
> the test for entry 2 after a clean restart.
>
> -is($result, "entry1|2", "variable-sized stats persist after clean restart");
> +is($result, "entry1|2|Test entry 1", "variable-sized stats persist after clean restart");
> +
> +$result = $node->safe_psql('postgres', q(select * from test_custom_stats_var_report('entry2')));
> +is($result, "entry2|3|Test entry 2", "variable-sized stats persist after clean restart");
> +
>
> --
> Sami Imseih
> Amazon Web Services (AWS)
> <v5-0001-Allow-cumulative-statistics-to-serialize-auxiliar.patch>

```
+                    if (kind_info->from_serialized_extra_stats)
+                    {
+                        if (!kind_info->from_serialized_extra_stats(&key, header, fpin))
+                        {
+                            elog(WARNING, "could not read extra stats for entry %u/%u/%" PRIu64,
+                                 key.kind, key.dboid, key.objid);
+                            goto error;
+                        }
+                    }
```

When deserialization fails, it goes to error. In the error clause, it calls pgstat_reset_after_failure(), so do we want to
give extensions a chance to do some reset operations? If yes, then we can add a reset_after_failure() callback.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/

Re: [Proposal] Adding callback support for custom statistics kinds

From

Sami Imseih

Date:

December 9, 06:57:15

> ```
> +                    if (kind_info->from_serialized_extra_stats)
> +                    {
> +                        if (!kind_info->from_serialized_extra_stats(&key, header, fpin))
> +                        {
> +                            elog(WARNING, "could not read extra stats for entry %u/%u/%" PRIu64,
> +                                 key.kind, key.dboid, key.objid);
> +                            goto error;
> +                        }
> +                    }
> ```
>
> When deserialize failed, it goes to error. In the error clause, it calls pgstat_reset_after_failure(), so do we want
> to give extensions a chance to do some reset operations? If yes, then we can add a reset_after_failure() callback.

The way v5 deals with a deserialization failure is that when
it goes to error, pgstat_reset_after_failure() resets the
stats for all kinds, since pgstat_drop_all_entries() is called
during that call. So there is nothing for an extension to
do on its own. The extension will then clean up its resources
at the end, when all the kinds are iterated over and
kind_info->end_extra_stats(STATS_READ) is called for each
kind.

Let me know if I'm still missing something.

--
Sami Imseih
Amazon Web Services (AWS)

Re: [Proposal] Adding callback support for custom statistics kinds

From

Michael Paquier

Date:

December 9, 07:45:17

On Mon, Dec 08, 2025 at 09:57:15PM -0600, Sami Imseih wrote:
> The way v5 is dealing with a deserialize failure is that when
> it goes to error, the pgstat_reset_after_failure() will reset the
> stats for all kinds, since pgstat_drop_all_entries() is called
> during that call. So there is nothing for an extension to have
> to do on its own. The extension will then clean up resources
> at the end when all the kinds are iterated over and
> kind_info->end_extra_stats(STATS_READ) is called for each
> kind.
>
> Let me know if I'm still missing something?

It seems to me that you are missing nothing here, and that Chao has
missed the fact that the end of pgstat_read_statsfile() does a "goto
done", meaning that we would take a round of
end_extra_stats(STATS_READ) to do all the cleanup after resetting all
the stats.  That's what I would expect.
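
As a rough illustration of that flow (not the actual pgstat.c code), the standalone sketch below models a read routine in which any failure jumps to "error", resets everything, and then reaches "done", where each kind's cleanup callback runs once. All names here (DemoKind, demo_read_statsfile, demo_reset_after_failure, and so on) are hypothetical stand-ins.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_NUM_KINDS 3

/* Hypothetical per-kind callbacks, loosely modelled on the thread. */
typedef struct DemoKind
{
	bool		(*from_serialized_extra_stats) (int entry);
	void		(*end_extra_stats) (void);
} DemoKind;

static int	demo_cleanup_calls = 0;
static int	demo_reset_calls = 0;

static bool
demo_read_ok(int entry)
{
	(void) entry;
	return true;
}

static bool
demo_read_fail(int entry)
{
	(void) entry;
	return false;
}

static void
demo_cleanup(void)
{
	demo_cleanup_calls++;
}

/* Stand-in for pgstat_reset_after_failure(): drops all entries. */
static void
demo_reset_after_failure(void)
{
	demo_reset_calls++;
}

/* Returns true if all entries were read, false if stats were reset. */
static bool
demo_read_statsfile(const DemoKind *kinds, int nentries)
{
	bool		ok = true;

	assert(kinds != NULL);

	for (int entry = 0; entry < nentries; entry++)
	{
		const DemoKind *kind = &kinds[entry % DEMO_NUM_KINDS];

		if (kind->from_serialized_extra_stats &&
			!kind->from_serialized_extra_stats(entry))
			goto error;
	}
	goto done;

error:
	ok = false;
	demo_reset_after_failure();
	/* fall through to the common cleanup */

done:
	/* Cleanup runs on both the success and the failure path. */
	for (int k = 0; k < DEMO_NUM_KINDS; k++)
	{
		if (kinds[k].end_extra_stats)
			kinds[k].end_extra_stats();
	}
	return ok;
}
```

In this model the cleanup callback cannot tell which path was taken, which is the property being debated in this thread.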

+static inline bool pgstat_check_extra_callbacks(PgStat_Kind kind);
[...]
@@ -645,6 +656,13 @@ pgstat_initialize(void)
+    /* Check a kind's extra-data callback setup */
+    for (PgStat_Kind kind = PGSTAT_KIND_BUILTIN_MIN; kind <= PGSTAT_KIND_BUILTIN_MAX; kind++)
+        if (!pgstat_check_extra_callbacks(kind))
+            ereport(ERROR,
+                    errmsg("incomplete extra serialization callbacks for stats kind %d",
+                           kind));

Why does this part need to run each time a backend initializes its
access to pgstats?  Shouldn't this happen only once when a stats kind
is registered?  pgstat_register_kind() should be the only code path
that does such sanity checks.

By the way, checking that to_serialized_extra_stats and
kind_info->from_serialized_extra_stats need to be both defined is
fine as these are coupled together, but I am not following the reason
why end_extra_stats would need to be included in the set?  For
example, a stats kind could decide to add some data to the main
pgstats file without creating extra files, hence they may not need to
define end_extra_stats.
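
A minimal sketch of what a registration-time check could look like under that reasoning: the write and read callbacks are treated as a pair, the end callback stays optional, and the check runs once per registered kind instead of once per backend. DemoKindInfo, demo_check_extra_callbacks() and demo_register_kind() are hypothetical names, not the pgstat API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical subset of a kind's callback table. */
typedef struct DemoKindInfo
{
	bool		(*to_serialized_extra_stats) (void);
	bool		(*from_serialized_extra_stats) (void);
	void		(*end_extra_stats) (void);	/* optional */
} DemoKindInfo;

static int	demo_checks_run = 0;

/*
 * The write and read callbacks are coupled: both set or both unset.
 * end_extra_stats is deliberately not part of the check.
 */
static bool
demo_check_extra_callbacks(const DemoKindInfo *info)
{
	assert(info != NULL);
	demo_checks_run++;
	return (info->to_serialized_extra_stats != NULL) ==
		(info->from_serialized_extra_stats != NULL);
}

/* Stand-in for a registration routine: validate once, then accept. */
static bool
demo_register_kind(const DemoKindInfo *info)
{
	if (!demo_check_extra_callbacks(info))
		return false;			/* real code would raise an error here */
	return true;
}

/* Trivial callbacks used only to populate the table. */
static bool
demo_to(void)
{
	return true;
}

static bool
demo_from(void)
{
	return true;
}
```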
--
Michael

Attachments

signature.asc

Re: [Proposal] Adding callback support for custom statistics kinds

From

Chao Li

Date:

December 9, 08:23:56


> On Dec 9, 2025, at 12:45, Michael Paquier <michael@paquier.xyz> wrote:
>
> It seems to me that you are missing nothing here, and that Chao has
> missed the fact that the end of pgstat_read_statsfile() does a "goto
> done", meaning that we would take a round of

No, I didn’t miss that part. But in the “done” clause:

```
done:
    /* First, cleanup the main stats file, PGSTAT_STAT_PERMANENT_FILENAME */
    FreeFile(fpin);

    elog(DEBUG2, "removing permanent stats file \"%s\"", statfile);
    unlink(statfile);

    /* Let each stats kind run its cleanup callback, if it provides one */
    for (PgStat_Kind kind = PGSTAT_KIND_MIN; kind <= PGSTAT_KIND_MAX; kind++)
    {
        const PgStat_KindInfo *kind_info = pgstat_get_kind_info(kind);

        if (kind_info && kind_info->end_extra_stats)
            kind_info->end_extra_stats(STATS_READ);
    }
```

end_extra_stats(STATS_READ) has no failure indication.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/

Re: [Proposal] Adding callback support for custom statistics kinds

From

Chao Li

Date:

December 9, 08:35:47


> On Dec 9, 2025, at 13:23, Chao Li <li.evan.chao@gmail.com> wrote:
>
>
>
>> On Dec 9, 2025, at 12:45, Michael Paquier <michael@paquier.xyz> wrote:
>>
>> It seems to me that you are missing nothing here, and that Chao has
>> missed the fact that the end of pgstat_read_statsfile() does a "goto
>> done", meaning that we would take a round of
>
> No, I didn’t miss that part. But in the “done” clause:
>
> ```
> done:
>     /* First, cleanup the main stats file, PGSTAT_STAT_PERMANENT_FILENAME */
>     FreeFile(fpin);
>
>     elog(DEBUG2, "removing permanent stats file \"%s\"", statfile);
>     unlink(statfile);
>
>     /* Let each stats kind run its cleanup callback, if it provides one */
>     for (PgStat_Kind kind = PGSTAT_KIND_MIN; kind <= PGSTAT_KIND_MAX; kind++)
>     {
>         const PgStat_KindInfo *kind_info = pgstat_get_kind_info(kind);
>
>         if (kind_info && kind_info->end_extra_stats)
>             kind_info->end_extra_stats(STATS_READ);
>     }
> ```
>
> end_extra_stats(STATS_READ) has no failure indication.
>

Sorry, I accidentally clicked “send” too quickly.

My point is that there are many places that jump to “error” and then go from “error” to “done”. If an error didn’t happen
from the deserialize callback, how can end_extra_stats() know that a failure happened and take action accordingly?

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/

Re: [Proposal] Adding callback support for custom statistics kinds

From

Sami Imseih

Date:

December 10, 00:54:58

> My point is that, there are many places jumping to “error”, then from “error” goto “done”,
> if an error didn’t happen from the deserialize callback, how end_extra_stats()
> can know if failure happened and takes action accordingly?

IIUC, if *any* error occurs outside of a deserialize callback, first the "error"
code will be called, followed by "done" which will then trigger the
end_extra_stats callback that will perform the cleanup.

Attached is v6 with a few minor indentation fixes and a correction to
freeing the file in the cleanup callback.

--
Sami Imseih
Amazon Web Services (AWS)

Attachments

v6-0001-Allow-cumulative-statistics-to-serialize-auxiliar.patch

Re: [Proposal] Adding callback support for custom statistics kinds

From

Chao Li

Date:

December 10, 02:05:42

> On Dec 10, 2025, at 05:54, Sami Imseih <samimseih@gmail.com> wrote:
>
> IIUC, if *any* error occurs outside of a deserialize callback, first the "error"
> code will be called, followed by "done" which will then trigger the
> end_extra_stats callback that will perform the cleanup.

That is true. But the problem is that, without an error indication, end_extra_stats(STATS_READ) can only blindly perform cleanup
work. As you are providing general-purpose callbacks, who knows what extensions will do with them, so it’s better to
provide more information to callbacks. IMO, letting end_extra_stats() know the current situation (normal or failure, even
an error code) is very meaningful. For example, my extension may want to log “I am forced to quit due to an outside error” or
“I am done successfully” in end_extra_stats(). Anyway, that’s my own opinion. If you and Michael still consider that this is
not a problem, I won’t argue more.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/

Re: [Proposal] Adding callback support for custom statistics kinds

From

Sami Imseih

Date:

December 10, 02:15:45

>> IIUC, if *any* error occurs outside of a deserialize callback, first the "error"
>> code will be called, followed by "done" which will then trigger the
>> end_extra_stats callback that will perform the cleanup.
>
> That is true. But the problem is that, without an error indication,
> end_extra_stats(STATS_READ) can only blindly perform cleanup work. As you
> are providing general-purpose callbacks, who knows what extensions will do
> with them, so it’s better to provide more information to callbacks. IMO,
> letting end_extra_stats() know the current situation (normal or failure,
> even an error code) is very meaningful. For example, my extension may want
> to log “I am forced to quit due to an outside error” or “I am done
> successfully” in end_extra_stats(). Anyway, that’s my own opinion. If you
> and Michael still consider that this is not a problem, I won’t argue more.

Thanks for explaining. If there is a good use-case to add more detail to
the “end” callback, it’s not very obvious yet. Maybe in the future, there
will be a convincing reason to do so.

When we hit the clean-up code on any “error”, it should be accompanied by
an error log. That is done in all cases inside pgstat.c, and I expect an
extension to log the error as well.

--
Sami Imseih
Amazon Web Services (AWS)

Re: [Proposal] Adding callback support for custom statistics kinds

From

Michael Paquier

Date:

December 10, 11:04:16

On Tue, Dec 09, 2025 at 05:15:45PM -0600, Sami Imseih wrote:
> Thanks for explaining. If there is a good use-case to add more detail to
> the “end” callback, it’s not very obvious yet. Maybe in the future, there
> will be a convincing reason to do so.

The step taken by test_custom_var_stats_file_cleanup() for the
STATS_READ case shines for its simplicity.  The STATS_DISCARD case is
also simple: we know that we want to ditch the stats.

Now, it is kind of true that the STATS_WRITE case feels a bit
disturbing written this way: we let a module take an action, but we
don't actually know the state of the main pgstats file when inside the
callback.  I mean, you can know how things are going on, but it means
that a module can just rely on a check if
PGSTAT_STAT_PERMANENT_FILENAME is on disk, but an unlink() could have
failed as well.  So, yes, I am wondering whether we should do what
Chao is suggesting, passing an extra state to the callback to let the
module know if we have actually succeeded or failed the operations
that have been taken on the main stats file before the callback
end_extra_stats is called in the three cases.  It does not matter for
the STATS_READ case, but it may matter for the STATS_DISCARD or
STATS_WRITE case.
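
One possible shape for that suggestion, sketched as standalone C: the end callback receives a second argument describing whether the operations on the main stats file succeeded. The enum, the typedef and the "keep or remove the secondary file" policy are all hypothetical and only illustrate how a module might use such a flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical mirror of the three operations named in the thread. */
typedef enum DemoStatsOp
{
	DEMO_STATS_WRITE,
	DEMO_STATS_READ,
	DEMO_STATS_DISCARD
} DemoStatsOp;

/* Hypothetical callback shape: the second argument is the addition. */
typedef void (*demo_end_cb) (DemoStatsOp op, bool main_file_ok);

static int	demo_files_kept = 0;
static int	demo_files_removed = 0;

/* A module keeps its secondary file only after a successful write. */
static void
demo_end_extra_stats(DemoStatsOp op, bool main_file_ok)
{
	assert(op <= DEMO_STATS_DISCARD);

	if (op == DEMO_STATS_WRITE && main_file_ok)
		demo_files_kept++;
	else
		demo_files_removed++;
}

/* Core side: report how the main-file operation went, then call back. */
static void
demo_finish(demo_end_cb cb, DemoStatsOp op, bool main_file_ok)
{
	if (cb != NULL)
		cb(op, main_file_ok);
}
```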

> When we hit the clean-up code on any “error”, it should be accompanied by
> an error log. That is
> done in all cases inside pgstat.c, and I expect an extension to log the
> error as well.

FWIW, I still have the same question as the one posted here about the
business in pgstat_initialize(), still present in v6:
https://www.postgresql.org/message-id/aTepXZ97PsXpuywI@paquier.xyz

This remains unanswered.
--
Michael

Attachments

signature.asc

Re: [Proposal] Adding callback support for custom statistics kinds

From

Sami Imseih

Date:

December 10, 21:36:36

> Now, it is kind of true that the STATS_WRITE case feels a bit
> disturbing written this way: we let a module take an action, but we
> don't actually know the state of the main pgstats file when inside the
> callback.  I mean, you can know how things are going on, but it means
> that a module can just rely on a check if
> PGSTAT_STAT_PERMANENT_FILENAME is on disk, but an unlink() could have
> failed as well.  So, yes, I am wondering whether we should do what
> Chao is suggesting, passing an extra state to the callback to let the
> module know if we have actually succeeded or failed the operations
> that have been taken on the main stats file before the callback
> end_extra_stats is called in the three cases.  It does not matter for
> the STATS_READ case, but it may matter for the STATS_DISCARD or
> STATS_WRITE case.

I am having a hard time being convinced that this extra status is needed.
I am not expecting an extension to operate on the main stats file inside
the end_extra_stats callback, and even if some operation failed on the
main stats file, the cleanup callback will need to take the steps to
perform the cleanup on its own resources.

Is there a concrete example?

> FWIW, I still have the same question as the one posted here about the
> business in pgstat_initialize(), still present in v6:
> https://www.postgresql.org/message-id/aTepXZ97PsXpuywI@paquier.xyz
>
> This remains unanswered.

Responding to the questions from the thread above.

> Why does this part need to run each time a backend initializes its
> access to pgstats?

Good point. This is unnecessary. This validation should really be
done inside StatsShmemInit by postmaster.

> By the way, checking that to_serialized_extra_stats and
> kind_info->from_serialized_extra_stats need to be both defined is
> fine as these are coupled together, but I am not following the reason
> why end_extra_stats would need to be included in the set? For
> example, a stats kind could decide to add some data to the main
> pgstats file without creating extra files, hence they may not need to
> define end_extra_stats.

.. and after giving this more thought, I actually don't think we should
do any validation for any of the callbacks. If an extension is writing
to any file ( core or custom ), naturally they will want to read it back.
Now I am not sure what these validations are protecting us against.
Also, maybe the extension wants to just read data from the main stats
file, I could see that use-case, perhaps.

So, I am proposing removing the validation altogether. What do
you think?

--
Sami Imseih
Amazon Web Services (AWS)

Re: [Proposal] Adding callback support for custom statistics kinds

From

Michael Paquier

Date:

December 12, 05:58:54

On Wed, Dec 10, 2025 at 12:36:36PM -0600, Sami Imseih wrote:
> .. and after giving this more thought, I actually don't think we should
> do any validation for any of the callbacks. If an extension is writing
> to any file ( core or custom ), naturally they will want to read it back.
> Now I am not sure what these validations are protecting us against.
> Also, maybe the extension wants to just read data from the main stats
> file, I could see that use-case, perhaps.
>
> So, I am proposing removing the validation altogether. What do
> you think?

The to and from callbacks are coupled with each other, so there may be
a point in making sure that if one is defined so is the other.  Now, I
have never done any enforcement for the existing from/to serialization
callbacks either because it would be quickly clear for one what needs
to be done when implementing a custom kind.  So I'd agree with just
removing these checks and keep the code simpler.

FWIW, I have begun putting my hands on your patch, editing it at some
degree.  I am not sure that I will be able to finish that today, but
I'm working towards getting something done.
--
Michael

Attachments

signature.asc

Re: [Proposal] Adding callback support for custom statistics kinds

From

Michael Paquier

Date:

December 12, 12:39:49

On Fri, Dec 12, 2025 at 11:58:54AM +0900, Michael Paquier wrote:
> FWIW, I have begun putting my hands on your patch, editing it at some
> degree.  I am not sure that I will be able to finish that today, but
> I'm working towards getting something done.

Well, I have been able to make enough progress to have something to
share, and I'm getting pretty happy about how things are shaping up.  As
you will notice, I have edited quite a few things.  In detail:
- Less fwrite() and fread(), more read_chunk() and write_chunk().  We
are exposing these APIs, let's use them.
- Comments, much more comments and documentation.
- The callbacks are renamed, to be more generic: "finish" for the
end-of-operation actions and to/from_serialized_data.
- The format of the extra data in the main pgstats file and the
secondary file was a bit strange.  Mainly, why add the length to
the main file and not the secondary file?  I have extended that a
little bit:
-- Addition of a magic number in the main file, to provide an extra
layer of safety in the read callback, letting the callback know that
it needs to read some data.
-- The offset of the secondary file follows immediately.
-- The secondary file includes at the offset a copy of the hash key,
the description length, and the description.
- Reorganization of the read/write flow for the callbacks in the
modules, tracking the offset at write more precisely.  The handling of
the empty descriptions becomes simpler than what you have proposed
previously.

This way, we can make sure that the main stats file is OK with the
magic number, and we have a sanity check in the secondary file based
on the hash key whose copy is in the main stats file.
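
The layout described above can be sketched in standalone C as follows. This is only an approximation under stated assumptions: the magic value, DemoKey, and the demo_* helpers are hypothetical, the helpers merely imitate the spirit of write_chunk() and read_chunk(), and the real patch may order or size the fields differently.

```c
#define _POSIX_C_SOURCE 200809L	/* for fseeko() and ftello() */

#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

#define DEMO_AUX_MAGIC 0x5A17C0DEu	/* arbitrary value for this sketch */

/* Hypothetical stand-in for the stats hash key (no padding bytes). */
typedef struct DemoKey
{
	uint32_t	kind;
	uint32_t	dboid;
	uint64_t	objid;
} DemoKey;

/* Small wrappers in the spirit of write_chunk()/read_chunk(). */
static bool
demo_write_chunk(FILE *fp, const void *ptr, size_t len)
{
	return fwrite(ptr, len, 1, fp) == 1;
}

static bool
demo_read_chunk(FILE *fp, void *ptr, size_t len)
{
	return fread(ptr, len, 1, fp) == 1;
}

/* Main file gets magic + offset; secondary file gets key, length, text. */
static bool
demo_write_entry(FILE *mainfp, FILE *auxfp, const DemoKey *key,
				 const char *desc)
{
	uint32_t	magic = DEMO_AUX_MAGIC;
	int64_t		offset = (int64_t) ftello(auxfp);
	uint64_t	len = strlen(desc);

	if (offset < 0)
		return false;
	return demo_write_chunk(mainfp, &magic, sizeof(magic)) &&
		demo_write_chunk(mainfp, &offset, sizeof(offset)) &&
		demo_write_chunk(auxfp, key, sizeof(*key)) &&
		demo_write_chunk(auxfp, &len, sizeof(len)) &&
		(len == 0 || demo_write_chunk(auxfp, desc, len));
}

/* Read side enforces the checks: magic first, then the key copy. */
static bool
demo_read_entry(FILE *mainfp, FILE *auxfp, const DemoKey *key,
				char *desc, size_t desc_size)
{
	uint32_t	magic;
	int64_t		offset;
	DemoKey		stored;
	uint64_t	len;

	assert(desc_size > 0);

	if (!demo_read_chunk(mainfp, &magic, sizeof(magic)) ||
		magic != DEMO_AUX_MAGIC)
		return false;
	if (!demo_read_chunk(mainfp, &offset, sizeof(offset)) ||
		fseeko(auxfp, (off_t) offset, SEEK_SET) != 0)
		return false;
	if (!demo_read_chunk(auxfp, &stored, sizeof(stored)) ||
		memcmp(&stored, key, sizeof(stored)) != 0)
		return false;
	if (!demo_read_chunk(auxfp, &len, sizeof(len)) || len >= desc_size)
		return false;
	if (len > 0 && !demo_read_chunk(auxfp, desc, len))
		return false;
	desc[len] = '\0';
	return true;
}
```

An empty description needs no special record in this sketch: a zero length is written and nothing follows it.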

Regarding the error state that could be sent to the "end" callback, I
think that you are right.  We are not gaining much with that as by
design we are already pretty loose on the write side, hoping for the
best, relying on the read side to enforce all sanity checks.  So a
status in the "from" callback sounds like a good enough balance.

At the end of the day, I'm feeling pretty much OK with the core
changes and the layer we have here.  The module changes need an extra
look (I also did some tests with corrupted and empty secondary files
to check the stability at reload), and I'm pretty tired
so I may have missed something there.  The patch needs to be split in
two parts: one for the backend changes, and one for the module itself.
The backend changes are feeling pretty good, the module changes feel
better.
--
Michael

Attachments

Re: [Proposal] Adding callback support for custom statistics kinds

From

Sami Imseih

Date:

December 13, 03:41:20

Thanks for the updates!

> - Less fwrite() and fread(), more read_chunk() and write_chunk().  We
> are exposing these APIs, let's use them.

oops. That totally slipped my mind :( sorry about that.

> - The callbacks are renamed, to be more generic: "finish" for the
> end-of-operation actions and to/from_serialized_data.

At first I wasn’t a fan of the name “finish” for the callback.
I was thinking of calling it “finish_auxiliary”. But, we’re not
forcing callbacks to be used together, and there could perhaps
be cases where “finish" can be used on its own, so this is fine by me.

I made some changes as well, in v8:

1/ looks like b4cbc106a6ce snuck into v7. I fixed that.

2/ After looking this over, I realized that “extra” and “auxiliary”
were being used interchangeably. To avoid confusion, I replaced all
instances of “extra” with “auxiliary" in both the comments and
macros, i.e. TEST_CUSTOM_AUX_DATA_DESC

--
Sami Imseih
Amazon Web Services (AWS)

Attachments

v8-0001-Allow-cumulative-statistics-to-serialize-auxiliar.patch

Re: [Proposal] Adding callback support for custom statistics kinds

From

Michael Paquier

Date:

December 13, 05:00:30

On Fri, Dec 12, 2025 at 06:41:20PM -0600, Sami Imseih wrote:
> I made some changes as well, in v8:
>
> 1/ looks like b4cbc106a6ce snuck into v7. I fixed that.

Oops, sorry about that.  I went one reset too deep.  I can see that
my local branch was also wrong, and a rebase fixed it immediately.

> 2/ After looking this over, I realized that “extra” and “auxiliary”
> were being used interchangeably. To avoid confusion, I replaced all
> instances of “extra” with “auxiliary" in both the comments and
> macros, i.e. TEST_CUSTOM_AUX_DATA_DESC

I can see what you have changed in v8 compared to v7, in terms of the
elog(), the comments and TEST_CUSTOM_AUX_DATA_DESC.  That works for
me.  If somebody has a better idea for a name, these can always be
tweaked at will.
--
Michael

Attachments

signature.asc

Re: [Proposal] Adding callback support for custom statistics kinds

From

Sami Imseih

Date:

December 14, 03:33:41

I just remembered that we should document the new callbacks in [0] with a
brief explanation of their purpose and a reference to test_custom_stats
as an example of usage. What do you think?

[0] https://www.postgresql.org/docs/current/xfunc-c.html#XFUNC-ADDIN-CUSTOM-CUMULATIVE-STATISTICS

--
Sami Imseih
Amazon Web Services (AWS)

Re: [Proposal] Adding callback support for custom statistics kinds

From

Sami Imseih

Date:

December 14, 04:55:04

> I just remembered that we should document the new callbacks in [0] with a
> brief explanation of their purpose and a reference to test_custom_stats
> as an example of usage. What do you think?

oh, and I also realized that the documentation was updated incorrectly when
test_custom_stats was originally committed. Thought it was better to fix this
in a separate thread [0].

[0] https://www.postgresql.org/message-id/CAA5RZ0s4heX926+ZNh63u12gLd9jgauU6yiirKc7xGo1G01PXQ@mail.gmail.com

--
Sami Imseih
Amazon Web Services (AWS)

Re: [Proposal] Adding callback support for custom statistics kinds

From

Michael Paquier

Date:

December 15, 03:55:43

On Sat, Dec 13, 2025 at 06:33:41PM -0600, Sami Imseih wrote:
> I just remembered that we should document the new callbacks in [0] with a
> brief explanation of their purpose and a reference to test_custom_stats
> as an example of usage. What do you think?

I'd rather keep the documentation simpler, pointing only to the
code templates we have and pgstat_internal.h.  One reason is that code
in the documentation tends to rot very easily, particularly when
applied to plugin APIs.  If you think that some of the callbacks of
pgstat_internal.h deserve more documentation or explanation, let's do
that directly in the header.

That said, I have tweaked the patch a bit more this morning and
applied the result after splitting things in two: one for the core
backend changes and one for the tests of the new APIs.  Some comments
and error strings have been simplified and I have noticed some more
inconsistencies after a follow-up read.

Another thing that I did not like is the use of "long" for the offset,
which is not portable.  We have a drop-in portable replacement for
seeks and offsets: fseeko() and pgoff_t.  That was in the test code,
but still let's keep things more portable in the long run without a
4-byte limitation on WIN32.
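
A minimal sketch of the portable pattern, assuming a POSIX system where off_t plays the role that pgoff_t plays inside PostgreSQL: positions are kept in an offset type and passed to fseeko() and ftello() instead of going through "long". The demo_off_t alias and the two helpers are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L	/* for fseeko() and ftello() */

#include <assert.h>
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical alias; inside PostgreSQL, pgoff_t fills this role. */
typedef off_t demo_off_t;

/* Return the current position in the file, or -1 on failure. */
static demo_off_t
demo_tell(FILE *fp)
{
	assert(fp != NULL);
	return ftello(fp);
}

/* Read one byte at an absolute offset; returns the byte or EOF. */
static int
demo_byte_at(FILE *fp, demo_off_t offset)
{
	assert(fp != NULL);
	if (fseeko(fp, offset, SEEK_SET) != 0)
		return EOF;
	return fgetc(fp);
}
```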

I guess that we are done for this thread then.
--
Michael

Attachments

signature.asc

Re: [Proposal] Adding callback support for custom statistics kinds

From

Peter Eisentraut

Date:

December 17, 10:03:36

On 13.12.25 01:41, Sami Imseih wrote:
> Thanks for the updates!
> 
>> - Less fwrite() and fread(), more read_chunk() and write_chunk().  We
>> are exposing these APIs, let's use them.
> 
> oops. That totally slipped my mind :( sorry about that.
> 
>> - The callbacks are renamed, to be more generic: "finish" for the
>> end-of-operation actions and to/from_serialized_data.
> 
> At first I wasn’t a fan of the name “finish” for the callback.
> I was thinking of calling it “finish_auxiliary”. But, we’re not
> forcing callbacks to be used together, and there could perhaps
> be cases where “finish" can be used on its own, so this is fine by me.
> 
> I made some changes as well, in v8:
> 
> 1/ looks like b4cbc106a6ce snuck into v7. I fixed that.
> 
> 2/ After looking this over, I realized that “extra” and “auxiliary”
> were being used interchangeably. To avoid confusion, I replaced all
> instances of “extra” with “auxiliary" in both the comments and
> macros, i.e. TEST_CUSTOM_AUX_DATA_DESC

The function test_custom_stats_var_from_serialized_data() takes an 
argument of type

     const PgStatShared_Common *header

which is then later cast

     entry = (PgStatShared_CustomVarEntry *) header;

where entry is defined as

     PgStatShared_CustomVarEntry *entry;

So you are losing the const qualification here.

But fixing that by adding the const qualification to entry would not 
work because what entry points to is later modified:

     entry->description = InvalidDsaPointer;

So the header argument of the function should not be const qualified.

But the signature of that function is apparently determined by this new 
callbacks API, so it cannot be changed in isolation.

So it seems to me that either the callbacks API needs some adjustments, 
or this particular implementation of the callback function is incorrect.

Re: [Proposal] Adding callback support for custom statistics kinds

From

Michael Paquier

Date:

December 17, 11:11:19

On Wed, Dec 17, 2025 at 08:03:36AM +0100, Peter Eisentraut wrote:
> So it seems to me that either the callbacks API needs some adjustments, or
> this particular implementation of the callback function is incorrect.

Hmm, you are right that this is not aligned.  This can be improved
with one change for each callback:
- It is OK with from_serialized_data() to manipulate the header data,
because we want to fill a portion of the shmem data with extra data
read from disk (the module wants to add a reference to a DSA stored in
the shmem entry, read from the second file).  So we should discard the
const marker from the callback definition.
- The const usage is OK for to_serialized_data(): it is better to
encourage a policy where the header data cannot be manipulated.  So
the const needs to be kept in the definition, but I also think that we
should change the module implementation so that the cast to
PgStatShared_CustomVarEntry is const.
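
The two points above can be sketched with simplified, hypothetical types (DemoSharedCommon and DemoSharedEntry stand in for PgStatShared_Common and PgStatShared_CustomVarEntry, and an int stands in for the dsa_pointer). The write-side callback takes a const header and casts to a const entry, while the read-side callback takes a non-const header because it fills in the entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the shared entry types in the thread. */
typedef struct DemoSharedCommon
{
	int			magic;
} DemoSharedCommon;

typedef struct DemoSharedEntry
{
	DemoSharedCommon header;	/* must be the first member */
	int			description;	/* stands in for a dsa_pointer */
} DemoSharedEntry;

/* Write side: the header is read-only. */
typedef bool (*demo_to_cb) (const DemoSharedCommon *header, int *out);

/* Read side: the header may be modified to rebuild the entry. */
typedef bool (*demo_from_cb) (DemoSharedCommon *header, int in);

static bool
demo_to_serialized_data(const DemoSharedCommon *header, int *out)
{
	/* The cast keeps the const qualifier, so no write can slip in. */
	const DemoSharedEntry *entry = (const DemoSharedEntry *) header;

	assert(out != NULL);
	*out = entry->description;
	return true;
}

static bool
demo_from_serialized_data(DemoSharedCommon *header, int in)
{
	/* No const here: the callback exists to fill in the entry. */
	DemoSharedEntry *entry = (DemoSharedEntry *) header;

	entry->description = in;
	return true;
}
```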

These changes result in the attached.  Sami, what do you think?
--
Michael

Attachments

Re: [Proposal] Adding callback support for custom statistics kinds

From

Sami Imseih

Date:

December 17, 20:01:01

> On Wed, Dec 17, 2025 at 08:03:36AM +0100, Peter Eisentraut wrote:
> > So it seems to me that either the callbacks API needs some adjustments, or
> > this particular implementation of the callback function is incorrect.
>
> Hmm, you are right that this is not aligned.  This can be improved
> with one change for each callback:
> - It is OK with from_serialized_data() to manipulate the header data,
> because we want to fill a portion of the shmem data with extra data
> read from disk (the module wants to add a reference to a DSA stored in
> the shmem entry, read from the second file).  So we should discard the
> const marker from the callback definition.
> - The const usage is OK for to_serialized_data(): it is better to
> encourage a policy where the header data cannot be manipulated.  So
> the const needs to be kept in the definition, but I also think that we
> should change the module implementation so as the cast to
> PgStatShared_CustomVarEntry is a const.
>
> These changes result in the attached.  Sami, what do you think?

I agree. This was a miss during the review. Thanks for raising this.

The fix looks correct to me: the from_serialized_data callback is
expected to modify the header in order to reconstruct the entry, while
to_serialized_data is never expected to modify the header, since we
are only reading what is currently in stats. I can't think of a reason to
ever have to modify the entry while writing out to disk.

I got the attached patch ready with some additional comments in
the callback definitions to clarify the API contract. We only need
to call out the "header" nuance since it's a const in one callback
and not the other. "key" is self documenting being a const in both
cases.

--
Sami Imseih
Amazon Web Services (AWS)

Attachments

v1-0001-Fix-const-correctness-in-pgstat-serialization-cal.patch

Re: [Proposal] Adding callback support for custom statistics kinds

From

Michael Paquier

Date:

December 18, 01:42:00

On Wed, Dec 17, 2025 at 11:01:01AM -0600, Sami Imseih wrote:
> I got the attached patch ready with some additional comments in
> the callback definitions to clarify the API contract. We only need
> to call out the "header" nuance since it's a const in one callback
> and not the other. "key" is self documenting being a const in both
> cases.

The comment sounds like a good idea, so included it and applied the
result.  Thanks for looking!
--
Michael

Attachments

signature.asc
