Обсуждение: [Proposal] Adding callback support for custom statistics kinds
Hi, I'd like to propose $SUBJECT to serialize additional per-entry data beyond the standard statistics entries. Currently, custom statistics kinds can store their standard entry data in the main "pgstat.stat" file, but there is no mechanism for extensions to persist extra data stored in the entry. A common use case is extensions that register a custom kind and, besides standard counters, need to track variable-length data stored in a dsa_pointer. This proposal adds optional "to_serialized_extra" and "from_serialized_extra" callbacks to "PgStat_KindInfo" that allow custom kinds to write and read from extra data in a separate files (pgstat.<kind>.stat). The callbacks give extensions direct access to the file pointer so they can read and write data in any format, while the core "pgstat" infrastructure manages opening, closing, renaming, and cleanup, just as it does with "pgstat.stat". A concrete use case is pg_stat_statements. If it were to use custom stats kinds to track statement counters, it could also track query text stored in DSA. The callbacks allow saving the query text referenced by the dsa_pointer and restoring it after a clean shutdown. Since DSA (and more specifically DSM) cannot be attached by the postmaster, an extension cannot use "on_shmem_exit" or "shmem_startup_hook" to serialize or restore this data. This is why pgstat handles serialization during checkpointer shutdown and startup, allowing a single backend to manage it safely. I considered adding hooks to the existing pgstat code paths (pgstat_before_server_shutdown, pgstat_discard_stats, and pgstat_restore_stats), but that felt too unrestricted. Using per-kind callbacks provides more control. There are already "to_serialized_name" and "from_serialized_name" callbacks used to store and read entries by "name" instead of "PgStat_HashKey", currently used by replication slot stats. Those remain unchanged, as they serve a separate purpose. Other design points: 1. Filenames use "pgstat.<kind>.stat" based on the numeric kind ID. This avoids requiring extensions to provide names and prevents issues with spaces or special characters. 2. Both callbacks must be registered together. Serializing without deserializing would leave orphaned files behind, and I cannot think of a reason to allow this. 3. "write_chunk", "read_chunk", "write_chunk_s", and "read_chunk_s" are renamed to "pgstat_write_chunk", etc., and moved to "pgstat_internal.h" so extensions can use them without re-implementing these functions. 4. These callbacks are valid only for custom, variable-numbered statistics kinds. Custom fixed kinds may not benefit, but could be considered in the future. Attached 0001 is the proposed change, still in POC form. The second patch contains tests in "injection_points" to demonstrate this proposal, and is not necessarily intended for commit. Looking forward to your feedback! -- Sami Imseih Amazon Web Services (AWS)
Вложения
On Wed, Oct 22, 2025 at 03:24:11PM -0500, Sami Imseih wrote: > I'd like to propose $SUBJECT to serialize additional per-entry data beyond > the standard statistics entries. Currently, custom statistics kinds can store > their standard entry data in the main "pgstat.stat" file, but there is no > mechanism for extensions to persist extra data stored in the entry. A common > use case is extensions that register a custom kind and, besides > standard counters, > need to track variable-length data stored in a dsa_pointer. Thanks for sending a proposal in this direction. > A concrete use case is pg_stat_statements. If it were to use custom > stats kinds to track statement counters, it could also track query text > stored in DSA. The callbacks allow saving the query text referenced by the > dsa_pointer and restoring it after a clean shutdown. Since DSA > (and more specifically DSM) cannot be attached by the postmaster, an > extension cannot use "on_shmem_exit" or "shmem_startup_hook" > to serialize or restore this data. This is why pgstat handles > serialization during checkpointer shutdown and startup, allowing a single > backend to manage it safely. Agreed that it would be better to split the query text in a file of its own and now bloat the "main" pgstats file with this data, A real risk is that many PGSS entries with a bunch of queries would cause the file to be just full of the PGSS contents. > I considered adding hooks to the existing pgstat code paths > (pgstat_before_server_shutdown, pgstat_discard_stats, and > pgstat_restore_stats), but that felt too unrestricted. Using per-kind > callbacks provides more control. Per-kind callbacks to control all that makes sense here. > There are already "to_serialized_name" and "from_serialized_name" > callbacks used to store and read entries by "name" instead of > "PgStat_HashKey", currently used by replication slot stats. Those > remain unchanged, as they serve a separate purpose. > > Other design points: > > 1. Filenames use "pgstat.<kind>.stat" based on the numeric kind ID. > This avoids requiring extensions to provide names and prevents issues > with spaces or special characters. Hmm. Is that really what we want here? This pretty says that one single custom kind would never be able use multiple files, ever. > 2. Both callbacks must be registered together. Serializing without > deserializing would leave orphaned files behind, and I cannot think of a > reason to allow this. Hmm. Okay. > 3. "write_chunk", "read_chunk", "write_chunk_s", and > "read_chunk_s" are renamed to "pgstat_write_chunk", etc., and > moved to "pgstat_internal.h" so extensions can use them without > re-implementing these functions. Exposing the write and read chunk APIs and renaming them sounds good here, designed as they are now with a FILE* defined by the caller. It's good to share these for consistency across custom and built-in stats kinds. > 4. These callbacks are valid only for custom, variable-numbered statistics > kinds. Custom fixed kinds may not benefit, but could be considered in the > future. Pushing custom data for fixed-sized stats may be interesting, though like you I am not sure what a good use-case would look like. So discarding this case for now sounds fine to me. > Attached 0001 is the proposed change, still in POC form. Hmm. I would like to propose something a bit more flexible, refactoring and reusing some of the existing callbacks, among the following lines: - Rather than introducing a second callback able to do more serialization work, let's expand a bit the responsibility of to_serialized_name and from_serialized_name to be able to work in a more extended way, renaming them to "to/from_serialized_entry", which are now limited to return a NameData with pgstat.c enforcing the data written to the pgstats to be of NAMEDATALEN. The idea would be to let the callbacks push some custom data where they want. - The to_serialized_name path of pgstat_write_statsfile() would then be changed as follows: -- push a PGSTAT_FILE_ENTRY_NAME -- Write the key write_chunk_s. -- Call the callback to push some custom per-entry data. -- Finish with the main chunk of data, of size pgstat_get_entry_len(). - The fd or FILE* of the "main" pgstats file should be added as argument of both routines (not mandatory, but we are likely going to need that if we want to add more "custom" data in the main pgstats file before writing or reading a chunk). For example, for a PGSS text file, we would likely write two fields to the main data file: an offset and a length to be able to retrieve a query string, from a secondary file. - FDs where the data is written while we are in the to/from serialize can be handled within the code paths specific to the stats kind code. The first time a serialized callback of a stats kind is called, the extra file(s) is(are) opened. This may come at the cost of one new callback: at the end of the read and writes of the stats data, we would need an extra look that's able to perform cleanup actions, which would be here to make sure that the fds opened for the extra files are closed when we are done. The close of each file is equivalent to the pgstat_close_file() done in the patch, except that we'd loop over a callback that would do the cleanup job once we are done reading or writing a file. One step that can be customized in this new "end" callback is if a stats kind may decide to unlink() a previous file, as we do for the main pgstats file, or keep one or more files around. That would be up to the extension developer. We should be able to reuse or rework reset_all_cb() with a status given to it, depending on if we are dealing with a failure or a success path. Currently, reset_all_cb() is only used in a failure path, the idea would be to extend it for the success case. > The second patch > contains tests in "injection_points" to demonstrate this proposal, and is not > necessarily intended for commit. Having coverage for these kinds of APIs is always good, IMO. We need coverage for extension code. -- Michael
Вложения
Thanks for the feedback! > > Other design points: > > > > 1. Filenames use "pgstat.<kind>.stat" based on the numeric kind ID. > > This avoids requiring extensions to provide names and prevents issues > > with spaces or special characters. > > Hmm. Is that really what we want here? This pretty says that one > single custom kind would never be able use multiple files, ever. Perhaps if someone wants to have separate files for each different types of data, we should be able to support multiple files. I think we can add an option for the number of files and they can then be named "pgstat.<kind>.1.stat", pgstat.<kind>.2.stat", etc. I rather avoid having the extension provide a set of files names. So as arguments to the callback, besides the main file pointer ( as you mention below), we also provide the list of custom file pointers. what do you think? > Hmm. I would like to propose something a bit more flexible, > refactoring and reusing some of the existing callbacks, among the > following lines: > - Rather than introducing a second callback able to do more > serialization work, let's expand a bit the responsibility of > to_serialized_name and from_serialized_name to be able to work in a > more extended way, renaming them to "to/from_serialized_entry", which Sure, we can go that route. > - The fd or FILE* of the "main" pgstats file should be added as > argument of both routines (not mandatory, but we are likely going to > need that if we want to add more "custom" data in the main pgstats > file before writing or reading a chunk). For example, for a PGSS text > file, we would likely write two fields to the main data file: an > offset and a length to be able to retrieve a query string, from a > secondary file. Yeah, that could be a good idea for pg_s_s, if we don't want to store the key alongside the query text. Make more sense. > - FDs where the data is written while we are in the to/from serialize > can be handled within the code paths specific to the stats kind code. > The first time a serialized callback of a stats kind is called, the > extra file(s) is(are) opened. This may come at the cost of one new > callback: at the end of the read and writes of the stats data, we > would need an extra look that's able to perform cleanup actions, which > would be here to make sure that the fds opened for the extra files are > closed when we are done. The close of each file is equivalent to the > pgstat_close_file() done in the patch, except that we'd loop over a > callback that would do the cleanup job once we are done reading or > writing a file. One step that can be customized in this new "end" > callback is if a stats kind may decide to unlink() a previous file, as > we do for the main pgstats file, or keep one or more files around. > That would be up to the extension developer. We should be able to > reuse or rework reset_all_cb() with a status given to it, depending on > if we are dealing with a failure or a success path. Currently, > reset_all_cb() is only used in a failure path, the idea would be to > extend it for the success case. I will provide a patch with the recommendations. -- Sami
On Thu, Oct 23, 2025 at 04:35:58PM -0500, Sami Imseih wrote: > Perhaps if someone wants to have separate files for each different > types of data, > we should be able to support multiple files. I think we can add an > option for the > number of files and they can then be named "pgstat.<kind>.1.stat", > pgstat.<kind>.2.stat", > etc. I rather avoid having the extension provide a set of files names. > So as arguments to the callback, besides the main file pointer ( as > you mention below), > we also provide the list of custom file pointers. > > what do you think? My worry here is the lack of flexibility regarding stats that could be split depending on the objects whose data needs to be flushed. For example, stats split across multiple databases (like our good-old pre-v14 pgstats, but on a per-kind basis). So I don't think that we can really assume that the list of file names should be fixed when we begin the read/write process of the main pgstats file. -- Michael
Вложения
> On Thu, Oct 23, 2025 at 04:35:58PM -0500, Sami Imseih wrote: > > Perhaps if someone wants to have separate files for each different > > types of data, > > we should be able to support multiple files. I think we can add an > > option for the > > number of files and they can then be named "pgstat.<kind>.1.stat", > > pgstat.<kind>.2.stat", > > etc. I rather avoid having the extension provide a set of files names. > > So as arguments to the callback, besides the main file pointer ( as > > you mention below), > > we also provide the list of custom file pointers. > > > > what do you think? > > My worry here is the lack of flexibility regarding stats that could be > split depending on the objects whose data needs to be flushed. For > example, stats split across multiple databases (like our good-old > pre-v14 pgstats, but on a per-kind basis). So I don't think that we > can really assume that the list of file names should be fixed when we > begin the read/write process of the main pgstats file. I was trying to avoid an extra field in PgStat_KindInfo if possible, but it's worthwhile to provide more flexibility to an extension. I will go with this. -- Sami Imseih Amazon Web Services (AWS)
On Thu, Oct 23, 2025 at 07:57:38PM -0500, Sami Imseih wrote: > I was trying to avoid an extra field in PgStat_KindInfo if possible, but > it's worthwhile to provide more flexibility to an extension. I will go > with this. Yes, I don't think that we will be able to avoid some refactoring of the existing callbacks. The introduction of a new one may not be completely necessary, though, especially if we reuse the reset callback to be called when the stats read and write finish to close any fds we may have opened when processing. Maintaining the state of the files opened within each stat kind code across multiple calls of the new "serialized" callback feels a bit more natural and more flexible, at least it's my take on the matter. -- Michael