Обсуждение: Add wal_fpi_bytes_[un]compressed to pg_stat_wal
Hi hackers, I am proposing a patch that adds wal_fpi_bytes_[un]compressed columns to pg_stat_wal. These columns help us calculate WAL FPI (full page image) compression rates, confirm the usefulness of wal_compression and determine which compression algorithms are most effective. Currently, we must use cumbersome methods to compute the WAL compression rate: 1. Run the same benchmark twice (once with wal_compression enabled and one disabled) and compare the wal_bytes values in pg_stat_wal. However, this value reflects the total WAL reduction, not just the reduction from full page images. (pg_waldump --stats can provide similar data but it also requires direct access to WAL files and must be run on the server.) 2. Run pg_waldump --fullpage and compare the reported compressed size against the calculated uncompressed size (e.g. 8192 bytes - hole_length). This computation is inconvenient and also requires WAL access on the server. With these patches applied, we can easily compute the FPI compression rate with the following SQL: =# SELECT wal_fpi_bytes_compressed / wal_fpi_bytes_uncompressed * 100 AS wal_compression_rate FROM pg_stat_wal; wal_compression_rate ------------------------- 34.07161865906799706100 (1 row) The 0001 patch adds these columns to pg_stat_wal. The 0002 and 0003 patches add this information to EXPLAIN (WAL) and pg_stat_statements, respectively. I don't think these additions (0002 and 0003) are mandatory, so I suggest we focus the discussion on the 0001 patch first. Thoughts? -- Best regards, Shinya Kato NTT OSS Center
Вложения
On Wed, Oct 22, 2025 at 05:04:58PM +0900, Shinya Kato wrote: > The 0001 patch adds these columns to pg_stat_wal. The 0002 and 0003 > patches add this information to EXPLAIN (WAL) and pg_stat_statements, > respectively. I don't think these additions (0002 and 0003) are > mandatory, so I suggest we focus the discussion on the 0001 patch > first. We already know the number of FPIs generated. Hence my take would be to use only one counter, not two: an aggregated FPI length like in xlogstats.h as exposed in pg_walinspect. That should be enough to offer trends regarding the effects of compression, even if some pages have holes that are discarded. > Thoughts? I would suggest to leave PGSS out of it for now. We really need to do something about the number of fields computed, with more GUCs to disable groups of them, at least, like JIT or the planning parts. No objections for the EXPLAIN and pg_stat_wal parts. The patch can be simpler. There is no need to pass the calculated number(s) across multiple functions in the stack, you can just aggregate the numbers in pgWalUsage directly in XLogRecordAssemble(). The only extra thing to do is that one needs to set pgstat_report_fixed to true to force the report to pgstats. -- Michael
Вложения
On Wed, Oct 22, 2025 at 5:45 PM Michael Paquier <michael@paquier.xyz> wrote: > We already know the number of FPIs generated. Hence my take would be > to use only one counter, not two: an aggregated FPI length like in > xlogstats.h as exposed in pg_walinspect. That should be enough to > offer trends regarding the effects of compression, even if some pages > have holes that are discarded. Yeah, I would like to know the trends of FPI compression rates, not the exact FPI compression rates. So, I agree with Michael, and have updated the patches. > I would suggest to leave PGSS out of it for now. We really need to do > something about the number of fields computed, with more GUCs to > disable groups of them, at least, like JIT or the planning parts. No > objections for the EXPLAIN and pg_stat_wal parts. Okay, since I'm not strongly attached to this idea, I've removed the 0003 patch for now. > The patch can be simpler. There is no need to pass the calculated > number(s) across multiple functions in the stack, you can just > aggregate the numbers in pgWalUsage directly in XLogRecordAssemble(). > The only extra thing to do is that one needs to set > pgstat_report_fixed to true to force the report to pgstats. Thank you for your review. I've implemented this suggestion in the v2 patches. -- Best regards, Shinya Kato NTT OSS Center
Вложения
On Thu, Oct 23, 2025 at 06:36:01PM +0900, Shinya Kato wrote: > Okay, since I'm not strongly attached to this idea, I've removed the > 0003 patch for now. The fact that we cannot access this information without a pg_waldump or a pg_walinspect, which may not be available, and can be expensive, is a deal-breaker for me.. Or we may not have a direct access to the WAL segments. Without the changes in instrument.c from patch 0002, patch 0001 that implements the basics would not work. So.. I have moved the changes of instrument.c to 0001, reordered the fields to be more consistent, did two bumps (catalog, stats file), simplified the docs, then applied the result. By the way, Kato-san, what do you think about the attached extra simplification? With the FPIs counted in bytes, I don't see much a point in passing around the number of FPIs generated from XLogRecordAssemble() to XLogInsertRecord() . -- Michael
Вложения
On Tue, Oct 28, 2025 at 4:32 PM Michael Paquier <michael@paquier.xyz> wrote: > Without the changes in instrument.c from patch 0002, patch 0001 that > implements the basics would not work. So.. I have moved the changes > of instrument.c to 0001, reordered the fields to be more consistent, > did two bumps (catalog, stats file), simplified the docs, then applied > the result. Sorry for the inconvenience, and thank you for committing. I have revised patch 0002, which adds wal_fpi_bytes to EXPLAIN (WAL). > By the way, Kato-san, what do you think about the attached extra > simplification? With the FPIs counted in bytes, I don't see much a > point in passing around the number of FPIs generated from > XLogRecordAssemble() to XLogInsertRecord() . I investigated previous discussions and found [0]. This thread mentioned that XLogInsert() calls XLogRecordAssemble() multiple times in its do-while loop, so the value might be invalid. Based on the discussion above, it seems my previous patch also has the same issue. [0] https://www.postgresql.org/message-id/20200329121944.GA79261%40nol -- Best regards, Shinya Kato NTT OSS Center
Вложения
On Tue, Oct 28, 2025 at 07:33:00PM +0900, Shinya Kato wrote: > I investigated previous discussions and found [0]. This thread > mentioned that XLogInsert() calls XLogRecordAssemble() multiple times > in its do-while loop, so the value might be invalid. > > Based on the discussion above, it seems my previous patch also has the > same issue. > > [0] https://www.postgresql.org/message-id/20200329121944.GA79261%40nol Dammit, you are right. I didn't see through this one. Even on HEAD it is true that we may trigger an early exit of XLogInsertRecord() and call XLogRecordAssemble() multiple times with a different FPI setup. So we cannot do direct manipulations of pgWalUsage in XLogRecordAssemble(), we must do these once the record is inserted. I'll clean up that tomorrow, which can be summarized as something like the attached (quick fix, need to double-check). -- Michael
Вложения
On Tue, Oct 28, 2025 at 10:04:20PM +0900, Michael Paquier wrote: > I'll clean up that tomorrow, which can be summarized as something like > the attached (quick fix, need to double-check). Done this cleanup as d3111cb753e8, and tweaked a bit the second patch (order of the fields and docs) before applying it as 5ab0b6a24807. On top of what has been already done here, should we also update the logs generated by do_analyze_rel() and heap_vacuum_rel()? -- Michael
Вложения
On Thu, Oct 30, 2025 at 3:57 PM Michael Paquier <michael@paquier.xyz> wrote: > > On Tue, Oct 28, 2025 at 10:04:20PM +0900, Michael Paquier wrote: > > I'll clean up that tomorrow, which can be summarized as something like > > the attached (quick fix, need to double-check). > > Done this cleanup as d3111cb753e8, and tweaked a bit the second patch > (order of the fields and docs) before applying it as 5ab0b6a24807. Thank you for the fix and committing. > On top of what has been already done here, should we also update the > logs generated by do_analyze_rel() and heap_vacuum_rel()? You're right, I missed them. I have created a patch that addresses the above. Similar to EXPLAIN, I wanted to use 'fpi' and 'fpi bytes', but the VACUUM/ANALYZE logs showed 'full pages images', so I used 'full page image bytes'. postgres=# vacuum (verbose) pg_class; ~snip~ WAL usage: 1 records, 1 full page images, 7935 bytes, 7816 full page image bytes, 0 buffers full postgres=# analyze (verbose) pg_class; ~snip~ WAL usage: 92 records, 6 full page images, 49416 bytes, 37488 full page image bytes, 0 buffers full -- Best regards, Shinya Kato NTT OSS Center
Вложения
On Fri, Oct 31, 2025 at 11:15:03AM +0900, Shinya Kato wrote: > You're right, I missed them. I have created a patch that addresses the > above. Similar to EXPLAIN, I wanted to use 'fpi' and 'fpi bytes', but > the VACUUM/ANALYZE logs showed 'full pages images', so I used 'full > page image bytes'. I am cool with "full page image bytes". Applied. -- Michael