Обсуждение: pg_waldump: support decoding of WAL inside tarfile
Hi All, Attaching patch to support a new feature that let pg_waldump decode WAL files directly from a tar archive. This was worked to address a limitation in pg_verifybackup[1], which couldn't parse WAL files from tar-formatted backups. The implementation will align with pg_waldump's existing xlogreader design, which uses three callback functions to manage WAL segments: open, read, and close. For tar archives, however, the approach will be simpler. Instead of using separate callbacks for opening and closing, the tar archive will be opened once at the start and closed explicitly at the end. The core logic will be in the WAL page reading callback. When xlogreader requests a new WAL page, this callback will be invoked. It will then call the archive streamer routine to read the WAL data from the tar archive into a buffer. This data will then be copied into xlogreader's own buffer, completing the read. Essentially, this is plumbing work: the new code will be responsible for getting WAL data from the tar archive and feeding it to the existing xlogreader. All other WAL page and record decoding logic, which is already robust within xlogreader, will be reused as is. This feature is being implemented in a series of patches as: - Refactoring: The first few patches (0001-0004) are dedicated to refactoring and minor code changes. - 005: This patch introduces the core functionality for pg_waldump to read WAL from a tar archive using the same archive streamer (fe_utils/astreamer.h) used in pg_verifybackup. This version requires WAL files in the archive to be in sequential order. - 006: This patch removes the sequential order restriction. If pg_waldump encounters an out-of-order WAL file, it writes the file to a temporary directory. The utility will then continue decoding and read from this temporary location later. - 007 and onwards: These patches will update pg_verifybackup to remove the restriction on WAL parsing for tar-formatted backups. 008 patch renames the "--wal-directory" switch to "--wal-path" to make it more generic, allowing it accepts a directory path or a tar archive path. ----------------------------------- Known Issues & Status: ----------------------------------- - Timeline Switching: The current implementation in patch 006 does not correctly handle timeline switching. This is a known issue, especially when a timeline change occurs on a WAL file that has been written to a temporary location. - Testing: Local regression tests on CentOS and macOS M4 are passing. However, some tests on macOS Sonoma (specifically 008_untar.pl and 010_client_untar.pl) are failing in the GitHub workflow with a "WAL parsing failed for timeline 1" error. This issue is currently being investigated. Please take a look at the attached patch and let me know your thoughts. This is an initial version, and I am making incremental improvements to address known issues and limitations. 1] https://git.postgresql.org/pg/commitdiff/8dfd3129027969fdd2d9d294220c867d2efd84aa -- Regards, Amul Sul EDB: http://www.enterprisedb.com
Вложения
- v1-0001-Refactor-pg_waldump-Move-some-declarations-to-new.patch
- v1-0002-Refactor-pg_waldump-Separate-logic-used-to-calcul.patch
- v1-0003-Refactor-pg_waldump-Restructure-TAP-tests.patch
- v1-0004-pg_waldump-Rename-directory-creation-routine-for-.patch
- v1-0005-pg_waldump-Add-support-for-archived-WAL-decoding.patch
- v1-0006-WIP-pg_waldump-Remove-the-restriction-on-the-orde.patch
- v1-0007-pg_verifybackup-Delay-default-WAL-directory-prepa.patch
- v1-0008-pg_verifybackup-Rename-the-wal-directory-switch-t.patch
- v1-0009-pg_verifybackup-enabled-WAL-parsing-for-tar-forma.patch
On Thu, Aug 7, 2025 at 7:47 PM Amul Sul <sulamul@gmail.com> wrote: > [....] > ----------------------------------- > Known Issues & Status: > ----------------------------------- > - Timeline Switching: The current implementation in patch 006 does not > correctly handle timeline switching. This is a known issue, especially > when a timeline change occurs on a WAL file that has been written to a > temporary location. > This is still pending and will be addressed in the next version. Therefore, patch 0006 remains marked as WIP. > - Testing: Local regression tests on CentOS and macOS M4 are passing. > However, some tests on macOS Sonoma (specifically 008_untar.pl and > 010_client_untar.pl) are failing in the GitHub workflow with a "WAL > parsing failed for timeline 1" error. This issue is currently being > investigated. > This has been fixed in the attached version; all GitHub workflow tests are now fine. Regards, Amul
Вложения
- v2-0001-Refactor-pg_waldump-Move-some-declarations-to-new.patch
- v2-0002-Refactor-pg_waldump-Separate-logic-used-to-calcul.patch
- v2-0003-Refactor-pg_waldump-Restructure-TAP-tests.patch
- v2-0004-pg_waldump-Rename-directory-creation-routine-for-.patch
- v2-0005-pg_waldump-Add-support-for-archived-WAL-decoding.patch
- v2-0006-WIP-pg_waldump-Remove-the-restriction-on-the-orde.patch
- v2-0007-pg_verifybackup-Delay-default-WAL-directory-prepa.patch
- v2-0008-pg_verifybackup-Rename-the-wal-directory-switch-t.patch
- v2-0009-pg_verifybackup-enabled-WAL-parsing-for-tar-forma.patch
On Mon, Aug 25, 2025 at 5:58 PM Amul Sul <sulamul@gmail.com> wrote: > > On Thu, Aug 7, 2025 at 7:47 PM Amul Sul <sulamul@gmail.com> wrote: > > [....] > > ----------------------------------- > > Known Issues & Status: > > ----------------------------------- > > - Timeline Switching: The current implementation in patch 006 does not > > correctly handle timeline switching. This is a known issue, especially > > when a timeline change occurs on a WAL file that has been written to a > > temporary location. > > > > This is still pending and will be addressed in the next version. > Therefore, patch 0006 remains marked as WIP. > After testing pg_waldump, I have realised that my previous understanding of its timeline handling was incorrect. I had mistakenly assumed by reading xlogreader code that it would use the same timeline-switching logic found in xlogreader, without first verifying this behavior. In testing, I found that pg_waldump does not follow timeline switches. Instead, it expects all WAL files to be from a single timeline, which is either specified by the user or determined from the starting segment or default 1. This is a positive finding, as it means we don't need to make significant changes to align pg_waldump's current behavior. The attached patches are now complete and no longer works in progress -- read for review. Additionally, I've dropped patch v2-0004 because it is no longer necessary. The primary patches that implement the proposed feature are now 0004 and 0005 in the attached set. Regards, Amul
Вложения
- v3-0001-Refactor-pg_waldump-Move-some-declarations-to-new.patch
- v3-0002-Refactor-pg_waldump-Separate-logic-used-to-calcul.patch
- v3-0003-Refactor-pg_waldump-Restructure-TAP-tests.patch
- v3-0004-pg_waldump-Add-support-for-archived-WAL-decoding.patch
- v3-0005-pg_waldump-Remove-the-restriction-on-the-order-of.patch
- v3-0006-pg_verifybackup-Delay-default-WAL-directory-prepa.patch
- v3-0007-pg_verifybackup-Rename-the-wal-directory-switch-t.patch
- v3-0008-pg_verifybackup-enabled-WAL-parsing-for-tar-forma.patch
On Tue, Aug 26, 2025 at 1:53 PM Amul Sul <sulamul@gmail.com> wrote: > [..patch] Hi Amul! 0001: LGTM, maybe I would just slightly enhance the commit message ("This is in preparation for adding a second source file to this directory.") -- maye bit a bit more verbose or use a message from 0002? 0002: LGTM 0003: LGTM Tested here (after partial patch apply, and test suite did work fine). 0004: a. Why should it be necessary to provide startLSN (-s) ? Couldn't it autodetect the first WAL (tar file) inside and just use that with some info message? $ /usr/pgsql19/bin/pg_waldump --path=/tmp/base/pg_wal.tar pg_waldump: error: no start WAL location given b. Why would it like to open "blah" dir if I wanted that "blah" segment from the archive? Shouldn't it tell that it was looking in the archive and couldn find it inside? $ /usr/pgsql19/bin/pg_waldump --path=/tmp/base/pg_wal.tar blah pg_waldump: error: could not open file "blah": Not a directory c. It doesnt work when using SEGSTART, but it's there: $ /usr/pgsql19/bin/pg_waldump --path=/tmp/base/pg_wal.tar 000000010000000000000059 pg_waldump: error: could not open file "000000010000000000000059": Not a directory $ tar tf /tmp/base/pg_wal.tar | head -1 000000010000000000000059 d. I've later noticed that follow-up patches seem to use the -s switch and there it seems to work OK. The above SEGSTART issue was not detected, probably because tests need to be extended cover of segment name rather than just --start LSN (see test_pg_waldump): $ /usr/pgsql19/bin/pg_waldump --path=/tmp/base/pg_wal.tar --stats -s 0/59000358 pg_waldump: first record is after 0/59000358, at 0/590003E8, skipping over 144 bytes WAL statistics between 0/590003E8 and 0/61000000: [..] e. Code around`if (walpath == NULL && directory != NULL)` needs some comments. f. Code around `if (fname != NULL && is_tar_file(fname, &compression))` , so if fname is WAL segment here (00000001000000000000005A) and we do check again if that has been tar-ed (is_tar_file())? Why? g. Just a question: the commit message says `Note that this patch requires that the WAL files within the archive be in sequential order; an error will be reported otherwise`. I'm wondering if such occurrences are known to be happening in the wild? Or is it just an assumption that if someone would modify the tar somehow? (either way we could just add a reason why we need to handle such a case if we know -- is manual alternation the only source of such state?). For the record, I've tested crafting custom archives with out of sequence WAL archives and the code seems to work (it was done using: tar --append -f pg_wal.tar --format=ustar ..) h. Anyway, in case of typo/wrong LSN, 0004 emits wrong error message I think: $ /usr/pgsql19/bin/pg_waldump --path=/tmp/base/pg_wal.tar --stats -s 0/50000358 pg_waldump: error: WAL files are not archived in sequential order pg_waldump: detail: Expecting segment number 80 but found 89. it's just that the 50000358 LSN above is below the minimal LSN present in the WAL segments (first segment is 000000010000000000000059 there, i've just intentionally provided a bad value 50.. as a typo and it causes the wrong message). Now it might not be an issue as with 0005 patch the same test behaves OK (`pg_waldump: error: could not find a valid record after 0/50000358`). It is just relevant if this would be committed not all at once. i. If I give wrong --timeline=999 to pg_waldump it fails with misleading error message: could not read WAL data from "pg_wal.tar" archive: read -1 of 8192 0005: a. I'm wondering if we shouldn't log (to stderr?) some kind of notification message (just once) that non-sequential WAL files were discovered and that pg_waldump is starting to write to $somewhere as it may be causing bigger I/O than anticipated when running the command. This can easily help when troubleshooting why it is not fast, and also having set TMPDIR to usually /tmp can be slow or too small. b. IMHO member_prepare_tmp_write() / get_tmp_wal_file_path() with TMPDIR can be prone to symlink attack. Consider setting TMPDIR=/tmp . We are writing to e.g. /tmp/<WALsegment>.waldump.tmp in 0004 , but that path is completely guessable. If an attacker prepares some symlinks and links those to some other places, I think the code will happily open and overwrite the contents of the rogue symlink. I think using mkstemp(3)/tmpfile(3) would be a safer choice if TMPDIR needs to be in play. Consider that pg_waldump can be run as root (there's no mechanism preventing it from being used that way). c. IMHO that unlink() might be not guaranteed to always remove files, as in case of any trouble and exit() , those files might be left over. I think we need some atexit() handlers. This can be triggered with combo of options of nonsequential files in tar + wrong LSN given: $ tar tf pg_wal.tar 00000001000000000000005A 00000001000000000000005B 00000001000000000000005C [..] 000000010000000000000060 000000010000000000000059 <-- out of order, appended last $ ls -lh 0* ls: cannot access '0*': No such file or directory $ /usr/pgsql19/bin/pg_waldump --path=/tmp/ble/pg_wal.tar --stats -s 0/10000358 #wrong LSN pg_waldump: error: could not find a valid record after 0/10000358 $ ls -lh 0* -rw------- 1 postgres postgres 16M Sep 8 14:44 000000010000000000000059.waldump.tmp -rw------- 1 postgres postgres 16M Sep 8 14:44 00000001000000000000005A.waldump.tmp [..] 0006: LGTM 0007: a. Commit message says `Future patches to pg_waldump will enable it to decode WAL directly` , but those pg_waldump are earlier patches, right? b. pg_verifybackup should print some info with --progress that it is spawning pg_waldump (pg_verifybackup --progress mode does not display anything related to verifing WALs, but it could) c. I'm wondering, but pg_waldump seems to be not complaining if --end=LSN is made into such a future that it doesn't exist. E.g. If the latest WAL segment is 60 (with end LSN 0/60A77A59), but I run pg_waldump `--end=0/7000000` , it will return code 0 and nothing on stderr. So how sure are we that the necessary WAL segments (as per backup_manifest) are actually inside the tar? It's supposed to be verified, but it isn't for this use case? Same happens if craft special tar and remove just one WAL segment from pg_wal.tar (simulate missing WAL segment), but ask the pg_verifybackup/pg_waldump to verify it to exact last LSN sequence, e.g.: $ /usr/pgsql19/bin/pg_waldump --quiet --path=/tmp/missing/pg_wal.tar --timeline=1 --start=0/59000028 --end=0/60A77A58 && echo OK # but it is not OK OK $ /usr/pgsql19/bin/pg_waldump --stats --path=/tmp/missing/pg_wal.tar --timeline=1 --start=0/59000028 --end=0/60A77A58 WAL statistics between 0/59000028 and 0/5CFFFFD0: # <-- 0/5C LSN maximum detected [..] Notice it has read till 0/5C (but I've asked till 0/60), because I've removed 0D: $ tar tf /tmp/missing/pg_wal.tar| grep ^0 000000010000000000000059 00000001000000000000005A 00000001000000000000005B 00000001000000000000005C 00000001000000000000005E <-- missing 5D Yet it reported no errors. 0008: LGTM Another open question I have is this: shouldn't backup_manifest come with CRC checksum for the archived WALs? Or does that guarantee that backup_manifest WAL-Ranges are present in pg_wal.tar is good enough because individual WAL files are CRC-protected itself? -J.
On Mon, Sep 8, 2025 at 7:07 PM Jakub Wartak <jakub.wartak@enterprisedb.com> wrote: > > On Tue, Aug 26, 2025 at 1:53 PM Amul Sul <sulamul@gmail.com> wrote: > > > [..patch] > > Hi Amul! > Thanks for your review. I'm replying to a few of your comments now, but for the rest, I need to think about them. I'm kind of in agreement with some of them for the fix, but I won't be able to spend time on that next week due to official travel. I'll try to get back as soon as possible after that. > a. Why should it be necessary to provide startLSN (-s) ? Couldn't > it autodetect the first WAL (tar file) inside and just use that with > some info message? > $ /usr/pgsql19/bin/pg_waldump --path=/tmp/base/pg_wal.tar > pg_waldump: error: no start WAL location given > There are two reasons. First, existing pg_waldump --path=some_directory would result in the same error. Second, it would force us to re-read the archive twice just to locate the first WAL segment, which is inefficient. > c. It doesnt work when using SEGSTART, but it's there: > $ /usr/pgsql19/bin/pg_waldump --path=/tmp/base/pg_wal.tar > 000000010000000000000059 > pg_waldump: error: could not open file "000000010000000000000059": > Not a directory > $ tar tf /tmp/base/pg_wal.tar | head -1 > 000000010000000000000059 > I don't believe this is the correct use case. The WAL files are inside a tar archive, and the requirement is to use a starting LSN and a timeline (if not the default). > d. I've later noticed that follow-up patches seem to use the > -s switch and there it seems to work OK. The above SEGSTART issue was > not detected, probably because tests need to be extended cover of > segment name rather than just --start LSN (see test_pg_waldump): > $ /usr/pgsql19/bin/pg_waldump --path=/tmp/base/pg_wal.tar --stats > -s 0/59000358 > pg_waldump: first record is after 0/59000358, at 0/590003E8, > skipping over 144 bytes > WAL statistics between 0/590003E8 and 0/61000000: > [..] > Hope previous reasoning makes sense to you. > e. Code around`if (walpath == NULL && directory != NULL)` needs > some comments. > I think this is an existing one. > f. Code around `if (fname != NULL && is_tar_file(fname, > &compression))` , so if fname is WAL segment here > (00000001000000000000005A) and we do check again if that has been > tar-ed (is_tar_file())? Why? > Again, how? > g. Just a question: the commit message says `Note that this patch > requires that the WAL files within the archive be in sequential order; > an error will be reported otherwise`. I'm wondering if such > occurrences are known to be happening in the wild? Or is it just an > assumption that if someone would modify the tar somehow? (either way > we could just add a reason why we need to handle such a case if we > know -- is manual alternation the only source of such state?). For the > record, I've tested crafting custom archives with out of sequence WAL > archives and the code seems to work (it was done using: tar --append > -f pg_wal.tar --format=ustar ..) > This is an almost nonexistent occurrence. While pg_basebackup archives WAL files in sequential order, we don't have an explicit code to enforce that order within it. Furthermore, since we can't control how external tools might handle the files, this extra precaution is necessary. > Another open question I have is this: shouldn't backup_manifest come > with CRC checksum for the archived WALs? Or does that guarantee that > backup_manifest WAL-Ranges are present in pg_wal.tar is good enough > because individual WAL files are CRC-protected itself? > I don't know, I have to check pg_verifybackup. Regards, Amul
Here are some review comments on v3-0004: In general, I think this looks pretty nice, but I think it needs more cleanup and polishing. There doesn't seem to be any reason for astreamer_waldump_content_new() to take an astreamer *next argument. If you look at astreamer.h, you'll see that some astreamer_BLAH_new() functions take such an argument, and others don't. The ones that do forward their input to another astreamer; the ones that don't, like astreamer_plain_writer_new(), send it somewhere else. AFAICT, this astreamer is never going to send its output to another astreamer, so there's no reason for this argument. I'm also a little confused by the choice of the name astreamer_waldump_content_new(). I would have thought this would be something like astreamer_waldump_new() or astreamer_xlogreader_new(). The word "content" doesn't seem to me to be adding much here, and it invites confusion with the "content" callback. I think you can merge setup_astreamer() into init_tar_archive_reader(). The only other caller is verify_tar_archive(), but that does exactly the same additional steps as init_tar_archive_reader(), as far as I can see. The return statement for astreamer_wal_read is really odd: + return (count - nbytes) ? (count - nbytes) : -1; Since 0 is false in C, this is equivalent to: count != nbytes ? count - nbytes : -1, but it's a strange way to write it. What makes it even stranger is that it seems as though the intention here is to count the number of bytes read, but you do that by taking the number of bytes requested (count) and subtracting the number of bytes we didn't manage to read (nbytes); and then you just up and return -1 instead of 0 whenever the answer would have been zero. This is all lacking in comments and seems a bit more confusing than it needs to be. So my suggestions are: 1. Consider redefining nbytes to be the number of bytes that you have read instead of the number of bytes you haven't read. So the loop in this function would be while (nbytes < count) instead of while (nbytes > 0). 2. If you need to map 0 to -1, consider having the caller do this instead of putting that inside this function. 3. Add a comment saying what the return value is supposed to be". If you do both 1 and 2, then the return statement can just say "return nbytes;" and the comment can say "Returns the number of bytes successfully read." I would suggest changing the name of the variable from "readBuff" to "readBuf". There are no existing uses of readBuff in the code base. I think this comment also needs improvement: + /* + * Ignore existing data if the required target page has not yet been + * read. + */ + if (recptr >= endPtr) + { + len = 0; + + /* Reset the buffer */ + resetStringInfo(astreamer_buf); + } This comment is problematic for a few reasons. First, we're not ignoring the existing data: we're throwing it out. Second, the comment doesn't say why we're doing what we're doing, only that we're doing it. Here's my guess at the actual explanation -- please correct me if I'm wrong: "pg_waldump never reads the same WAL bytes more than once, so if we're now being asked for data beyond the end of what we've already read, that means none of the data we currently have in the buffer will ever be consulted again. So, we can discard the existing buffer contents and start over." By the way, if this explanation is correct, it might be nice to add an assertion someplace that verifies it, like asserting that we're always reading from an LSN greater than or equal to (or exactly equal to?) the LSN immediately following the last data we read. In general, I wonder whether there's a way to make the separation of concerns between astreamer_wal_read() and TarWALDumpReadPage() cleaner. Right now, the latter is basically a stub, but I'm not sure that is the best thing here. I already mentioned one example of how to do this: make the responsibility for 0 => -1 translation the job of TarWALDumpReadPage() rather than astreamer_wal_read(). But I think there might be a little more we can do. In particular, I wonder whether we could say that astreamer_wal_read() is only responsible for filling the buffer, and the caller, TarWALDumpReadPage() in this case, needs to empty it. That seems like it might produce a cleaner separation of duties. Another thing that isn't so nice right now is that verify_tar_archive() has to open and close the archive only for init_tar_archive_reader() to be called to reopen it again just moments later. It would be nicer to open the file just once and then keep it open. Here again, I wonder if the separation of duties could be a bit cleaner. Is there a real need to pass XLogDumpPrivate to astreamer_wal_read or astreamer_archive_read? The only things that they need are archive_fd, archive_name, archive_streamer, archive_streamer_buf, and archive_streamer_read_ptr. In other words, they really don't care about any of the *existing* things that are in XLogDumpPrivate. This makes me wonder whether we should actually try to make this new astreamer completely independent of xlogreader. In other words, instead of calling it astreamer_waldump() or astreamer_xlogreader() as I proposed above, maybe it could be a completely generic astreamer, say astreamer_stringinfo_new(StringInfo *buf) that just appends to the buffer. That would require also moving the stuff out of astreamer_wal_read() that knows about XLogRecPtr, but why does that function need to know about XLogRecPtr? Couldn't the caller figure out that part and just tell this function how many bytes are needed? -- Robert Haas EDB: http://www.enterprisedb.com
On Fri, Sep 12, 2025 at 2:28 PM Robert Haas <robertmhaas@gmail.com> wrote: > Is there a real need to pass XLogDumpPrivate to astreamer_wal_read or > astreamer_archive_read? The only things that they need are archive_fd, > archive_name, archive_streamer, archive_streamer_buf, and > archive_streamer_read_ptr. In other words, they really don't care > about any of the *existing* things that are in XLogDumpPrivate. This > makes me wonder whether we should actually try to make this new > astreamer completely independent of xlogreader. In other words, > instead of calling it astreamer_waldump() or astreamer_xlogreader() as > I proposed above, maybe it could be a completely generic astreamer, > say astreamer_stringinfo_new(StringInfo *buf) that just appends to the > buffer. That would require also moving the stuff out of > astreamer_wal_read() that knows about XLogRecPtr, but why does that > function need to know about XLogRecPtr? Couldn't the caller figure out > that part and just tell this function how many bytes are needed? Hmm, on further thought, I think this was a silly idea. Part of the intended function of this astreamer is to make sure we're only reading WAL files from the archive, and eventually reordering them if required, so obviously something completely generic isn't going to work. Maybe there's a way to make this look a little cleaner and tidier but this isn't it... -- Robert Haas EDB: http://www.enterprisedb.com
On Fri, Sep 12, 2025 at 4:25 PM Amul Sul <sulamul@gmail.com> wrote: > > On Mon, Sep 8, 2025 at 7:07 PM Jakub Wartak > <jakub.wartak@enterprisedb.com> wrote: > > > > On Tue, Aug 26, 2025 at 1:53 PM Amul Sul <sulamul@gmail.com> wrote: > > > > > [..patch] > > > > Hi Amul! > > > > Thanks for your review. I'm replying to a few of your comments now, > but for the rest, I need to think about them. I'm kind of in agreement > with some of them for the fix, but I won't be able to spend time on > that next week due to official travel. I'll try to get back as soon as > possible after that. > Reverting on rest of review comments: > 0001: LGTM, maybe I would just slightly enhance the commit message > ("This is in preparation for adding a second source file to this > directory.") -- maye bit a bit more verbose or use a message from > 0002? Done. > b. Why would it like to open "blah" dir if I wanted that "blah" > segment from the archive? Shouldn't it tell that it was looking in the > archive and couldn find it inside? > $ /usr/pgsql19/bin/pg_waldump --path=/tmp/base/pg_wal.tar blah > pg_waldump: error: could not open file "blah": Not a directory Now, an error will be thrown if any additional command-line arguments are provided when an archive is specified, similar to how existing extra arguments are handled. > i. If I give wrong --timeline=999 to pg_waldump it fails with > misleading error message: could not read WAL data from "pg_wal.tar" > archive: read -1 of 8192 Now., added a much better error message for that case. > a. I'm wondering if we shouldn't log (to stderr?) some kind of > notification message (just once) that non-sequential WAL files were > discovered and that pg_waldump is starting to write to $somewhere as > it may be causing bigger I/O than anticipated when running the > command. This can easily help when troubleshooting why it is not fast, > and also having set TMPDIR to usually /tmp can be slow or too small. Now, emitting info messages, but I'm not sure whether we should have info or debug. > b. IMHO member_prepare_tmp_write() / get_tmp_wal_file_path() with > TMPDIR can be prone to symlink attack. Consider setting TMPDIR=/tmp . > We are writing to e.g. /tmp/<WALsegment>.waldump.tmp in 0004 , but > that path is completely guessable. If an attacker prepares some > symlinks and links those to some other places, I think the code will > happily open and overwrite the contents of the rogue symlink. I think > using mkstemp(3)/tmpfile(3) would be a safer choice if TMPDIR needs to > be in play. Consider that pg_waldump can be run as root (there's no > mechanism preventing it from being used that way). I am not sure what the worst-case scenario would be or what a good alternative is. > c. IMHO that unlink() might be not guaranteed to always remove > files, as in case of any trouble and exit() , those files might be > left over. I think we need some atexit() handlers. This can be > triggered with combo of options of nonsequential files in tar + wrong > LSN given: Done. > 0007: > a. Commit message says `Future patches to pg_waldump will enable > it to decode WAL directly` , but those pg_waldump are earlier patches, > right? Right, fixed. > b. pg_verifybackup should print some info with --progress that it > is spawning pg_waldump (pg_verifybackup --progress mode does not > display anything related to verifing WALs, but it could) If we decide to do that, it could be a separate project, IMHO. > c. I'm wondering, but pg_waldump seems to be not complaining if > --end=LSN is made into such a future that it doesn't exist. The behavior will be kept as if a directory was provided with a start and end LSN. Thanks again for the review. I'll post the new patches in my next reply. Regards, Amul
On Fri, Sep 12, 2025 at 11:58 PM Robert Haas <robertmhaas@gmail.com> wrote: > > Here are some review comments on v3-0004: > Thanks for the review. My replies are below. > There doesn't seem to be any reason for > astreamer_waldump_content_new() to take an astreamer *next argument. > If you look at astreamer.h, you'll see that some astreamer_BLAH_new() > functions take such an argument, and others don't. The ones that do > forward their input to another astreamer; the ones that don't, like > astreamer_plain_writer_new(), send it somewhere else. AFAICT, this > astreamer is never going to send its output to another astreamer, so > there's no reason for this argument. > Done. > I'm also a little confused by the choice of the name > astreamer_waldump_content_new(). I would have thought this would be > something like astreamer_waldump_new() or astreamer_xlogreader_new(). > The word "content" doesn't seem to me to be adding much here, and it > invites confusion with the "content" callback. > Done -- renamed to astreamer_waldump_new(). > I think you can merge setup_astreamer() into > init_tar_archive_reader(). The only other caller is > verify_tar_archive(), but that does exactly the same additional steps > as init_tar_archive_reader(), as far as I can see. > Done. > The return statement for astreamer_wal_read is really odd: > > + return (count - nbytes) ? (count - nbytes) : -1; > Agreed, that's a bit odd. This seems to be leftover code from the experimental patch. The astreamer_wal_read() function should behave like WALRead(): it should either successfully read all the requested bytes or throw an error. Corrected in the attached version. > > I would suggest changing the name of the variable from "readBuff" to > "readBuf". There are no existing uses of readBuff in the code base. > The existing WALDumpReadPage() function has a "readBuff" argument, and I've used it that way for consistency. > I think this comment also needs improvement: > > + /* > + * Ignore existing data if the required target page > has not yet been > + * read. > + */ > + if (recptr >= endPtr) > + { > + len = 0; > + > + /* Reset the buffer */ > + resetStringInfo(astreamer_buf); > + } > > This comment is problematic for a few reasons. First, we're not > ignoring the existing data: we're throwing it out. Second, the comment > doesn't say why we're doing what we're doing, only that we're doing > it. Here's my guess at the actual explanation -- please correct me if > I'm wrong: "pg_waldump never reads the same WAL bytes more than once, > so if we're now being asked for data beyond the end of what we've > already read, that means none of the data we currently have in the > buffer will ever be consulted again. So, we can discard the existing > buffer contents and start over." By the way, if this explanation is > correct, it might be nice to add an assertion someplace that verifies > it, like asserting that we're always reading from an LSN greater than > or equal to (or exactly equal to?) the LSN immediately following the > last data we read. > Updated the comment. The similar assertion exists right before copying to the readBuff. > > Another thing that isn't so nice right now is that > verify_tar_archive() has to open and close the archive only for > init_tar_archive_reader() to be called to reopen it again just moments > later. It would be nicer to open the file just once and then keep it > open. Here again, I wonder if the separation of duties could be a bit > cleaner. > Prefer to keep those separate, assuming that reopening the file won't cause any significant harm. Let me know if you think otherwise. Attached the updated version, kindly have a look. Regards, Amul
Вложения
- v4-0001-Refactor-pg_waldump-Move-some-declarations-to-new.patch
- v4-0002-Refactor-pg_waldump-Separate-logic-used-to-calcul.patch
- v4-0003-Refactor-pg_waldump-Restructure-TAP-tests.patch
- v4-0004-pg_waldump-Add-support-for-archived-WAL-decoding.patch
- v4-0005-pg_waldump-Remove-the-restriction-on-the-order-of.patch
- v4-0006-pg_verifybackup-Delay-default-WAL-directory-prepa.patch
- v4-0007-pg_verifybackup-Rename-the-wal-directory-switch-t.patch
- v4-0008-pg_verifybackup-enabled-WAL-parsing-for-tar-forma.patch
On Thu, Sep 25, 2025 at 4:25 AM Amul Sul <sulamul@gmail.com> wrote: > > Another thing that isn't so nice right now is that > > verify_tar_archive() has to open and close the archive only for > > init_tar_archive_reader() to be called to reopen it again just moments > > later. It would be nicer to open the file just once and then keep it > > open. Here again, I wonder if the separation of duties could be a bit > > cleaner. > > Prefer to keep those separate, assuming that reopening the file won't > cause any significant harm. Let me know if you think otherwise. Well, I guess I'd like to know why we can't do better. I'm not really worried about performance, but reopening the file means that you can never make it work with reading from a pipe. -- Robert Haas EDB: http://www.enterprisedb.com
On Mon, Sep 29, 2025 at 8:45 PM Robert Haas <robertmhaas@gmail.com> wrote: > > On Thu, Sep 25, 2025 at 4:25 AM Amul Sul <sulamul@gmail.com> wrote: > > > Another thing that isn't so nice right now is that > > > verify_tar_archive() has to open and close the archive only for > > > init_tar_archive_reader() to be called to reopen it again just moments > > > later. It would be nicer to open the file just once and then keep it > > > open. Here again, I wonder if the separation of duties could be a bit > > > cleaner. > > > > Prefer to keep those separate, assuming that reopening the file won't > > cause any significant harm. Let me know if you think otherwise. > > Well, I guess I'd like to know why we can't do better. I'm not really > worried about performance, but reopening the file means that you can > never make it work with reading from a pipe. I have some skepticism regarding the extra coding that might be introduced, as performance is not my primary concern here. If we aim to keep the file open only once, that logic should be implemented before calling verify_tar_archive(), not inside it. Implementing the open and close logic within verify_tar_archive() and free_tar_archive_reader() would create a confusing and scattered pattern, especially since these separate operations require only two lines of code each (open and close if it's a tar file). My second, concern is that after verify_tar_archive(), we might need to reset the file reader offset to the beginning. While reusing the buffered data from the first iteration is technically possible, that only works if the desired start LSN is at the absolute beginning of the archive, or later in the sequence, which cannot be reliably guaranteed. Therefore, for simplicity and avoid the complexity of managing that offset reset code, I am thinking of a simpler approach. Regards, Amul