Обсуждение: pg_waldump: support decoding of WAL inside tarfile

Поиск
Список
Период
Сортировка

pg_waldump: support decoding of WAL inside tarfile

От
Amul Sul
Дата:
Hi All,

Attaching patch to support a new feature that let pg_waldump decode
WAL files directly from a tar archive. This was worked to address a
limitation in pg_verifybackup[1], which couldn't parse WAL files from
tar-formatted backups.

The implementation will align with pg_waldump's existing xlogreader
design, which uses three callback functions to manage WAL segments:
open, read, and close. For tar archives, however, the approach will be
simpler. Instead of using separate callbacks for opening and closing,
the tar archive will be opened once at the start and closed explicitly
at the end.

The core logic will be in the WAL page reading callback. When
xlogreader requests a new WAL page, this callback will be invoked. It
will then call the archive streamer routine to read the WAL data from
the tar archive into a buffer. This data will then be copied into
xlogreader's own buffer, completing the read.

Essentially, this is plumbing work: the new code will be responsible
for getting WAL data from the tar archive and feeding it to the
existing xlogreader. All other WAL page and record decoding logic,
which is already robust within xlogreader, will be reused as is.

This feature is being implemented in a series of patches as:

- Refactoring: The first few patches (0001-0004) are dedicated to
refactoring and minor code changes.

- 005: This patch introduces the core functionality for pg_waldump to
read WAL from a tar archive using the same archive streamer
(fe_utils/astreamer.h) used in pg_verifybackup. This version requires
WAL files in the archive to be in sequential order.

- 006: This patch removes the sequential order restriction. If
pg_waldump encounters an out-of-order WAL file, it writes the file to
a temporary directory. The utility will then continue decoding and
read from this temporary location later.

- 007 and onwards: These patches will update pg_verifybackup to remove the
restriction on WAL parsing for tar-formatted backups. 008 patch renames the
"--wal-directory" switch to "--wal-path" to make it more generic, allowing
it accepts a directory path or a tar archive path.

-----------------------------------
Known Issues & Status:
-----------------------------------
- Timeline Switching: The current implementation in patch 006 does not
correctly handle timeline switching. This is a known issue, especially
when a timeline change occurs on a WAL file that has been written to a
temporary location.

- Testing: Local regression tests on CentOS and macOS M4 are passing.
However, some tests on macOS Sonoma (specifically 008_untar.pl and
010_client_untar.pl) are failing in the GitHub workflow with a "WAL
parsing failed for timeline 1" error. This issue is currently being
investigated.

Please take a look at the attached patch and let me know your
thoughts. This is an initial version, and I am making incremental
improvements to address known issues and limitations.


1] https://git.postgresql.org/pg/commitdiff/8dfd3129027969fdd2d9d294220c867d2efd84aa

--
Regards,
Amul Sul
EDB: http://www.enterprisedb.com

Вложения

Re: pg_waldump: support decoding of WAL inside tarfile

От
Amul Sul
Дата:
On Thu, Aug 7, 2025 at 7:47 PM Amul Sul <sulamul@gmail.com> wrote:
> [....]
> -----------------------------------
> Known Issues & Status:
> -----------------------------------
> - Timeline Switching: The current implementation in patch 006 does not
> correctly handle timeline switching. This is a known issue, especially
> when a timeline change occurs on a WAL file that has been written to a
> temporary location.
>

This is still pending and will be addressed in the next version.
Therefore, patch 0006 remains marked as WIP.

> - Testing: Local regression tests on CentOS and macOS M4 are passing.
> However, some tests on macOS Sonoma (specifically 008_untar.pl and
> 010_client_untar.pl) are failing in the GitHub workflow with a "WAL
> parsing failed for timeline 1" error. This issue is currently being
> investigated.
>

This has been fixed in the attached version; all GitHub workflow tests
are now fine.

Regards,
Amul

Вложения

Re: pg_waldump: support decoding of WAL inside tarfile

От
Amul Sul
Дата:
On Mon, Aug 25, 2025 at 5:58 PM Amul Sul <sulamul@gmail.com> wrote:
>
> On Thu, Aug 7, 2025 at 7:47 PM Amul Sul <sulamul@gmail.com> wrote:
> > [....]
> > -----------------------------------
> > Known Issues & Status:
> > -----------------------------------
> > - Timeline Switching: The current implementation in patch 006 does not
> > correctly handle timeline switching. This is a known issue, especially
> > when a timeline change occurs on a WAL file that has been written to a
> > temporary location.
> >
>
> This is still pending and will be addressed in the next version.
> Therefore, patch 0006 remains marked as WIP.
>

After testing pg_waldump, I have realised that my previous
understanding of its timeline handling was incorrect. I had mistakenly
assumed by reading xlogreader code that it would use the same
timeline-switching logic found in xlogreader, without first verifying
this behavior. In testing, I found that pg_waldump does not follow
timeline switches. Instead, it expects all WAL files to be from a
single timeline, which is either specified by the user or determined
from the starting segment or default 1.

This is a positive finding, as it means we don't need to make
significant changes to align pg_waldump's current behavior. The
attached patches are now complete and no longer works in progress --
read for review. Additionally, I've dropped patch v2-0004 because it is
no longer necessary. The primary patches that implement the proposed
feature are now 0004 and 0005 in the attached set.

Regards,
Amul

Вложения

Re: pg_waldump: support decoding of WAL inside tarfile

От
Jakub Wartak
Дата:
On Tue, Aug 26, 2025 at 1:53 PM Amul Sul <sulamul@gmail.com> wrote:
>
[..patch]

Hi Amul!

0001: LGTM, maybe I would just slightly enhance the commit message
("This is in preparation for adding a second source file to this
directory.") -- maye bit a bit more verbose or use a message from
0002?
0002: LGTM
0003: LGTM

Tested here (after partial patch apply, and test suite did work fine).

0004:

    a.  Why should it be necessary to provide startLSN (-s) ? Couldn't
it autodetect the first WAL (tar file) inside and just use that with
some info message?
    $ /usr/pgsql19/bin/pg_waldump --path=/tmp/base/pg_wal.tar
    pg_waldump: error: no start WAL location given

    b. Why would it like to open "blah" dir if I wanted that "blah"
segment from the archive? Shouldn't it tell that it was looking in the
archive and couldn find it inside?
    $ /usr/pgsql19/bin/pg_waldump --path=/tmp/base/pg_wal.tar blah
    pg_waldump: error: could not open file "blah": Not a directory

    c. It doesnt work when using SEGSTART, but it's there:
    $ /usr/pgsql19/bin/pg_waldump --path=/tmp/base/pg_wal.tar
000000010000000000000059
    pg_waldump: error: could not open file "000000010000000000000059":
Not a directory
    $ tar tf /tmp/base/pg_wal.tar | head -1
    000000010000000000000059

    d.     I've later noticed that follow-up patches seem to use the
-s switch and there it seems to work OK. The above SEGSTART issue was
not detected, probably because tests need to be extended cover  of
segment name rather than just --start LSN (see test_pg_waldump):
    $ /usr/pgsql19/bin/pg_waldump --path=/tmp/base/pg_wal.tar --stats
-s 0/59000358
    pg_waldump: first record is after 0/59000358, at 0/590003E8,
skipping over 144 bytes
    WAL statistics between 0/590003E8 and 0/61000000:
    [..]

    e. Code around`if (walpath == NULL && directory != NULL)` needs
some comments.

    f. Code around `if (fname != NULL && is_tar_file(fname,
&compression))` , so if fname is WAL segment here
(00000001000000000000005A) and we do check again if that has been
tar-ed (is_tar_file())? Why?

    g. Just a question: the commit message says `Note that this patch
requires that the WAL files within the archive be in sequential order;
an error will be reported otherwise`. I'm wondering if such
occurrences are known to be happening in the wild? Or is it just an
assumption that if someone would modify the tar somehow? (either way
we could just add a reason why we need to handle such a case if we
know -- is manual alternation the only source of such state?). For the
record, I've tested crafting custom archives with out of sequence WAL
archives and the code seems to work (it was done using: tar --append
-f pg_wal.tar --format=ustar ..)

    h. Anyway, in case of typo/wrong LSN, 0004 emits wrong error
message I think:

    $ /usr/pgsql19/bin/pg_waldump --path=/tmp/base/pg_wal.tar --stats
-s 0/50000358
    pg_waldump: error: WAL files are not archived in sequential order
    pg_waldump: detail: Expecting segment number 80 but found 89.

    it's just that the 50000358 LSN above is below the minimal LSN
present in the WAL segments (first segment is 000000010000000000000059
there, i've just intentionally provided a bad value 50.. as a typo and
it causes the wrong message). Now it might not be an issue as with
0005 patch the same test behaves OK (`pg_waldump: error: could not
find a valid record after 0/50000358`). It is just relevant if this
would be committed not all at once.

    i. If I give wrong --timeline=999 to pg_waldump it fails with
misleading error message: could not read WAL data from "pg_wal.tar"
archive: read -1 of 8192

0005:
    a. I'm wondering if we shouldn't log (to stderr?) some kind of
notification message (just once) that non-sequential WAL files were
discovered and that pg_waldump is starting to write to $somewhere as
it may be causing bigger I/O than anticipated when running the
command. This can easily help when troubleshooting why it is not fast,
and also having set TMPDIR to usually /tmp can be slow or too small.

    b. IMHO member_prepare_tmp_write() / get_tmp_wal_file_path() with
TMPDIR can be prone to symlink attack. Consider setting TMPDIR=/tmp .
We are writing to e.g. /tmp/<WALsegment>.waldump.tmp in 0004 , but
that path is completely guessable. If an attacker prepares some
symlinks and links those to some other places, I think the code will
happily open and overwrite the contents of the rogue symlink. I think
using mkstemp(3)/tmpfile(3) would be a safer choice if TMPDIR needs to
be in play. Consider that pg_waldump can be run as root (there's no
mechanism preventing it from being used that way).

    c. IMHO that unlink() might be not guaranteed to always remove
files, as in case of any trouble and exit() , those files might be
left over. I think we need some atexit() handlers. This can be
triggered with combo of options of nonsequential files in tar + wrong
LSN given:

    $ tar tf pg_wal.tar
    00000001000000000000005A
    00000001000000000000005B
    00000001000000000000005C
    [..]
    000000010000000000000060
    000000010000000000000059 <-- out of order, appended last
    $ ls -lh 0*
    ls: cannot access '0*': No such file or directory
    $ /usr/pgsql19/bin/pg_waldump --path=/tmp/ble/pg_wal.tar --stats
-s 0/10000358 #wrong LSN
    pg_waldump: error: could not find a valid record after 0/10000358
    $ ls -lh 0*
    -rw------- 1 postgres postgres 16M Sep  8 14:44
000000010000000000000059.waldump.tmp
    -rw------- 1 postgres postgres 16M Sep  8 14:44
00000001000000000000005A.waldump.tmp
    [..]

0006: LGTM

0007:
    a. Commit message says `Future patches to pg_waldump will enable
it to decode WAL directly` , but those pg_waldump are earlier patches,
right?

    b. pg_verifybackup should print some info with --progress that it
is spawning pg_waldump (pg_verifybackup --progress mode does not
display anything related to verifing WALs, but it could)

    c. I'm wondering, but pg_waldump seems to be not complaining if
--end=LSN is made into such a future that it doesn't exist. E.g. If
the latest WAL segment is 60 (with end LSN 0/60A77A59), but I run
pg_waldump `--end=0/7000000` , it will return code 0 and nothing on
stderr. So how sure are we that the necessary WAL segments (as per
backup_manifest) are actually inside the tar? It's supposed to be
verified, but it isn't for this use case? Same happens if craft
special tar and remove just one WAL segment from pg_wal.tar (simulate
missing WAL segment), but ask the pg_verifybackup/pg_waldump to verify
it to exact last LSN sequence, e.g.:

    $ /usr/pgsql19/bin/pg_waldump --quiet
--path=/tmp/missing/pg_wal.tar --timeline=1 --start=0/59000028
--end=0/60A77A58 && echo OK # but it is not OK
    OK
    $ /usr/pgsql19/bin/pg_waldump --stats
--path=/tmp/missing/pg_wal.tar --timeline=1 --start=0/59000028
--end=0/60A77A58
    WAL statistics between 0/59000028 and 0/5CFFFFD0: # <-- 0/5C LSN
maximum detected
    [..]

    Notice it has read till 0/5C (but I've asked till 0/60), because
I've removed 0D:
    $ tar tf /tmp/missing/pg_wal.tar| grep ^0
    000000010000000000000059
    00000001000000000000005A
    00000001000000000000005B
    00000001000000000000005C
    00000001000000000000005E <-- missing 5D

    Yet it reported no errors.

0008:
    LGTM

Another open question I have is this: shouldn't backup_manifest come
with CRC checksum for the archived WALs? Or does that guarantee that
backup_manifest WAL-Ranges are present in pg_wal.tar is good enough
because individual WAL files are CRC-protected itself?

-J.



Re: pg_waldump: support decoding of WAL inside tarfile

От
Amul Sul
Дата:
On Mon, Sep 8, 2025 at 7:07 PM Jakub Wartak
<jakub.wartak@enterprisedb.com> wrote:
>
> On Tue, Aug 26, 2025 at 1:53 PM Amul Sul <sulamul@gmail.com> wrote:
> >
> [..patch]
>
> Hi Amul!
>

Thanks for your review. I'm replying to a few of your comments now,
but for the rest, I need to think about them. I'm kind of in agreement
with some of them for the fix, but I won't be able to spend time on
that next week due to official travel. I'll try to get back as soon as
possible after that.

>     a.  Why should it be necessary to provide startLSN (-s) ? Couldn't
> it autodetect the first WAL (tar file) inside and just use that with
> some info message?
>     $ /usr/pgsql19/bin/pg_waldump --path=/tmp/base/pg_wal.tar
>     pg_waldump: error: no start WAL location given
>

There are two reasons. First,  existing pg_waldump
--path=some_directory would result in the same error. Second, it would
force us to re-read the archive twice just to locate the first WAL
segment, which is inefficient.

>     c. It doesnt work when using SEGSTART, but it's there:
>     $ /usr/pgsql19/bin/pg_waldump --path=/tmp/base/pg_wal.tar
> 000000010000000000000059
>     pg_waldump: error: could not open file "000000010000000000000059":
> Not a directory
>     $ tar tf /tmp/base/pg_wal.tar | head -1
>     000000010000000000000059
>

I don't believe this is the correct use case. The WAL files are inside
a tar archive, and the requirement is to use a starting LSN and a
timeline (if not the default).

>     d.     I've later noticed that follow-up patches seem to use the
> -s switch and there it seems to work OK. The above SEGSTART issue was
> not detected, probably because tests need to be extended cover  of
> segment name rather than just --start LSN (see test_pg_waldump):
>     $ /usr/pgsql19/bin/pg_waldump --path=/tmp/base/pg_wal.tar --stats
> -s 0/59000358
>     pg_waldump: first record is after 0/59000358, at 0/590003E8,
> skipping over 144 bytes
>     WAL statistics between 0/590003E8 and 0/61000000:
>     [..]
>

Hope previous reasoning makes sense to you.

>     e. Code around`if (walpath == NULL && directory != NULL)` needs
> some comments.
>

I think this is an existing one.

>     f. Code around `if (fname != NULL && is_tar_file(fname,
> &compression))` , so if fname is WAL segment here
> (00000001000000000000005A) and we do check again if that has been
> tar-ed (is_tar_file())? Why?
>

Again, how?

>     g. Just a question: the commit message says `Note that this patch
> requires that the WAL files within the archive be in sequential order;
> an error will be reported otherwise`. I'm wondering if such
> occurrences are known to be happening in the wild? Or is it just an
> assumption that if someone would modify the tar somehow? (either way
> we could just add a reason why we need to handle such a case if we
> know -- is manual alternation the only source of such state?). For the
> record, I've tested crafting custom archives with out of sequence WAL
> archives and the code seems to work (it was done using: tar --append
> -f pg_wal.tar --format=ustar ..)
>

This is an almost nonexistent occurrence. While pg_basebackup archives
WAL files in sequential order, we don't have an explicit code to
enforce that order within it. Furthermore, since we can't control how
external tools might handle the files, this extra precaution is
necessary.


> Another open question I have is this: shouldn't backup_manifest come
> with CRC checksum for the archived WALs? Or does that guarantee that
> backup_manifest WAL-Ranges are present in pg_wal.tar is good enough
> because individual WAL files are CRC-protected itself?
>

I don't know, I have to check pg_verifybackup.

Regards,
Amul



Re: pg_waldump: support decoding of WAL inside tarfile

От
Robert Haas
Дата:
Here are some review comments on v3-0004:

In general, I think this looks pretty nice, but I think it needs more
cleanup and polishing.

There doesn't seem to be any reason for
astreamer_waldump_content_new() to take an astreamer *next argument.
If you look at astreamer.h, you'll see that some astreamer_BLAH_new()
functions take such an argument, and others don't. The ones that do
forward their input to another astreamer; the ones that don't, like
astreamer_plain_writer_new(), send it somewhere else. AFAICT, this
astreamer is never going to send its output to another astreamer, so
there's no reason for this argument.

I'm also a little confused by the choice of the name
astreamer_waldump_content_new(). I would have thought this would be
something like astreamer_waldump_new() or astreamer_xlogreader_new().
The word "content" doesn't seem to me to be adding much here, and it
invites confusion with the "content" callback.

I think you can merge setup_astreamer() into
init_tar_archive_reader(). The only other caller is
verify_tar_archive(), but that does exactly the same additional steps
as init_tar_archive_reader(), as far as I can see.

The return statement for astreamer_wal_read is really odd:

+       return (count - nbytes) ? (count - nbytes) : -1;

Since 0 is false in C, this is equivalent to: count != nbytes ? count
- nbytes : -1, but it's a strange way to write it. What makes it even
stranger is that it seems as though the intention here is to count the
number of bytes read, but you do that by taking the number of bytes
requested (count) and subtracting the number of bytes we didn't manage
to read (nbytes); and then you just up and return -1 instead of 0
whenever the answer would have been zero. This is all lacking in
comments and seems a bit more confusing than it needs to be. So my
suggestions are:

1. Consider redefining nbytes to be the number of bytes that you have
read instead of the number of bytes you haven't read. So the loop in
this function would be while (nbytes < count) instead of while (nbytes
> 0).

2. If you need to map 0 to -1, consider having the caller do this
instead of putting that inside this function.

3. Add a comment saying what the return value is supposed to be".

If you do both 1 and 2, then the return statement can just say "return
nbytes;" and the comment can say "Returns the number of bytes
successfully read."

I would suggest changing the name of the variable from "readBuff" to
"readBuf". There are no existing uses of readBuff in the code base.

I think this comment also needs improvement:

+               /*
+                * Ignore existing data if the required target page
has not yet been
+                * read.
+                */
+               if (recptr >= endPtr)
+               {
+                       len = 0;
+
+                       /* Reset the buffer */
+                       resetStringInfo(astreamer_buf);
+               }

This comment is problematic for a few reasons. First, we're not
ignoring the existing data: we're throwing it out. Second, the comment
doesn't say why we're doing what we're doing, only that we're doing
it. Here's my guess at the actual explanation -- please correct me if
I'm wrong: "pg_waldump never reads the same WAL bytes more than once,
so if we're now being asked for data beyond the end of what we've
already read, that means none of the data we currently have in the
buffer will ever be consulted again. So, we can discard the existing
buffer contents and start over." By the way, if this explanation is
correct, it might be nice to add an assertion someplace that verifies
it, like asserting that we're always reading from an LSN greater than
or equal to (or exactly equal to?) the LSN immediately following the
last data we read.

In general, I wonder whether there's a way to make the separation of
concerns between astreamer_wal_read() and TarWALDumpReadPage()
cleaner. Right now, the latter is basically a stub, but I'm not sure
that is the best thing here. I already mentioned one example of how to
do this: make the responsibility for 0 => -1 translation the job of
TarWALDumpReadPage() rather than astreamer_wal_read(). But I think
there might be a little more we can do. In particular, I wonder
whether we could say that astreamer_wal_read() is only responsible for
filling the buffer, and the caller, TarWALDumpReadPage() in this case,
needs to empty it. That seems like it might produce a cleaner
separation of duties.

Another thing that isn't so nice right now is that
verify_tar_archive() has to open and close the archive only for
init_tar_archive_reader() to be called to reopen it again just moments
later. It would be nicer to open the file just once and then keep it
open. Here again, I wonder if the separation of duties could be a bit
cleaner.

Is there a real need to pass XLogDumpPrivate to astreamer_wal_read or
astreamer_archive_read? The only things that they need are archive_fd,
archive_name, archive_streamer, archive_streamer_buf, and
archive_streamer_read_ptr. In other words, they really don't care
about any of the *existing* things that are in XLogDumpPrivate. This
makes me wonder whether we should actually try to make this new
astreamer completely independent of xlogreader. In other words,
instead of calling it astreamer_waldump() or astreamer_xlogreader() as
I proposed above, maybe it could be a completely generic astreamer,
say astreamer_stringinfo_new(StringInfo *buf) that just appends to the
buffer. That would require also moving the stuff out of
astreamer_wal_read() that knows about XLogRecPtr, but why does that
function need to know about XLogRecPtr? Couldn't the caller figure out
that part and just tell this function how many bytes are needed?

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: pg_waldump: support decoding of WAL inside tarfile

От
Robert Haas
Дата:
On Fri, Sep 12, 2025 at 2:28 PM Robert Haas <robertmhaas@gmail.com> wrote:
> Is there a real need to pass XLogDumpPrivate to astreamer_wal_read or
> astreamer_archive_read? The only things that they need are archive_fd,
> archive_name, archive_streamer, archive_streamer_buf, and
> archive_streamer_read_ptr. In other words, they really don't care
> about any of the *existing* things that are in XLogDumpPrivate. This
> makes me wonder whether we should actually try to make this new
> astreamer completely independent of xlogreader. In other words,
> instead of calling it astreamer_waldump() or astreamer_xlogreader() as
> I proposed above, maybe it could be a completely generic astreamer,
> say astreamer_stringinfo_new(StringInfo *buf) that just appends to the
> buffer. That would require also moving the stuff out of
> astreamer_wal_read() that knows about XLogRecPtr, but why does that
> function need to know about XLogRecPtr? Couldn't the caller figure out
> that part and just tell this function how many bytes are needed?

Hmm, on further thought, I think this was a silly idea. Part of the
intended function of this astreamer is to make sure we're only reading
WAL files from the archive, and eventually reordering them if
required, so obviously something completely generic isn't going to
work. Maybe there's a way to make this look a little cleaner and
tidier but this isn't it...

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: pg_waldump: support decoding of WAL inside tarfile

От
Amul Sul
Дата:
On Fri, Sep 12, 2025 at 4:25 PM Amul Sul <sulamul@gmail.com> wrote:
>
> On Mon, Sep 8, 2025 at 7:07 PM Jakub Wartak
> <jakub.wartak@enterprisedb.com> wrote:
> >
> > On Tue, Aug 26, 2025 at 1:53 PM Amul Sul <sulamul@gmail.com> wrote:
> > >
> > [..patch]
> >
> > Hi Amul!
> >
>
> Thanks for your review. I'm replying to a few of your comments now,
> but for the rest, I need to think about them. I'm kind of in agreement
> with some of them for the fix, but I won't be able to spend time on
> that next week due to official travel. I'll try to get back as soon as
> possible after that.
>

Reverting on rest of review comments:

> 0001: LGTM, maybe I would just slightly enhance the commit message
> ("This is in preparation for adding a second source file to this
> directory.") -- maye bit a bit more verbose or use a message from
> 0002?

Done.

>     b. Why would it like to open "blah" dir if I wanted that "blah"
> segment from the archive? Shouldn't it tell that it was looking in the
> archive and couldn find it inside?
>     $ /usr/pgsql19/bin/pg_waldump --path=/tmp/base/pg_wal.tar blah
>     pg_waldump: error: could not open file "blah": Not a directory

Now, an error will be thrown if any additional command-line
arguments are provided when an archive is specified, similar to how
existing extra arguments are handled.

>     i. If I give wrong --timeline=999 to pg_waldump it fails with
> misleading error message: could not read WAL data from "pg_wal.tar"
> archive: read -1 of 8192

Now., added a much better error message for that case.

>     a. I'm wondering if we shouldn't log (to stderr?) some kind of
> notification message (just once) that non-sequential WAL files were
> discovered and that pg_waldump is starting to write to $somewhere as
> it may be causing bigger I/O than anticipated when running the
> command. This can easily help when troubleshooting why it is not fast,
> and also having set TMPDIR to usually /tmp can be slow or too small.

Now, emitting info messages, but I'm not sure whether we should have
info or debug.

>     b. IMHO member_prepare_tmp_write() / get_tmp_wal_file_path() with
> TMPDIR can be prone to symlink attack. Consider setting TMPDIR=/tmp .
> We are writing to e.g. /tmp/<WALsegment>.waldump.tmp in 0004 , but
> that path is completely guessable. If an attacker prepares some
> symlinks and links those to some other places, I think the code will
> happily open and overwrite the contents of the rogue symlink. I think
> using mkstemp(3)/tmpfile(3) would be a safer choice if TMPDIR needs to
> be in play. Consider that pg_waldump can be run as root (there's no
> mechanism preventing it from being used that way).

I am not sure what the worst-case scenario would be or what a good
alternative is.

>     c. IMHO that unlink() might be not guaranteed to always remove
> files, as in case of any trouble and exit() , those files might be
> left over. I think we need some atexit() handlers. This can be
> triggered with combo of options of nonsequential files in tar + wrong
> LSN given:

Done.

> 0007:
>     a. Commit message says `Future patches to pg_waldump will enable
> it to decode WAL directly` , but those pg_waldump are earlier patches,
> right?

Right, fixed.

>     b. pg_verifybackup should print some info with --progress that it
> is spawning pg_waldump (pg_verifybackup --progress mode does not
> display anything related to verifing WALs, but it could)

If we decide to do that, it could be a separate project, IMHO.

>     c. I'm wondering, but pg_waldump seems to be not complaining if
> --end=LSN is made into such a future that it doesn't exist.

The behavior will be kept as if a directory was provided with a start
and end LSN.

Thanks again for the review. I'll post the new patches in my next reply.

Regards,
Amul



Re: pg_waldump: support decoding of WAL inside tarfile

От
Amul Sul
Дата:
On Fri, Sep 12, 2025 at 11:58 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> Here are some review comments on v3-0004:
>

Thanks for the review. My replies are below.

> There doesn't seem to be any reason for
> astreamer_waldump_content_new() to take an astreamer *next argument.
> If you look at astreamer.h, you'll see that some astreamer_BLAH_new()
> functions take such an argument, and others don't. The ones that do
> forward their input to another astreamer; the ones that don't, like
> astreamer_plain_writer_new(), send it somewhere else. AFAICT, this
> astreamer is never going to send its output to another astreamer, so
> there's no reason for this argument.
>

Done.

> I'm also a little confused by the choice of the name
> astreamer_waldump_content_new(). I would have thought this would be
> something like astreamer_waldump_new() or astreamer_xlogreader_new().
> The word "content" doesn't seem to me to be adding much here, and it
> invites confusion with the "content" callback.
>

Done -- renamed to astreamer_waldump_new().

> I think you can merge setup_astreamer() into
> init_tar_archive_reader(). The only other caller is
> verify_tar_archive(), but that does exactly the same additional steps
> as init_tar_archive_reader(), as far as I can see.
>

Done.

> The return statement for astreamer_wal_read is really odd:
>
> +       return (count - nbytes) ? (count - nbytes) : -1;
>

Agreed, that's a bit odd. This seems to be leftover code from the experimental
patch. The astreamer_wal_read() function should behave like WALRead():
it should either successfully read all the requested bytes or throw an
error. Corrected in the attached version.

>
> I would suggest changing the name of the variable from "readBuff" to
> "readBuf". There are no existing uses of readBuff in the code base.
>

The existing WALDumpReadPage() function has a "readBuff" argument, and
I've used it that way for consistency.

> I think this comment also needs improvement:
>
> +               /*
> +                * Ignore existing data if the required target page
> has not yet been
> +                * read.
> +                */
> +               if (recptr >= endPtr)
> +               {
> +                       len = 0;
> +
> +                       /* Reset the buffer */
> +                       resetStringInfo(astreamer_buf);
> +               }
>
> This comment is problematic for a few reasons. First, we're not
> ignoring the existing data: we're throwing it out. Second, the comment
> doesn't say why we're doing what we're doing, only that we're doing
> it. Here's my guess at the actual explanation -- please correct me if
> I'm wrong: "pg_waldump never reads the same WAL bytes more than once,
> so if we're now being asked for data beyond the end of what we've
> already read, that means none of the data we currently have in the
> buffer will ever be consulted again. So, we can discard the existing
> buffer contents and start over." By the way, if this explanation is
> correct, it might be nice to add an assertion someplace that verifies
> it, like asserting that we're always reading from an LSN greater than
> or equal to (or exactly equal to?) the LSN immediately following the
> last data we read.
>

Updated the comment. The similar assertion exists right before
copying to the readBuff.

>
> Another thing that isn't so nice right now is that
> verify_tar_archive() has to open and close the archive only for
> init_tar_archive_reader() to be called to reopen it again just moments
> later. It would be nicer to open the file just once and then keep it
> open. Here again, I wonder if the separation of duties could be a bit
> cleaner.
>

Prefer to keep those separate, assuming that reopening the file won't
cause any significant harm. Let me know if you think otherwise.

Attached the updated version, kindly have a look.

Regards,
Amul

Вложения

Re: pg_waldump: support decoding of WAL inside tarfile

От
Robert Haas
Дата:
On Thu, Sep 25, 2025 at 4:25 AM Amul Sul <sulamul@gmail.com> wrote:
> > Another thing that isn't so nice right now is that
> > verify_tar_archive() has to open and close the archive only for
> > init_tar_archive_reader() to be called to reopen it again just moments
> > later. It would be nicer to open the file just once and then keep it
> > open. Here again, I wonder if the separation of duties could be a bit
> > cleaner.
>
> Prefer to keep those separate, assuming that reopening the file won't
> cause any significant harm. Let me know if you think otherwise.

Well, I guess I'd like to know why we can't do better. I'm not really
worried about performance, but reopening the file means that you can
never make it work with reading from a pipe.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: pg_waldump: support decoding of WAL inside tarfile

От
Amul Sul
Дата:
On Mon, Sep 29, 2025 at 8:45 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Thu, Sep 25, 2025 at 4:25 AM Amul Sul <sulamul@gmail.com> wrote:
> > > Another thing that isn't so nice right now is that
> > > verify_tar_archive() has to open and close the archive only for
> > > init_tar_archive_reader() to be called to reopen it again just moments
> > > later. It would be nicer to open the file just once and then keep it
> > > open. Here again, I wonder if the separation of duties could be a bit
> > > cleaner.
> >
> > Prefer to keep those separate, assuming that reopening the file won't
> > cause any significant harm. Let me know if you think otherwise.
>
> Well, I guess I'd like to know why we can't do better. I'm not really
> worried about performance, but reopening the file means that you can
> never make it work with reading from a pipe.

I have some skepticism regarding the extra coding that might be
introduced, as performance is not my primary concern here. If we aim
to keep the file open only once, that logic should be implemented
before calling verify_tar_archive(), not inside it. Implementing the
open and close logic within verify_tar_archive() and
free_tar_archive_reader() would create a confusing and scattered
pattern, especially since these separate operations require only two
lines of code each (open and close if it's a tar file). My second,
concern is that after verify_tar_archive(), we might need to reset the
file reader offset to the beginning. While reusing the buffered data
from the first iteration is technically possible, that only works if
the desired start LSN is at the absolute beginning of the archive, or
later in the sequence, which cannot be reliably guaranteed. Therefore,
for simplicity and avoid the complexity of managing that offset reset
code, I am thinking of a simpler approach.

Regards,
Amul