Обсуждение: Propagate XLogFindNextRecord error to callers
Hi,
Currently, XLogFindNextRecord errormsg is ignored and callers will only output a generic 'could not find a valid record' message without details.
Additionally, invalid page header won't go through XLogReadRecord, leaving the error in state->errormsg_buf.
This patch propagates XLogFindNextRecord's error message to the callers and displays it. In case of an invalid page header, the errormsg is filled with errormsg_buf content.
This patch propagates XLogFindNextRecord's error message to the callers and displays it. In case of an invalid page header, the errormsg is filled with errormsg_buf content.
With this patch, pg_waldump will now have the following output when reading a file with an invalid header:
pg_waldump: error: could not find a valid record after D80/5C000000: invalid magic number D116 in WAL segment 0000001400000D8000000017, LSN D80/5C000000, offset 0
Regards,
Anthonin Bonnefoy
Вложения
Hi Anthonin, On 12/12/2025 10:39, Anthonin Bonnefoy wrote: > > With this patch, pg_waldump will now have the following output when > reading a file with an invalid header: > pg_waldump: error: could not find a valid record after D80/5C000000: > invalid magic number D116 in WAL segment 0000001400000D8000000017, LSN > D80/5C000000, offset 0 I've picked up the review for your patch. Attached is a failing test that reproduces the issue. Have I got it right? We can consider using it to validate your patch then. -- Thanks, Mircea Cadariu
Вложения
On Tue, Feb 3, 2026 at 5:05 PM Mircea Cadariu <cadariu.mircea@gmail.com> wrote: > I've picked up the review for your patch. Thanks for picking it! > Attached is a failing test that reproduces the issue. Have I got it > right? We can consider using it to validate your patch then. Yeah, that's the gist of it. However, the test you've written will only work on little endian architectures. Also, I think the xlog page header size won't have the 4 bytes padding on 32 bits systems. I've added a similar test in 001_basic.pl, but it relies on copying an existing WAL file and setting the WAL magic to 0000. This way, the result will be the same independent of the endianness and memory padding. Regards, Anthonin Bonnefoy
Вложения
On Mon, 09 Feb 2026 at 10:21, Anthonin Bonnefoy <anthonin.bonnefoy@datadoghq.com> wrote:
> On Tue, Feb 3, 2026 at 5:05 PM Mircea Cadariu <cadariu.mircea@gmail.com> wrote:
>> I've picked up the review for your patch.
>
> Thanks for picking it!
>
>> Attached is a failing test that reproduces the issue. Have I got it
>> right? We can consider using it to validate your patch then.
>
> Yeah, that's the gist of it. However, the test you've written will
> only work on little endian architectures. Also, I think the xlog page
> header size won't have the 4 bytes padding on 32 bits systems.
>
> I've added a similar test in 001_basic.pl, but it relies on copying an
> existing WAL file and setting the WAL magic to 0000. This way, the
> result will be the same independent of the endianness and memory
> padding.
>
+ {
+ if (errormsg)
+ pg_fatal("could not find a valid record after %X/%X: %s",
+ LSN_FORMAT_ARGS(private.startptr), errormsg);
+ else
+ pg_fatal("could not find a valid record after %X/%X",
+ LSN_FORMAT_ARGS(private.startptr));
For consistency, this should use the %X/%08X format as elsewhere.
> Regards,
> Anthonin Bonnefoy
>
> [2. text/x-diff; v2-0001-Propage-errormsg-to-XLogFindNextRecord-caller.patch]...
--
Regards,
Japin Li
ChengDu WenWu Information Technology Co., Ltd.
On Mon, Feb 9, 2026 at 10:36 AM Japin Li <japinli@hotmail.com> wrote: > For consistency, this should use the %X/%08X format as elsewhere. Good catch, I've updated the patch.
Вложения
Hi Anthonin,
Thanks for the updated patch.
I have noticed this code added in XLogFindNextRecord in the patch, appears also in XLogNextRecord (line 334).
+ if (state->errormsg_deferred)
+ {
+ if (state->errormsg_buf[0] != '\0')
+ *errormsg = state->errormsg_buf;
+ state->errormsg_deferred = false;
+ }
+In XLogNextRecord, right before the above code, we do *errormsg = NULL. Should this be done also in XLogFindNextRecord in the patch?
If so, what about even extracting a helper method which will be called from both places?
A nit for the commit message: Propage -> Propagate
-- Thanks, Mircea Cadariu
On Mon, Feb 9, 2026 at 1:15 PM Mircea Cadariu <cadariu.mircea@gmail.com> wrote: > In XLogNextRecord, right before the above code, we do *errormsg = NULL. Should this be done also in XLogFindNextRecordin the patch? > > If so, what about even extracting a helper method which will be called from both places? XLogReadRecord may already have consumed errormsg_deferred and set errormsg. We can't set it to NULL, or that would erase a valid error message. A simplification would have been to remove processing the deferred error processing from XLogNextRecord and leave it to XLogFindNextRecord, but there are some calls to XLogNextRecord (like in xlogprefetcher.c), so being able to get the error message when calling XLogNextRecord is necessary. > A nit for the commit message: Propage -> Propagate Fixed
Вложения
On 11/02/2026 10:43, Anthonin Bonnefoy wrote: > XLogReadRecord may already have consumed errormsg_deferred and set > errormsg. We can't set it to NULL, or that would erase a valid error Indeed. Should we place this initialisation at the top of XLogFindNextRecord, before any processing? At that point, there's nothing to erase. > + /* > + * we may have reported errors due to invalid WAL header, propagate the > + * error message to the caller. > + */ You can consider capitalising. -- Thanks, Mircea Cadariu
On Wed, Feb 11, 2026 at 1:16 PM Mircea Cadariu <cadariu.mircea@gmail.com> wrote: > Indeed. Should we place this initialisation at the top of > XLogFindNextRecord, before any processing? At that point, there's > nothing to erase. That makes sense, I've added the '*errormsg = NULL' at the top of the function. > You can consider capitalising. Done
Вложения
Hi,
On 12/02/2026 09:43, Anthonin Bonnefoy wrote:
Done
Thanks for the updated patch.
It applies cleanly on HEAD and the tests pass locally, as well as in CI.
I've set the patch entry to 'Ready for Committer'.
-- Thanks, Mircea Cadariu
> On Feb 12, 2026, at 15:43, Anthonin Bonnefoy <anthonin.bonnefoy@datadoghq.com> wrote: > > On Wed, Feb 11, 2026 at 1:16 PM Mircea Cadariu <cadariu.mircea@gmail.com> wrote: >> Indeed. Should we place this initialisation at the top of >> XLogFindNextRecord, before any processing? At that point, there's >> nothing to erase. > > That makes sense, I've added the '*errormsg = NULL' at the top of the function. > >> You can consider capitalising. > > Done > <v5-0001-Propagate-errormsg-to-XLogFindNextRecord-caller.patch> Hi Anthonin, Thanks for the patch. I agree it’s useful to print a more detailed error message instead of the generic one. From a design perspective, I’m not sure we need to add a new errormsg parameter to XLogFindNextRecord(). The new parameterultimately just exposes state->errormsg_buf, so the returned errormsg implicitly depends on the lifetime of state,and we also need extra handling for cases like errormsg == NULL. Instead, perhaps we could add a helper function, say XLogReaderGetLastError(XLogReaderState *state). which internally pstrdup()sstate->errormsg_buf (after checking errormsg_deferred, etc.). That way the caller owns the returned string explicitly,and there’s no hidden dependency on the reader state’s lifetime. This would also avoid changing the XLogFindNextRecord() signature while making the ownership semantics clearer. What do you think? Best regards, -- Chao Li (Evan) HighGo Software Co., Ltd. https://www.highgo.com/
Thanks for the comments! On Tue, Feb 24, 2026 at 5:00 AM Chao Li <li.evan.chao@gmail.com> wrote: > From a design perspective, I’m not sure we need to add a new errormsg parameter to XLogFindNextRecord(). The new parameterultimately just exposes state->errormsg_buf, so the returned errormsg implicitly depends on the lifetime of state,and we also need extra handling for cases like errormsg == NULL. > > Instead, perhaps we could add a helper function, say XLogReaderGetLastError(XLogReaderState *state). which internally pstrdup()sstate->errormsg_buf (after checking errormsg_deferred, etc.). That way the caller owns the returned string explicitly,and there’s no hidden dependency on the reader state’s lifetime. > > This would also avoid changing the XLogFindNextRecord() signature while making the ownership semantics clearer. One issue I see is that it introduces another way to get the error, with XLogReadRecord and XLogNextRecord using an errormsg parameter, and XLogFindNextRecord using the helper function. Maybe the solution would be to change both XLogReadRecord and XLogNextRecord to use this new function to stay consistent, but that means changing their signatures. Also, I see the errormsg parameter as a way to signal the caller that "this function can fail, the detailed error will be available here". With the XLogReaderGetLastError, it becomes the caller's responsibility to know which function may fill the error message and check it accordingly. The error message is likely printed shortly after the function's call, so I suspect the risk of using the errormsg after its intended lifetime is low. You bring up a good point about the errormsg's lifetime, which is definitely something to mention in the function's comments. I've updated the patch with the additional comments. Regards, Anthonin Bonnefoy