Re: Various bugs if segment containing redo pointer does not exist

Поиск
Список
Период
Сортировка
От Andres Freund
Тема Re: Various bugs if segment containing redo pointer does not exist
Дата
Msg-id 20231024004352.penbryfk3fvisocw@awork3.anarazel.de
обсуждение исходный текст
Ответ на Various bugs if segment containing redo pointer does not exist  (Andres Freund <andres@anarazel.de>)
Ответы Re: Various bugs if segment containing redo pointer does not exist  (Robert Haas <robertmhaas@gmail.com>)
Список pgsql-hackers
Hi,

On 2023-10-23 16:21:45 -0700, Andres Freund wrote:
> 1) For some reason I haven't yet debugged, the ReadRecord(PANIC) in
>    PerformWalRecovery() doesn't PANIC, but instead just returns NULL
>
>    We *do* output a DEBUG message, but well, that's insufficient.

The debug is from this backtrace:

#0  XLogFileReadAnyTLI (segno=6476, emode=13, source=XLOG_FROM_PG_WAL) at
/home/andres/src/postgresql/src/backend/access/transam/xlogrecovery.c:4291
#1  0x000055d7b3949db0 in WaitForWALToBecomeAvailable (RecPtr=108649259008, randAccess=true, fetching_ckpt=false,
tliRecPtr=108665421680,replayTLI=1,
 
    replayLSN=108665421680, nonblocking=false) at
/home/andres/src/postgresql/src/backend/access/transam/xlogrecovery.c:3697
#2  0x000055d7b39494ff in XLogPageRead (xlogreader=0x55d7b472c470, targetPagePtr=108649250816, reqLen=8192,
targetRecPtr=108665421680,
    readBuf=0x55d7b47ba5d8 "\024\321\005") at
/home/andres/src/postgresql/src/backend/access/transam/xlogrecovery.c:3278
#3  0x000055d7b3941bb1 in ReadPageInternal (state=0x55d7b472c470, pageptr=108665413632, reqLen=8072)
    at /home/andres/src/postgresql/src/backend/access/transam/xlogreader.c:1014
#4  0x000055d7b3940f43 in XLogDecodeNextRecord (state=0x55d7b472c470, nonblocking=false)
    at /home/andres/src/postgresql/src/backend/access/transam/xlogreader.c:571
#5  0x000055d7b3941a41 in XLogReadAhead (state=0x55d7b472c470, nonblocking=false) at
/home/andres/src/postgresql/src/backend/access/transam/xlogreader.c:947
#6  0x000055d7b393f5fa in XLogPrefetcherNextBlock (pgsr_private=94384934340072, lsn=0x55d7b47cfeb8)
    at /home/andres/src/postgresql/src/backend/access/transam/xlogprefetcher.c:496
#7  0x000055d7b393efcd in lrq_prefetch (lrq=0x55d7b47cfe88) at
/home/andres/src/postgresql/src/backend/access/transam/xlogprefetcher.c:256
#8  0x000055d7b393f190 in lrq_complete_lsn (lrq=0x55d7b47cfe88, lsn=0) at
/home/andres/src/postgresql/src/backend/access/transam/xlogprefetcher.c:294
#9  0x000055d7b39401ba in XLogPrefetcherReadRecord (prefetcher=0x55d7b47bc5e8, errmsg=0x7ffc23505920)
    at /home/andres/src/postgresql/src/backend/access/transam/xlogprefetcher.c:1041
#10 0x000055d7b3948ff8 in ReadRecord (xlogprefetcher=0x55d7b47bc5e8, emode=23, fetching_ckpt=false, replayTLI=1)
    at /home/andres/src/postgresql/src/backend/access/transam/xlogrecovery.c:3078
#11 0x000055d7b3946749 in PerformWalRecovery () at
/home/andres/src/postgresql/src/backend/access/transam/xlogrecovery.c:1640


The source of the emode=13=DEBUG2 is that that's hardcoded in
WaitForWALToBecomeAvailable(). I guess the error ought to come from
XLogPageRead(), but all that happens is this:

            case XLREAD_FAIL:
                if (readFile >= 0)
                    close(readFile);
                readFile = -1;
                readLen = 0;
                readSource = XLOG_FROM_ANY;
                return XLREAD_FAIL;

which *does* error out for some other failures:
            errno = save_errno;
            ereport(emode_for_corrupt_record(emode, targetPagePtr + reqLen),
                    (errcode_for_file_access(),
                     errmsg("could not read from WAL segment %s, LSN %X/%X, offset %u: %m",
                            fname, LSN_FORMAT_ARGS(targetPagePtr),
                            readOff)));

but not for a file that couldn't be opened. Which wouldn't have to be due to
ENOENT, could also be EACCESS...


xlogreader has undergone a fair bit of changes in the last few releases. As of
now, I can't really understand who is responsible for reporting what kind of
error.

    /*
     * Data input callback
     *
     * This callback shall read at least reqLen valid bytes of the xlog page
     * starting at targetPagePtr, and store them in readBuf.  The callback
     * shall return the number of bytes read (never more than XLOG_BLCKSZ), or
     * -1 on failure.  The callback shall sleep, if necessary, to wait for the
     * requested bytes to become available.  The callback will not be invoked
     * again for the same page unless more than the returned number of bytes
     * are needed.
     *
     * targetRecPtr is the position of the WAL record we're reading.  Usually
     * it is equal to targetPagePtr + reqLen, but sometimes xlogreader needs
     * to read and verify the page or segment header, before it reads the
     * actual WAL record it's interested in.  In that case, targetRecPtr can
     * be used to determine which timeline to read the page from.
     *
     * The callback shall set ->seg.ws_tli to the TLI of the file the page was
     * read from.
     */
    XLogPageReadCB page_read;

    /*
     * Callback to open the specified WAL segment for reading.  ->seg.ws_file
     * shall be set to the file descriptor of the opened segment.  In case of
     * failure, an error shall be raised by the callback and it shall not
     * return.
     *
     * "nextSegNo" is the number of the segment to be opened.
     *
     * "tli_p" is an input/output argument. WALRead() uses it to pass the
     * timeline in which the new segment should be found, but the callback can
     * use it to return the TLI that it actually opened.
     */
    WALSegmentOpenCB segment_open;

My reading of this is that page_read isn't ever supposed to error out - yet we
do it all over (e.g. WALDumpReadPage(), XLogPageRead()) and that segment_open
is supposed to error out - but we don't even use segment_open for
xlogrecovery.c.  Who is supposed to error out then?  And why is it ok for the
read callbacks to error out directly, if they're supposed to return -1 in case
of failure?

Somewhat of an aside: It also seems "The callback shall sleep, if necessary,
to wait for the requested bytes to become available." is outdated, given that
we *explicitly* don't do so in some cases and support that via
XLREAD_WOULDBLOCK?


I dug through recent changes, expecting to find the problem. But uh, no. I
reproduced this in 9.4, and I think the behaviour might have been introduced
in 9.3 (didn't have a build of that around, hence didn't test that), as part
of:

commit abf5c5c9a4f
Author: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date:   2013-02-22 11:43:04 +0200

    If recovery.conf is created after "pg_ctl stop -m i", do crash recovery.

Before that, XLogPageRead() called XLogFileReadAnyTLI() with the "incoming"
emode, when in crash recovery -> PANIC in this case. After that commit, the
call to XLogFileReadAnyTLI() was moved to WaitForWALToBecomeAvailable(), but
the emode was changed to unconditionally be DEBUG2.

                /*
                 * Try to restore the file from archive, or read an existing
                 * file from pg_xlog.
                 */
                readFile = XLogFileReadAnyTLI(readSegNo, DEBUG2,
                        currentSource == XLOG_FROM_ARCHIVE ? XLOG_FROM_ANY :
                                         currentSource);

In 9.4 the server actually came up ok after encountering the problem
(destroying data in the process!), not sure where we started to bogusly
initialize shared memory...

Greetings,

Andres Freund



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Erik Wienhold
Дата:
Сообщение: Re: Patch: Improve Boolean Predicate JSON Path Docs
Следующее
От: Tom Lane
Дата:
Сообщение: Re: Show version of OpenSSL in ./configure output