Re: Improve WALRead() to suck data directly from WAL buffers when possible

Поиск
Список
Период
Сортировка
От Bharath Rupireddy
Тема Re: Improve WALRead() to suck data directly from WAL buffers when possible
Дата
Msg-id CALj2ACXfFn2RZ0V=hbO5OFqpc=bZ5RQL53GbKD=O3SVKZ0=MFQ@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Improve WALRead() to suck data directly from WAL buffers when possible  (Andres Freund <andres@anarazel.de>)
Ответы Re: Improve WALRead() to suck data directly from WAL buffers when possible  (Jeff Davis <pgsql@j-davis.com>)
Список pgsql-hackers
On Tue, Feb 13, 2024 at 5:06 AM Andres Freund <andres@anarazel.de> wrote:
>
> I doubt there's a sane way to use WALRead() without *first* ensuring that the
> range of data is valid. I think we're better of moving that responsibility
> explicitly to the caller and adding an assertion verifying that.
>
> It doesn't really seem like a necessary, or even particularly useful,
> part. You couldn't just call WALRead() for that, since the caller would need
> to know the range up to which WAL is valid but not yet flushed as well. Thus
> the caller would need to first use WaitXLogInsertionsToFinish() or something
> like it anyway - and then there's no point in doing the WALRead() anymore.
>
> Note that for replicating unflushed data, we *still* might need to fall back
> to reading WAL data from disk. In which case not asserting in WALRead() would
> just make it hard to find bugs, because not using WaitXLogInsertionsToFinish()
> would appear to work as long as data is in wal buffers, but as soon as we'd
> fall back to on-disk (but unflushed) data, we'd send bogus WAL.

Callers of WALRead() do a good amount of work to figure out what's
been flushed out but they read the un-flushed and/or invalid data see
the comment [1] around WALRead() call sites as well as a recent thread
[2] for more details.

IIUC, here's the summary of the discussion that has happened so far:
a) If only replicating flushed data, then ensure all the WALRead()
callers read how much ever is valid out of startptr+count. Fix
provided in [2] can help do that.
b) If only replicating flushed data, then ensure all the
WALReadFromBuffers() callers read how much ever is valid out of
startptr+count. Current and expected WALReadFromBuffers() callers will
anyway determine how much of it is flushed and can validly be read.
c) If planning to replicate unflushed data, then ensure all the
WALRead() callers wait until startptr+count is past the current insert
position with WaitXLogInsertionsToFinish().
d) If planning to replicate unflushed data, then ensure all the
WALReadFromBuffers() callers wait until startptr+count is past the
current insert position with WaitXLogInsertionsToFinish().

Adding an assertion or error in WALReadFromBuffers() for ensuring the
callers do follow the above set of rules is easy. We can just do
Assert(startptr+count <= LogwrtResult.Flush).

However, adding a similar assertion or error in WALRead() gets
trickier as it's being called from many places - walsenders, backends,
external tools etc. even when the server is in recovery. Therefore,
determining the actual valid LSN is a bit of a challenge.

What I think is the best way:
- Try and get the fix provided for (a) at [2].
- Implement both (c) and (d).
- Have the assertion in WALReadFromBuffers() ensuring the callers wait
until startptr+count is past the current insert position with
WaitXLogInsertionsToFinish().
- Have a comment around WALRead() to ensure the callers are requesting
the WAL that's written to the disk because it's hard to determine
what's written to disk as this gets called in many scenarios - when
server is in recovery, for walsummarizer etc.
- In the new test module, demonstrate how one can implement reading
unflushed data with WALReadFromBuffers() and/or WALRead() +
WaitXLogInsertionsToFinish().

Thoughts?

[1]
/*
 * Even though we just determined how much of the page can be validly read
 * as 'count', read the whole page anyway. It's guaranteed to be
 * zero-padded up to the page boundary if it's incomplete.
 */
if (!WALRead(state, cur_page, targetPagePtr, XLOG_BLCKSZ, tli,
             &errinfo))

[2] https://www.postgresql.org/message-id/CALj2ACWBRFac2TingD3PE3w2EBHXUHY3%3DAEEZPJmqhpEOBGExg%40mail.gmail.com

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



В списке pgsql-hackers по дате отправления:

Предыдущее
От: "Joel Jacobson"
Дата:
Сообщение: Re: Document efficient self-joins / UPDATE LIMIT techniques.
Следующее
От: Dave Cramer
Дата:
Сообщение: Re: [PATCH] Add native windows on arm64 support