On 24/11/2021 21:44, Robert Haas wrote:
> On Wed, Nov 24, 2021 at 12:16 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>> And here's another rebase, now that Robert got rid of ReadRecPtr and
>> EndRecPtr.
>
> In general, I think 0001 is a good idea, but the comment that says
> "Set the XLP_FIRST_IS_OVERWRITE_CONTRECORD flag on the page header"
> seems to me to be telling the reader about what's already obvious
> instead of explaining to them the thing they might have missed.
> GetXLogBuffer() says that it's only safe to use if you hold a WAL
> insertion lock and don't go backwards, and here you don't hold a WAL
> insertion lock and I guess you're not going backwards only because
> you're staying in exactly the same place? It seems to me that the only
> reason this is safe is because, at the time this is called, only the
> startup process is able to write WAL, and therefore the race condition
> that would otherwise exist does not.
Yeah, its correctness depends on the fact that no other backend is
allows to write WAL.
> Even then, I wonder what keeps
> the buffer from being flushed after we return from XLogInsert() and
> before we set the bit, and if the answer is that nothing prevents
> that, whether that's OK. It might be good to talk about these issues
> too.
Hmm. We don't advance LogwrtRqst.Write, so I think a concurrent
XLogFlush() would not flush the page. But I agree, that's more
accidental than by design and we should be more explicit about it.
I changed the code so that it sets the XLP_FIRST_IS_OVERWRITE_CONTRECORD
flag in the page header first, and inserts the record only after that.
That way, you don't "go backwards". I also added more sanity checks to
verify that the record really is inserted where we expect.
> Also, you've named the parameter to this new function so that it's
> exactly the same as the global variable. I do approve of trying to
> pass the value as a parameter instead of relying on a global variable,
> and I wonder if you could find a way to remove the global variable
> entirely. But if not, I think the function parameter and the global
> variable should have different names, because otherwise it's easy for
> anyone reading the code to get confused about which one is being
> referenced in any particular spot, and it's also hard to grep.
Renamed the parameter to 'pagePtr', that describes pretty well what it's
used for in the function.
Attached is a new patch set. It includes these changes to
CreateOverwriteContrecordRecord(), and also a bunch of other small changes:
- I moved the code to redo some XLOG record types from xlog_redo() to a
new function in xlogrecovery.c. This got rid of the
HandleBackupEndRecord() callback function I had to add before. This
change is in a separate commit, for easier review. It might make sense
to introduce a new rmgr for those record types, but didn't do that for now.
- I reordered many of the functions in xlogrecord.c, to group together
functions that are used in the initialization, and functions that are
called for each WAL record.
- Improved comments here and there.
- I renamed checkXLogConsistency() to verifyBackupPageConsistency(). I
think it describes the function better. There are a bunch of other
functions with check* prefix like CheckRecoveryConsistency,
CheckTimeLineSwitch, CheckForStandbyTrigger that check for various
conditions, so using "check" to mean "verify" here was a bit confusing.
I think this is ready for commit now. I'm going to wait a day or two to
give everyone a chance to review these latest changes, and then push.
- Heikki