(CC list trimmed, gmail wouldn't let me send otherwise)
On 22/02/2023 16:29, Maxim Orlov wrote:
> On Tue, 21 Feb 2023 at 16:59, Aleksander Alekseev
> <aleksander@timescale.com> wrote:
> One thing that still bothers me is that during the upgrade we only
> migrate the CLOG segments (i.e. pg_xact / pg_clog) and completely
> ignore all the rest of SLRUs:
>
> * pg_commit_ts
> * pg_multixact/offsets
> * pg_multixact/members
> * pg_subtrans
> * pg_notify
> * pg_serial
>
> Hi! We do ignore these values, since in order to pg_upgrade the server
> it must be properly stopped and no transactions can outlast this moment.
That sounds right for pg_serial, pg_notify, and pg_subtrans. But not for
pg_commit_ts and the pg_multixacts.
This needs tests for pg_upgrading those SLRUs, after 0, 1 and N wraparounds.
I'm surprised that these patches extend the page numbering to 64 bits,
but never actually use the high bits. The XID "epoch" is not used, and
pg_xact still wraps around and the segment names are still reused. I
thought we could stop doing that. Certainly if we start supporting
64-bit XIDs properly, that will need to change and pg_upgrade
will need to rename the segments again.
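
To illustrate what I mean by "using the high bits", here is a rough
sketch, not code from the patch set. The helper names are made up, and
the constants assume the default 8 kB block size with 2 status bits per
transaction and 32 pages per SLRU segment. With a 32-bit XID the pg_xact
segment number tops out at 0xFFF and then starts over; if the epoch is
folded in, it just keeps growing:

```c
#include <stdint.h>

/* Assumed defaults: 8192-byte pages, 4 xact statuses per byte. */
#define XACTS_PER_PAGE     32768u
#define PAGES_PER_SEGMENT  32u

/*
 * Hypothetical helper: segment number derived from a 32-bit XID only.
 * The result is always in 0..0xFFF, so segment names get reused after
 * every XID wraparound.
 */
static inline uint32_t
segno_from_xid32(uint32_t xid)
{
    return xid / XACTS_PER_PAGE / PAGES_PER_SEGMENT;
}

/*
 * Hypothetical helper: segment number derived from epoch + XID.
 * The epoch supplies the high 32 bits, so the number never repeats
 * and no segment file is ever reused.
 */
static inline uint64_t
segno_from_fxid(uint32_t epoch, uint32_t xid)
{
    uint64_t fxid = ((uint64_t) epoch << 32) | xid;

    return fxid / XACTS_PER_PAGE / PAGES_PER_SEGMENT;
}
```

In epoch 0 the two functions agree, which is why the difference is easy
to overlook. The first XID of epoch 1 lands in segment 0 again with the
32-bit scheme, but in segment 0x1000 with the 64-bit one.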
The previous versions of these patches did that, but I think you changed
tack in response to Robert's suggestion at [1]:
> Lest we miss the forest for the trees, there is an aspect of this
> patch that I find to be an extremely good idea and think we should try
> to get committed even if the rest of the patch set ends up in the
> rubbish bin. Specifically, there are a couple of patches in here that
> have to do with making SLRUs indexed by 64-bit integers rather than by
> 32-bit integers. We've had repeated bugs in the area of handling SLRU
> wraparound in the past, some of which have caused data loss. Just by
> chance, I ran across a situation just yesterday where an SLRU wrapped
> around on disk for reasons that I don't really understand yet and
> chaos ensued. Switching to an indexing system for SLRUs that does not
> ever wrap around would probably enable us to get rid of a whole bunch
> of crufty code, and would also likely improve the general reliability
> of the system in situations where wraparound is threatened. It seems
> like a really, really good idea.
These new versions of this patch don't achieve the goal of avoiding
wraparound. I think the previous versions, which did that, were the right
approach.
[1]
https://www.postgresql.org/message-id/CA%2BTgmoZFmTGjgkmjgkcm2-vQq3_TzcoMKmVimvQLx9oJLbye0Q%40mail.gmail.com
- Heikki