Re: [PoC] pg_upgrade: allow to upgrade publisher node
От | vignesh C |
---|---|
Тема | Re: [PoC] pg_upgrade: allow to upgrade publisher node |
Дата | |
Msg-id | CALDaNm1TLfZZ5aUnhP477KTdK7xgzwZJ2m0b=73nJNzQNM8SdA@mail.gmail.com обсуждение исходный текст |
Ответ на | RE: [PoC] pg_upgrade: allow to upgrade publisher node ("Hayato Kuroda (Fujitsu)" <kuroda.hayato@fujitsu.com>) |
Ответы |
Re: [PoC] pg_upgrade: allow to upgrade publisher node
RE: [PoC] pg_upgrade: allow to upgrade publisher node |
Список | pgsql-hackers |
On Fri, 21 Jul 2023 at 13:00, Hayato Kuroda (Fujitsu) <kuroda.hayato@fujitsu.com> wrote: > > Dear hackers, > > > Based on the above, we are considering that we delay the timing of shutdown for > > logical walsenders. The preliminary workflow is: > > > > 1. When logical walsenders receives siginal from checkpointer, it consumes all > > of WAL records, change its state into WALSNDSTATE_STOPPING, and stop > > doing > > anything. > > 2. Then the checkpointer does the shutdown checkpoint > > 3. After that postmaster sends signal to walsenders, same as current > > implementation. > > 4. Finally logical walsenders process the shutdown checkpoint record and update > > the > > confirmed_lsn after the acknowledgement from subscriber. > > Note that logical walsenders don't have to send a shutdown checkpoint record > > to subscriber but following keep_alive will help us to increment the > > confirmed_lsn. > > 5. All tasks are done, they exit. > > > > This mechanism ensures that the confirmed_lsn of active slots is same as the > > current > > WAL location of old publisher, so that 0003 patch would become more simpler. > > We would not have to calculate the acceptable difference anymore. > > > > One thing we must consider is that any WALs must not be generated while > > decoding > > the shutdown checkpoint record. It causes the PANIC. IIUC the record leads > > SnapBuildSerializationPoint(), which just serializes snapbuild or restores from > > it, so the change may be acceptable. Thought? > > I've implemented the ideas from my previous proposal, PSA another patch set. > Patch 0001 introduces the state WALSNDSTATE_STOPPING to logical walsenders. The > workflow remains largely the same as described in my previous post, with the > following additions: > > * A flag has been added to track whether all the WALs have been flushed. The > logical walsender can only exit after the flag is set. This ensures that all > WALs are flushed before the termination of the walsender. > * Cumulative statistics are now forcibly written before changing the state. > While the previous involved reporting stats upon process exit, the current approach > must report earlier due to the checkpointer's termination timing. See comments > in CheckpointerMain() and atop pgstat_before_server_shutdown(). > * At the end of processes, slots are now saved to disk. > > > Patch 0002 adds --include-logical-replication-slots option to pg_upgrade, > not changed from previous set. > > Patch 0003 adds a check function, which becomes simpler. > The previous version calculated the "acceptable" difference between confirmed_lsn > and the current WAL position. This was necessary because shutdown records could > not be sent to subscribers, creating a disparity in these values. However, this > approach had drawbacks, such as needing adjustments if record sizes changed. > > Now, the record can be sent to subscribers, so the hacking is not needed anymore, > at least in the context of logical replication. The consistency is now maintained > by the logical walsenders, so slots created by the backend could not be. > We must consider what should be... > > How do you think? Here is a patch which checks that there are no WAL records other than CHECKPOINT_SHUTDOWN WAL record to be consumed based on the discussion from [1]. Patch 0001 and 0002 is same as the patch posted by Kuroda-san, Patch 0003 exposes pg_get_wal_records_content to get the WAL records along with the WAL record type between start and end lsn. pg_walinspect contrib module already exposes a function for this requirement, I have moved this functionality to be exposed from the backend. Patch 0004 has slight change in check function to check that there are no other records other than CHECKPOINT_SHUTDOWN to be consumed. The attached patch has the changes for the same. Thoughts? [1] - https://www.postgresql.org/message-id/CAA4eK1Kem-J5NM7GJCgyKP84pEN6RsG6JWo%3D6pSn1E%2BiexL1Fw%40mail.gmail.com Regards, Vignesh
Вложения
В списке pgsql-hackers по дате отправления: