Re: [PoC] pg_upgrade: allow to upgrade publisher node
From | Amit Kapila |
---|---|
Subject | Re: [PoC] pg_upgrade: allow to upgrade publisher node |
Date | |
Msg-id | CAA4eK1L6fmTAGS3pY1YHGHhreg424wH6QwYbxqyV_7OF2AXGjw@mail.gmail.com |
In reply to | Re: [PoC] pg_upgrade: allow to upgrade publisher node (Amit Kapila <amit.kapila16@gmail.com>) |
Responses | RE: [PoC] pg_upgrade: allow to upgrade publisher node |
List | pgsql-hackers |
On Mon, Jul 17, 2023 at 6:19 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Fri, Jun 30, 2023 at 7:29 PM Hayato Kuroda (Fujitsu)
> <kuroda.hayato@fujitsu.com> wrote:
> >
> > I have analyzed this further and concluded that there is no difference between
> > a manual and a shutdown checkpoint.
> >
> > The difference was whether the CHECKPOINT record had been decoded or not.
> > The overall workflow of this test was:
> >
> > 1. do INSERT
> > (2. do CHECKPOINT)
> > (3. decode CHECKPOINT record)
> > 4. receive feedback message from standby
> > 5. do shutdown CHECKPOINT
> >
> > At step 3, the walsender decoded that WAL and set candidate_xmin_lsn. The stack trace was:
> > standby_decode()->SnapBuildProcessRunningXacts()->LogicalIncreaseXminForSlot().
> >
> > At step 4, the confirmed_flush of the slot was updated, but ReplicationSlotSave()
> > was executed only when slot->candidate_xmin_lsn held a valid LSN. If steps 2 and
> > 3 are skipped, the dirty flag is not set and the change remains only in memory.
> >
> > Finally, the CHECKPOINT was executed at step 5. If steps 2 and 3 are skipped and
> > the patch from Julien is not applied, the updated value is discarded. This
> > is what I observed. The patch forces the logical slot to be saved at the shutdown
> > checkpoint, so the confirmed_lsn is saved to disk at step 5.
> >
>
> I see your point, but there are comments in walsender.c which indicate
> that we also wait for step 5 to get replicated. See [1] and the comments
> atop walsender.c. If this is true, then we don't need a special check
> as you have in patch 0003, or at least it doesn't seem to be required
> in all cases.
>

I have studied this a bit more, and it seems that is true for physical walsenders: we set the walsender's state to WALSNDSTATE_STOPPING in XLogSendPhysical, then the checkpointer finishes writing the checkpoint record, and then the postmaster sends SIGUSR2 for the walsender to exit.
IIUC, this whole logic of different stop states was introduced in commit c6c3334364, based on the discussion in thread [1]. As per my understanding, logical walsenders don't seem to wait for the shutdown checkpoint record and finish even before we LOG that record. It seems that the behavior of logical walsenders differs from that of physical walsenders, where we wait for them to send even the final shutdown checkpoint record before they finish. If so, then we won't be able to switch over to logical subscribers even in the case of a clean shutdown. Am I missing something?

[1] - https://www.postgresql.org/message-id/CAHGQGwEsttg9P9LOOavoc9d6VB1zVmYgfBk%3DLjsk-UL9cEf-eA%40mail.gmail.com

-- 
With Regards,
Amit Kapila.