Re: Slow catchup of 2PC (twophase) transactions on replica in LR
От | Amit Kapila |
---|---|
Тема | Re: Slow catchup of 2PC (twophase) transactions on replica in LR |
Дата | |
Msg-id | CAA4eK1+FRrL_fLWLsWQGHZRESg39ixzDX_S9hU8D7aFtU+a8uQ@mail.gmail.com обсуждение исходный текст |
Ответ на | RE: Slow catchup of 2PC (twophase) transactions on replica in LR ("Hayato Kuroda (Fujitsu)" <kuroda.hayato@fujitsu.com>) |
Ответы |
Re: Slow catchup of 2PC (twophase) transactions on replica in LR
RE: Slow catchup of 2PC (twophase) transactions on replica in LR |
Список | pgsql-hackers |
On Mon, Jul 8, 2024 at 12:34 PM Hayato Kuroda (Fujitsu) <kuroda.hayato@fujitsu.com> wrote: > > > Another possible problem is related to my use case. I haven't reproduced this > > case, just some thoughts. I guess, when two_phase is ON, the PREPARE statement > > may be truncated from the WAL at checkpoint, but COMMIT PREPARED is still kept > > in the WAL. On catchup, I would ask the master to send transactions from some > > restart LSN. I would like to get all such transactions competely, with theirs > > bodies, not only COMMIT PREPARED messages. > > I don't think it is a real issue. WALs for prepared transactions will retain > until they are committed/aborted. > When the two_phase is on and transactions are PREPAREd, they will not be > cleaned up from the memory (See ReorderBufferProcessTXN()). Then, RUNNING_XACT > record leads to update the restart_lsn of the slot but it cannot be move forward > because ReorderBufferGetOldestTXN() returns the prepared transaction (See > SnapBuildProcessRunningXacts()). restart_decoding_lsn of each transaction, which > is a candidate of restart_lsn of the slot. is always behind the startpoint of > its txn. > I see that in 0003/0004, the patch first aborts pending prepared transactions, update's catalog, and then change slot's property via walrcv_alter_slot. What if there is any ERROR (say the remote node is not reachable or there is an error while updating the catalog) after we abort the pending prepared transaction? Won't we end up with lost prepared transactions in such a case? Few other comments: ================= The code to handle SUBOPT_TWOPHASE_COMMIT should be after failover option handling for the sake of code symmetry. Also, the checks should be in same order like first for slot_name, then enabled, then for PreventInTransactionBlock(), after those, we can have other checks for two_phase. If possible, we can move common checks in both failover and two_phase options into a common function. What should be the behavior if one tries to set slot_name to NONE and also tries to toggle two_pahse option? I feel both options together don't makes sense because there is no use in changing two_phase for some slot which we are disassociating the subscription from. The same could be said for the failover option as well, so if we agree with some different behavior here, we can follow the same for failover option as well. -- With Regards, Amit Kapila.
В списке pgsql-hackers по дате отправления: