Re: Slow catchup of 2PC (twophase) transactions on replica in LR

From Amit Kapila
Subject Re: Slow catchup of 2PC (twophase) transactions on replica in LR
Date
Msg-id CAA4eK1KOs3s6syZqUgrd2WvjTz64SGf0ToZcRoPMCKKH+M0YFQ@mail.gmail.com
In response to Slow catchup of 2PC (twophase) transactions on replica in LR  (Давыдов Виталий <v.davydov@postgrespro.ru>)
Responses Re: Slow catchup of 2PC (twophase) transactions on replica in LR  (Давыдов Виталий <v.davydov@postgrespro.ru>)
List pgsql-hackers
On Thu, Feb 22, 2024 at 6:59 PM Давыдов Виталий
<v.davydov@postgrespro.ru> wrote:
>
> I'd like to present and discuss a problem where 2PC transactions are
> applied quite slowly on a replica during logical replication. There is a
> master and a replica with logical replication established from the
> master to the replica with twophase = true. At some load level on the
> master, the replica starts to lag behind the master, and the lag keeps
> increasing. We have to significantly decrease the load on the master to
> allow the replica to complete the catchup. Such a problem may create
> significant difficulties in production. The problem appears at least on
> the REL_16_STABLE branch.
>
> To reproduce the problem:
>
> 1. Set up logical replication from master to replica with the
>    subscription parameter twophase = true.
> 2. Create some intermediate load on the master (use pgbench with custom
>    SQL with prepare+commit).
> 3. Optionally switch off the replica for some time (keep the load on the
>    master).
> 4. Switch on the replica and wait until it catches up with the master.
>
> The replica will never catch up with the master, even with a low load
> on the master. If the load is removed, the replica catches up with the
> master after a much longer time than expected. I tried the same with
> regular transactions, but the problem doesn't appear even with a decent
> load.
>
> I think the main cause of the poor 2PC catchup performance is the lack
> of asynchronous commit support for 2PC. For regular transactions,
> asynchronous commit is used on the replica by default (subscription
> synchronous_commit = off). It allows the replication worker process on
> the replica to avoid fsync (XLogFlush) and to utilize 100% CPU (the
> background walwriter or checkpointer will do the fsync). I agree that
> 2PC is mostly used in multimaster configurations with two or more nodes
> that operate synchronously, but when a node is in catchup (the node is
> not online in the multimaster cluster), asynchronous commit has to be
> used to speed up the catchup.
>
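
To make the argument quoted above concrete, here is a minimal back-of-the-envelope sketch in Python. It is only an illustration of the reporter's reasoning, not a measurement and not PostgreSQL code: the function names, the CPU and fsync costs, and the assumption of two WAL flushes per 2PC transaction (one for prepare, one for commit prepared) are all hypothetical inputs.

```python
def apply_rate(cpu_s, fsync_s, flushes_per_txn):
    """Transactions per second one sequential apply worker can sustain
    if each transaction costs cpu_s of CPU plus flushes_per_txn fsync waits."""
    return 1.0 / (cpu_s + flushes_per_txn * fsync_s)


def lag_after(seconds, load_tps, rate_tps, initial_lag=0.0):
    """Backlog in transactions after `seconds`; a backlog cannot go below zero."""
    return max(0.0, initial_lag + (load_tps - rate_tps) * seconds)


# Hypothetical costs, chosen only for illustration.
CPU_S = 0.0002    # 0.2 ms of CPU work per applied transaction
FSYNC_S = 0.002   # 2 ms per WAL flush

# Assumption: a synchronously applied 2PC transaction waits for two flushes.
sync_rate = apply_rate(CPU_S, FSYNC_S, flushes_per_txn=2)
# Assumption: with asynchronous commit the worker never waits for a flush.
async_rate = apply_rate(CPU_S, FSYNC_S, flushes_per_txn=0)

LOAD_TPS = 500.0  # hypothetical commit rate on the master
print(f"sync apply rate:  {sync_rate:.0f} tps")
print(f"async apply rate: {async_rate:.0f} tps")
print(f"lag after 60 s (sync):  {lag_after(60, LOAD_TPS, sync_rate):.0f} txns")
print(f"lag after 60 s (async): {lag_after(60, LOAD_TPS, async_rate):.0f} txns")
```

Under these assumed numbers the worker that waits for flushes sustains fewer transactions per second than the master produces, so the backlog grows without bound, while the worker that does not wait stays ahead of the load. Whether the apply worker really waits for flushes on 2PC transactions is exactly the point under discussion in this thread.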

I don't see that we do anything specific for 2PC transactions to make
them behave differently than regular transactions with respect to the
synchronous_commit setting. What makes you think so? Can you pinpoint
the code you are referring to?

> There is another thing that affects the imbalance between master and
> replica performance. When the master executes requests from multiple
> clients, an fsync optimization takes place in XLogFlush. It decreases
> the number of fsyncs when a number of parallel backends write to the
> WAL simultaneously. The replica applies received transactions
> sequentially in one thread, so this optimization is not applied.
>

Right, I think for this we need to implement parallel apply.
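
The difference between concurrent backends and a single sequential apply worker can be sketched with a toy model in Python. This is not how XLogFlush is implemented; it only assumes that a flush covers every commit that has already arrived when the flush starts, and that the device handles one flush at a time. The function name and all timings are hypothetical.

```python
def count_fsyncs(arrivals_us, fsync_us):
    """Count flushes needed for commits arriving at the given times (microseconds).

    Toy model: one flush runs at a time and takes fsync_us; a flush covers
    every commit that has arrived by the moment the flush starts.
    """
    pending = sorted(arrivals_us)
    fsyncs = 0
    device_free_at = 0
    i = 0
    while i < len(pending):
        # The next flush starts once the device is free and a commit is waiting.
        start = max(device_free_at, pending[i])
        # All commits that arrived by `start` share this flush.
        while i < len(pending) and pending[i] <= start:
            i += 1
        fsyncs += 1
        device_free_at = start + fsync_us
    return fsyncs


FSYNC_US = 2000  # hypothetical 2 ms flush

# Four backends committing close together: later commits pile up behind the
# first flush and are covered by a single second flush.
concurrent = [0, 500, 1000, 1500]

# One sequential worker: each commit is issued only after the previous flush
# has finished (plus a hypothetical 100 us of CPU work), so nothing piles up.
sequential = [0, 2100, 4200, 6300]

print(count_fsyncs(concurrent, FSYNC_US), "flushes for concurrent commits")
print(count_fsyncs(sequential, FSYNC_US), "flushes for sequential commits")
```

In this model the sequential worker pays one flush per commit, which is why applying transactions in parallel would let the replica benefit from the same batching.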

> I see some possible solutions:
>
> 1. Implement asynchronous commit for 2PC transactions.
> 2. Do some hacking with enableFsync when it is possible.
>

Can you be a bit more specific about what exactly you have in mind to
achieve the above solutions?

--
With Regards,
Amit Kapila.
