Slow catchup of 2PC (twophase) transactions on replica in LR

Поиск
Список
Период
Сортировка
От Давыдов Виталий
Тема Slow catchup of 2PC (twophase) transactions on replica in LR
Дата
Msg-id 8fab8-65d74c80-1-2f28e880@39088166
обсуждение исходный текст
Ответы Re: Slow catchup of 2PC (twophase) transactions on replica in LR  (Amit Kapila <amit.kapila16@gmail.com>)
Re: Slow catchup of 2PC (twophase) transactions on replica in LR  (Ajin Cherian <itsajin@gmail.com>)
Список pgsql-hackers

Dear All,

I'd like to present and talk about a problem when 2PC transactions are applied quite slowly on a replica during logical replication. There is a master and a replica with established logical replication from the master to the replica with twophase = true. With some load level on the master, the replica starts to lag behind the master, and the lag will be increasing. We have to significantly decrease the load on the master to allow replica to complete the catchup. Such problem may create significant difficulties in the production. The problem appears at least on REL_16_STABLE branch.

To reproduce the problem:

  • Setup logical replication from master to replica with subscription parameter twophase =  true.
  • Create some intermediate load on the master (use pgbench with custom sql with prepare+commit)
  • Optionally switch off the replica for some time (keep load on master).
  • Switch on the replica and wait until it reaches the master.

The replica will never reach the master with even some low load on the master. If to remove the load, the replica will reach the master for much greater time, than expected. I tried the same for regular transactions, but such problem doesn't appear even with a decent load.

I think, the main proplem of 2PC catchup bad performance - the lack of asynchronous commit support for 2PC. For regular transactions asynchronous commit is used on the replica by default (subscrition sycnronous_commit = off). It allows the replication worker process on the replica to avoid fsync (XLogFLush) and to utilize 100% CPU (the background wal writer or checkpointer will do fsync). I agree, 2PC are mostly used in multimaster configurations with two or more nodes which are performed synchronously, but when the node in catchup (node is not online in a multimaster cluster), asynchronous commit have to be used to speedup the catchup.

There is another thing that affects on the disbalance of the master and replica performance. When the master executes requestes from multiple clients, there is a fsync optimization takes place in XLogFlush. It allows to decrease the number of fsync in case when a number of parallel backends write to the WAL simultaneously. The replica applies received transactions in one thread sequentially, such optimization is not applied.

I see some possible solutions:

  • Implement asyncronous commit for 2PC transactions.
  • Do some hacking with enableFsync when it is possible.

I think, asynchronous commit support for 2PC transactions should significantly increase replica performance and help to solve this problem. I tried to implement it (like for usual transactions) but I've found another problem: 2PC state is stored in WAL on prepare, on commit we have to read 2PC state from WAL but the read is delayed until WAL is flushed by the background wal writer (read LSN should be less than flush LSN). Storing 2PC state in a shared memory (as it proposed earlier) may help.


I used the following query to monitor the catchup progress on the master:

SELECT sent_lsn, pg_current_wal_lsn() FROM pg_stat_replication;

I used the following script for pgbench to the master:

SELECT md5(random()::text) as mygid \gset
BEGIN;
DELETE FROM test WHERE v = pg_backend_pid();
INSERT INTO test(v) SELECT pg_backend_pid();
PREPARE TRANSACTION $$:mygid$$;
COMMIT PREPARED $$:mygid$$;

 

What do you think?

 

With best regards,

Vitaly Davydov

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Jelte Fennema-Nio
Дата:
Сообщение: Re: When extended query protocol ends?
Следующее
От: Matthias van de Meent
Дата:
Сообщение: Re: btree: downlink right separator/HIKEY optimization