Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication

From: Melih Mutlu
Subject: Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication
Msg-id: CAGPVpCQdZ_oj-QFcTOhTrUTs-NCKrrZ=ZNCNPR1qe27rXV-iYw@mail.gmail.com
In reply to: Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication  (Melih Mutlu <m.melihmutlu@gmail.com>)
Responses: Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication  (Amit Kapila <amit.kapila16@gmail.com>)
List: pgsql-hackers
Hi,

Attached are new versions of the patch with some changes/fixes.

Here are also some numbers comparing the performance of logical replication with this patch against the current master branch.

My benchmarking method is the same as what I did earlier in this thread (but run on a different environment, so the results in this email are not comparable with the ones from earlier emails). The results below compare this patch with the latest master branch. "max_sync_workers_per_subscription" is left at its default of 2. Each number is the average of 5 consecutive runs on each branch.
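For concreteness, the averaging can be sketched like this (Python; `run_sync` is a hypothetical callable, not part of the patch, that performs one full initial table sync, e.g. by creating the subscription via psql and polling pg_subscription_rel until every relation reaches the 'ready' state):

```python
import time
from statistics import mean

def time_once(run_sync):
    # Time a single initial-table-sync run, in milliseconds.
    start = time.perf_counter()
    run_sync()
    return (time.perf_counter() - start) * 1000.0

def average_sync_time(run_sync, runs=5):
    # Average of 5 consecutive runs, as used for the numbers in this email.
    return mean(time_once(run_sync) for _ in range(runs))
```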

Since this patch is expected to mainly improve logical replication of empty or nearly empty tables, I started by measuring performance with empty tables.

         |  10 tables   |  100 tables    |  1000 tables
---------+--------------+----------------+-----------------
 master  |  283.430 ms  |  22739.107 ms  |  105226.177 ms
 patch   |  189.139 ms  |  1554.802 ms   |  23091.434 ms
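As a quick sanity check on the table above, the speedup factors work out roughly as follows (plain arithmetic on the reported averages):

```python
# Speedup of the patch over master for empty tables (averages in ms, from above).
master = {10: 283.430, 100: 22739.107, 1000: 105226.177}
patch = {10: 189.139, 100: 1554.802, 1000: 23091.434}

for n in (10, 100, 1000):
    print(f"{n} tables: {master[n] / patch[n]:.1f}x faster")
# 10 tables: 1.5x, 100 tables: 14.6x, 1000 tables: 4.6x
```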

After the changes discussed in [1], concurrent replication origin drops by the apply worker and tablesync workers can block each other waiting on the locks taken by replorigin_drop_by_name(). I see that this hurts the speed of logical replication quite a bit.
[1] https://www.postgresql.org/message-id/flat/20220714115155.GA5439%40depesz.com
 
Firstly, as mentioned, replication origin drops made things worse for the master branch; the locking becomes a more serious issue as the number of tables increases. The patch reuses origins, so it does not need to drop them in each iteration. That is why the difference between master and the patch is more significant now than when I first sent the patch.

To show that the improvement comes not only from reusing origins but also from reusing replication slots and workers, I reverted the commits that cause the origin drop issue and measured again.

           |  10 tables   |  100 tables   |  1000 tables
-----------+--------------+---------------+-----------------
 reverted  |  270.012 ms  |  2483.907 ms  |  31660.758 ms
 patch     |  189.139 ms  |  1554.802 ms  |  23091.434 ms
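To separate the two effects for the 1000-table case, a back-of-the-envelope split of the total improvement into the part from avoiding origin drops and the part from reusing workers/slots looks like this:

```python
# Averages in ms for 1000 empty tables, taken from the two tables above.
master, reverted, patch = 105226.177, 31660.758, 23091.434

# master -> reverted: cost attributable to the origin-drop locking issue.
origin_drop_cost = master - reverted   # ~73565 ms
# reverted -> patch: remaining gain from reusing workers/slots/origins.
reuse_gain = reverted - patch          # ~8569 ms
print(origin_drop_cost, reuse_gain)
```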

With this patch, logical replication is still faster, even without the replication origin drop issue.

Here are also some numbers with 10 tables loaded with some data:

         |  10 MB          |  100 MB
---------+-----------------+------------------
 master  |  2868.524 ms    |  14281.711 ms
 patch   |  1750.226 ms    |  14592.800 ms

As expected, the gap between master and the patch narrows as the size of the tables increases.


I would appreciate any feedback or thoughts on the approach, patch, numbers, etc.

Thanks,
--
Melih Mutlu
Microsoft
