Re: Synchronizing slots from primary to standby

From Ajin Cherian
Subject Re: Synchronizing slots from primary to standby
Msg-id CAFPTHDaC6mQECXQUPUoMXkxPo+23Gwx7LeHvtdmuXKSWCMTQgw@mail.gmail.com
In reply to Re: Synchronizing slots from primary to standby  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: Synchronizing slots from primary to standby  (shveta malik <shveta.malik@gmail.com>)
List pgsql-hackers


On Fri, Mar 8, 2024 at 2:33 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Thu, Mar 7, 2024 at 12:00 PM Zhijie Hou (Fujitsu)
> <houzj.fnst@fujitsu.com> wrote:
> >
> >
> > Attach the V108 patch set which addressed above and Peter's comments.
> > I also removed the check for "*" in guc check hook.
> >
>
> Pushed with minor modifications. I'll keep an eye on BF.
>
> BTW, one thing that we should try to evaluate a bit more is the
> traversal of slots in StandbySlotsHaveCaughtup() where we verify if
> all the slots mentioned in standby_slot_names have received the
> required WAL. Even if the standby_slot_names list is short the total
> number of slots can be much larger which can lead to an increase in
> CPU usage during traversal. There is an optimization that allows to
> cache ss_oldest_flush_lsn and ensures that we don't need to traverse
> the slots each time so it may not hit frequently but still there is a
> chance. I see it is possible to further optimize this area by caching
> the position of each slot mentioned in standby_slot_names in
> replication_slots array but not sure whether it is worth.
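
To make the cost being discussed concrete, here is a toy Python model of the traversal and of the ss_oldest_flush_lsn fast path. It is only a sketch under stated assumptions, not the PostgreSQL implementation: the Slot and SlotArray classes, the flush_lsn field, and the slots_visited counter are hypothetical names introduced for illustration, and the exact value the real code stores in its cache may differ.

```python
from dataclasses import dataclass


@dataclass
class Slot:
    name: str
    flush_lsn: int  # hypothetical stand-in for how far the slot has flushed WAL


class SlotArray:
    """Toy model of a slot array plus a short standby_slot_names list."""

    def __init__(self, slots, standby_slot_names):
        self.slots = list(slots)
        self.standby_slot_names = list(standby_slot_names)
        self.ss_oldest_flush_lsn = 0  # cached LSN known to be flushed by all listed slots
        self.slots_visited = 0        # instrumentation only: array entries examined

    def _find(self, name):
        # Linear search over the whole array; this is the cost under discussion.
        for slot in self.slots:
            self.slots_visited += 1
            if slot.name == name:
                return slot
        return None

    def standby_slots_have_caught_up(self, wait_for_lsn):
        # Fast path: the cached LSN already covers the request, so no traversal.
        if self.ss_oldest_flush_lsn >= wait_for_lsn:
            return True
        oldest = None
        for name in self.standby_slot_names:
            slot = self._find(name)
            if slot is None or slot.flush_lsn < wait_for_lsn:
                return False
            oldest = slot.flush_lsn if oldest is None else min(oldest, slot.flush_lsn)
        if oldest is not None:
            self.ss_oldest_flush_lsn = oldest
        return True


# Worst case: 500 unrelated logical slots first, the 3 standby slots last.
slots = [Slot(f"logical_{i}", 0) for i in range(500)]
slots += [Slot(f"standby_{i}", 200) for i in range(1, 4)]
array = SlotArray(slots, ["standby_1", "standby_2", "standby_3"])

first = array.standby_slots_have_caught_up(100)   # full traversal
visited_after_first = array.slots_visited
second = array.standby_slots_have_caught_up(150)  # served from the cache
visited_after_second = array.slots_visited
```

In this model the second call does not touch the array at all, which is the reason the traversal "may not hit frequently". Caching each listed slot's array position would turn the remaining linear searches into direct lookups.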



I tested this by configuring a large number of logical slots, making sure the standby slots were at the end of the array, and checking whether these searches caused any performance hit in logical replication.

Setup:
1. 1 primary server configured with 3 servers in standby_slot_names, 1 extra logical slot (not configured for failover) + 1 logical subscriber configured for failover + 3 physical standbys (all configured to sync logical slots)

2. 1 primary server configured with 3 servers in standby_slot_names, 100 extra logical slots (not configured for failover) + 1 logical subscriber configured for failover + 3 physical standbys (all configured to sync logical slots)

3. 1 primary server configured with 3 servers in standby_slot_names, 500 extra logical slots (not configured for failover) + 1 logical subscriber configured for failover + 3 physical standbys (all configured to sync logical slots)

In the three setups, the 3 slots in standby_slot_names are compared against a list of 2, 101, and 501 slots respectively.
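
The post does not say how the extra slots were created. One possible way, shown purely as a sketch, is to generate one pg_create_logical_replication_slot() call per extra slot and feed the statements to psql. The slot name prefix and the use of the test_decoding output plugin are assumptions, and max_replication_slots would presumably need to be raised to fit the totals.

```python
def extra_slot_sql(n_extra, prefix="extra_slot_", plugin="test_decoding"):
    """Return SQL statements creating n_extra logical slots.

    prefix and plugin are hypothetical choices, not taken from the test setup.
    """
    return [
        f"SELECT pg_create_logical_replication_slot('{prefix}{i}', '{plugin}');"
        for i in range(1, n_extra + 1)
    ]


def logical_slot_count(n_extra, n_failover=1):
    """Extra (non-failover) slots plus the subscriber's failover slot."""
    return n_extra + n_failover


statements = extra_slot_sql(100)
counts = [logical_slot_count(n) for n in (1, 100, 500)]
```

The counts list reproduces the 2, 101, and 501 figures quoted for the three setups.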

I ran a pgbench for 15 minutes for all 3 setups:

Case 1: Average TPS - 8.143399
Case 2: Average TPS - 8.187462
Case 3: Average TPS - 8.190611

I see no degradation in performance; the differences are well within the run-to-run variation seen.
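
As a quick check on the three averages above, the spread between the best and worst case can be expressed relative to their mean. This is only a back-of-the-envelope sketch; the helper name is made up for illustration.

```python
from statistics import mean

# Average TPS per case, copied from the results above.
tps = {"Case 1": 8.143399, "Case 2": 8.187462, "Case 3": 8.190611}


def relative_spread_pct(values):
    """(max - min) as a percentage of the mean."""
    values = list(values)
    return (max(values) - min(values)) / mean(values) * 100.0


spread_pct = relative_spread_pct(tps.values())
best_case = max(tps, key=tps.get)
print(f"spread: {spread_pct:.2f}% of the mean, highest TPS in {best_case}")
```

The spread comes out at about 0.6% of the mean, and the highest TPS is in the 500-slot case, so the numbers show no trend toward slower runs with more slots.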


Nisha also ran some performance tests to record the lag introduced by traversing a large number of slots in StandbySlotsHaveCaughtup(). The tests logged the time at the start and end of each XLogSendLogical() call (which eventually calls WalSndWaitForWal() --> StandbySlotsHaveCaughtup()) and calculated the total time spent in this function during the load run for different total slot counts.
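
The measurement idea (timestamp at entry and exit of a call, summed over the whole run) can be illustrated with a small accumulator. The actual instrumentation was done inside the walsender, so this Python version is only an analogy; CallTimer and its fields are hypothetical names.

```python
import time
from contextlib import contextmanager


class CallTimer:
    """Accumulates wall-clock time spent inside a measured region."""

    def __init__(self):
        self.total_secs = 0.0
        self.calls = 0

    @contextmanager
    def measure(self):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Add this call's elapsed time to the running total.
            self.total_secs += time.perf_counter() - start
            self.calls += 1


timer = CallTimer()
for _ in range(3):
    with timer.measure():
        sum(range(10_000))  # stand-in for the function being measured
```

Comparing total_secs across runs with different slot counts then shows whether the measured function got slower overall.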
 
Setup:
--one primary with 3 standbys and one subscriber with one active subscription
--hot_standby_feedback=off and sync_replication_slots=false
--made sure the standby slots remain at the end of the ReplicationSlotCtl->replication_slots array to measure the worst-case scenario for the standby slot search in StandbySlotsHaveCaughtup()
 
pgbench was run for 15 min. Here is the data:
 
Case1 : with 1 logical slot, standby_slot_names having 3 slots
Run1: 626.141642 secs
Run2: 631.930254 secs
 
Case2 : with 100 logical slots,  standby_slot_names having 3 slots
Run1: 629.38332 secs
Run2: 630.548432 secs
 
Case3 : with 500 logical slots,  standby_slot_names having 3 slots
Run1: 629.910829 secs
Run2: 627.924183 secs
 
No degradation in performance was seen.
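
One way to read the run times above is to compare the difference between case averages with the difference between two runs of the same case. The following is a sketch of that comparison using only the numbers reported here; the variable names are illustrative.

```python
from statistics import mean

# Total time in secs per run, keyed by number of logical slots.
runs = {
    1: [626.141642, 631.930254],
    100: [629.38332, 630.548432],
    500: [629.910829, 627.924183],
}

case_means = {n: mean(r) for n, r in runs.items()}

# Largest gap between two runs of the same case.
within_case_spread = max(max(r) - min(r) for r in runs.values())

# Gap between the slowest and fastest case averages.
between_case_spread = max(case_means.values()) - min(case_means.values())

print(f"within-case: {within_case_spread:.2f} s, between-case: {between_case_spread:.2f} s")
```

The case averages differ by about 1 second, while two runs of the 1-slot case alone differ by almost 6 seconds, so the slot count effect, if any, is below the run-to-run noise.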

Thanks Nisha for helping with the testing.

regards,
Ajin Cherian
Fujitsu Australia
