Re: High CPU consumption in cascade replication with large number of walsenders
| От | Alexey Makhmutov |
|---|---|
| Тема | Re: High CPU consumption in cascade replication with large number of walsenders |
| Дата | |
| Msg-id | 26d020da-430c-4b68-96f1-45a86de1b250@postgrespro.ru обсуждение исходный текст |
| Ответ на | Re: High CPU consumption in cascade replication with large number of walsenders (Alexander Korotkov <aekorotkov@gmail.com>) |
| Список | pgsql-hackers |
Hi, Alexander! Thank you again for the attention to this patch! > Could you, please, also comment change from check for AllowCascadeReplication() to StandbyWithCascadeReplication()? Do you think this is beneficial and saves us from sending the notifications when they are useless? The original intention of this check was to avoid enabling all the timer-based machinery for cases, when we definitely don't need to send notifications. So, we try to avoid potential impact of the patch on such cases even if new options are enabled. For example, this may happen if we restore server from backup and apply WAL archive on it (as PerformWalRecovery will be invoked in this case as well). Both 'hot_standby' and 'wal_senders' parameters are enabled by default, so even primary server may pass the the 'AllowCascadeReplication' condition in such case. So, we want to be sure that we the 'StandbyMode' is actually set and we are part of Startup process. However, the change may be also reasonable by itself, to avoid calling WalSndWakeup at all in such cases (thus avoiding acquiring/releasing CV mutex), although I do not think that it will provide measurable improvements. > Also, could you comment this condition. > if (cascadeReplicationMaxBatchSize <= 1 && appliedRecords == 0) > Does this mean that if batching was disabled in config then enforced by SIGHUP, we will still wait for the current batch to be completed? Would it be better to stop batching immediately? Sure, if either of 'cascade_replication_batch_size' or 'cascade_replication_batch_delay' is changed, then we need to flush current batch (send notification) and then decide whether we need to perform batching for next records. This is why we set 'replicationNotificationPending' flag in 'assign_cascade_replication_batch_values' (and disable timer if it is set), so it could be processed in final part 'ProcessStartupProcInterrupts'. So, technically we should not find ourselves in the situation when we have 'cascadeReplicationMaxBatchSize' set to 1 and appliedRecords set to non-zero value. This check for 'appliedRecords' value seems to be a remnant of an my intermediate patch version, where these values were already runtime modifiable, but not yet processed 'assign_cascade_replication_batch_values'. I think it could be safely removed to avoid confusion. Thank you for noticing this! > Also, this patch lacks documentation. I would especially like to see combinations of GUCs described (cascade_replication_batch_size is enabled, but cascade_replication_batch_delay disabled, and vise versa). I've added the documentation for these two parameters in the new version of the patch (config.sgml). The new patch version also contains the change for minimal parameter value for 'cascade_replication_batch_size' - now minimal value is 1 to indicate disabled batching. In previous version both '0' and '1' values were valid options to disable batching, but this was looking ambiguous in the documentation. So, I've decided to leave only '1' as valid value to make it simpler to describe. I've also rebased the patch on top of current master to fix failures during the build checks. Thanks, Alexey
Вложения
В списке pgsql-hackers по дате отправления: