Re: Patch: add recovery_timeout option to control timeout of restore_command nonzero status code

Поиск

Список

Период

Сортировка

От	Michael Paquier
Тема	Re: Patch: add recovery_timeout option to control timeout of restore_command nonzero status code
Дата	10 февраля 2015 г. 13:33:07
Msg-id	CAB7nPqQ_Fm8a-OSEeE6n1nT-LUp7ty85Yc+vWW1v06rwQxC1Hg@mail.gmail.com обсуждение исходный текст
Ответ на	Re: Patch: add recovery_timeout option to control timeout of restore_command nonzero status code (Fujii Masao <masao.fujii@gmail.com>)
Ответы	Re: Patch: add recovery_timeout option to control timeout of restore_command nonzero status code Re: Patch: add recovery_timeout option to control timeout of restore_command nonzero status code
Список	pgsql-hackers

Дерево обсуждения

On Mon, Feb 9, 2015 at 8:29 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
> On Sun, Feb 8, 2015 at 2:54 PM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
>> On Fri, Feb 6, 2015 at 4:58 PM, Fujii Masao wrote:
>>> -                     * Wait for more WAL to arrive. Time out after 5 seconds,
>>> -                     * like when polling the archive, to react to a trigger
>>> -                     * file promptly.
>>> +                     * Wait for more WAL to arrive. Time out after
>>> the amount of
>>> +                     * time specified by wal_retrieve_retry_interval, like
>>> +                     * when polling the archive, to react to a
>>> trigger file promptly.
>>>                       */
>>>                      WaitLatch(&XLogCtl->recoveryWakeupLatch,
>>>                                WL_LATCH_SET | WL_TIMEOUT,
>>> -                              5000L);
>>> +                              wal_retrieve_retry_interval * 1000L);
>>>
>>> This change can prevent the startup process from reacting to
>>> a trigger file. Imagine the case where the large interval is set
>>> and the user want to promote the standby by using the trigger file
>>> instead of pg_ctl promote. I think that the sleep time should be 5s
>>> if the interval is set to more than 5s. Thought?
>>
>> I disagree here. It is interesting to accelerate the check of WAL
>> availability from a source in some cases for replication, but the
>> opposite is true as well as mentioned by Alexey at the beginning of
>> the thread to reduce the number of requests when requesting WAL
>> archives from an external storage type AWS. Hence a correct solution
>> would be to check periodically for the trigger file with a maximum
>> one-time wait of 5s to ensure backward-compatible behavior. We could
>> reduce it to 1s or something like that as well.
>
> You seem to have misunderstood the code in question. Or I'm missing something.
> The timeout of the WaitLatch is just the interval to check for the trigger file
> while waiting for more WAL to arrive from streaming replication. Not related to
> the retry time to restore WAL from the archive.

[Re-reading the code...]
Aah.. Yes you are right. Sorry for the noise. Yes let's wait for a
maximum of 5s then. I also noticed in previous patch that the wait was
maximized to 5s. To begin with, a loop should have been used if it was
a sleep, but as now patch uses a latch this limit does not make much
sense... Patch updated is attached.
Regards,
--
Michael

Вложения

20150210_wal_source_check_v7.patch

В списке pgsql-hackers по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: Patch: add recovery_timeout option to control timeout of restore_command nonzero status code

Вложения