Re: Patch: add recovery_timeout option to control timeout of restore_command nonzero status code
От | Michael Paquier |
---|---|
Тема | Re: Patch: add recovery_timeout option to control timeout of restore_command nonzero status code |
Дата | |
Msg-id | CAB7nPqQ_Fm8a-OSEeE6n1nT-LUp7ty85Yc+vWW1v06rwQxC1Hg@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: Patch: add recovery_timeout option to control timeout of restore_command nonzero status code (Fujii Masao <masao.fujii@gmail.com>) |
Ответы |
Re: Patch: add recovery_timeout option to control timeout
of restore_command nonzero status code
Re: Patch: add recovery_timeout option to control timeout of restore_command nonzero status code |
Список | pgsql-hackers |
On Mon, Feb 9, 2015 at 8:29 PM, Fujii Masao <masao.fujii@gmail.com> wrote: > On Sun, Feb 8, 2015 at 2:54 PM, Michael Paquier > <michael.paquier@gmail.com> wrote: >> On Fri, Feb 6, 2015 at 4:58 PM, Fujii Masao wrote: >>> - * Wait for more WAL to arrive. Time out after 5 seconds, >>> - * like when polling the archive, to react to a trigger >>> - * file promptly. >>> + * Wait for more WAL to arrive. Time out after >>> the amount of >>> + * time specified by wal_retrieve_retry_interval, like >>> + * when polling the archive, to react to a >>> trigger file promptly. >>> */ >>> WaitLatch(&XLogCtl->recoveryWakeupLatch, >>> WL_LATCH_SET | WL_TIMEOUT, >>> - 5000L); >>> + wal_retrieve_retry_interval * 1000L); >>> >>> This change can prevent the startup process from reacting to >>> a trigger file. Imagine the case where the large interval is set >>> and the user want to promote the standby by using the trigger file >>> instead of pg_ctl promote. I think that the sleep time should be 5s >>> if the interval is set to more than 5s. Thought? >> >> I disagree here. It is interesting to accelerate the check of WAL >> availability from a source in some cases for replication, but the >> opposite is true as well as mentioned by Alexey at the beginning of >> the thread to reduce the number of requests when requesting WAL >> archives from an external storage type AWS. Hence a correct solution >> would be to check periodically for the trigger file with a maximum >> one-time wait of 5s to ensure backward-compatible behavior. We could >> reduce it to 1s or something like that as well. > > You seem to have misunderstood the code in question. Or I'm missing something. > The timeout of the WaitLatch is just the interval to check for the trigger file > while waiting for more WAL to arrive from streaming replication. Not related to > the retry time to restore WAL from the archive. [Re-reading the code...] Aah.. Yes you are right. Sorry for the noise. Yes let's wait for a maximum of 5s then. I also noticed in previous patch that the wait was maximized to 5s. To begin with, a loop should have been used if it was a sleep, but as now patch uses a latch this limit does not make much sense... Patch updated is attached. Regards, -- Michael
Вложения
В списке pgsql-hackers по дате отправления: