Re: add retry mechanism for achieving recovery target before emitting FATA error "recovery ended before configured recovery target was reached"
From | Kyotaro Horiguchi |
---|---|
Subject | Re: add retry mechanism for achieving recovery target before emitting FATA error "recovery ended before configured recovery target was reached" |
Date | |
Msg-id | 20211025.095930.625109845638100737.horikyota.ntt@gmail.com |
In response to | add retry mechanism for achieving recovery target before emitting FATA error "recovery ended before configured recovery target was reached" (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>) |
List | pgsql-hackers |
At Wed, 20 Oct 2021 21:35:44 +0530, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote in
> Hi,
>
> The FATAL error "recovery ended before configured recovery target was
> reached" introduced by commit at [1] in PG 14 is causing the standby
> to go down after having spent a good amount of time in recovery. There
> can be cases where the arrival of required WAL (for reaching recovery
> target) from the archive location to the standby may take time and
> meanwhile the standby failing with the FATAL error isn't good.
> Instead, how about we make the standby wait for a certain amount of
> time (with a GUC) so that it can keep looking for the required WAL. If
> it gets the required WAL during the wait time, then it succeeds in
> reaching the recovery target (no FATAL error of course). If it
> doesn't, the timeout occurs and the standby fails with the FATAL
> error. The value of the new GUC can probably be set to the average
> time it takes for the WAL to reach archive location from the primary +
> from archive location to the standby, default 0 i.e. disabled.
>
> I'm attaching a WIP patch. I've tested it on my dev system and the
> recovery regression tests are passing with it. I will provide a better
> version later, probably with a test case.
>
> Thoughts?

That looks like starting a server in non-hot standby mode, fetching
only from the archive. The only difference is that it has no timeout.
Doesn't that configuration meet your requirements?

Or, if the timeout matters, I agree with Jeff. Retrying in
restore_command looks fine.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
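
For illustration only, retrying inside restore_command could be sketched as a small wrapper script like the one below. This is not part of the thread or of any patch; the script name, the archive path, and the timeout and interval defaults are all assumptions. It polls a plain archive directory for the requested file and gives up after a deadline, returning a nonzero status so that recovery sees the file as unavailable.

```python
import os
import shutil
import sys
import time


def restore_with_retry(archive_dir, wal_name, dest_path,
                       timeout_s=300.0, interval_s=5.0):
    """Copy wal_name from archive_dir to dest_path, polling until timeout_s.

    Returns 0 once the file has been copied, or 1 if it never appeared.
    The defaults are illustrative assumptions, not recommended values.
    """
    src = os.path.join(archive_dir, wal_name)
    deadline = time.monotonic() + timeout_s
    while True:
        if os.path.isfile(src):
            shutil.copyfile(src, dest_path)
            return 0
        if time.monotonic() >= deadline:
            return 1
        time.sleep(interval_s)


# Hypothetical use from postgresql.conf (script name and paths are made up):
#   restore_command = 'python3 /path/to/restore_retry.py /mnt/archive %f %p'
if __name__ == "__main__" and len(sys.argv) == 4:
    sys.exit(restore_with_retry(sys.argv[1], sys.argv[2], sys.argv[3]))
```

Note that the wait applies to every file the server asks for and the archive does not have, including ones that will never exist, so a long timeout also delays the normal end of archive recovery. The sketch also assumes files appear in the archive atomically; it does not guard against copying a partially written file.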