Re: pg_rewind: warn when checkpoint hasn't happened after promotion
From | James Coleman |
---|---|
Subject | Re: pg_rewind: warn when checkpoint hasn't happened after promotion |
Date | |
Msg-id | CAAaqYe9JxLvqqF=ZGfnqUsw+KBLUu_Rgf37+OtKdR49mhHLZGw@mail.gmail.com |
In reply to | Re: pg_rewind: warn when checkpoint hasn't happened after promotion (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>) |
List | pgsql-hackers |
On Sat, Jun 4, 2022 at 9:39 AM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Sat, Jun 4, 2022 at 6:29 PM James Coleman <jtc331@gmail.com> wrote:
> >
> > A few weeks back I sent a bug report [1] directly to the -bugs mailing
> > list, and I haven't seen any activity on it (maybe this is because I
> > emailed directly instead of using the form?), but I got some time to
> > take a look and concluded that a first-level fix is pretty simple.
> >
> > A quick background refresher: after promoting a standby, rewinding the
> > former primary requires that a checkpoint have been completed on the
> > new primary after promotion. This is correctly documented. However
> > pg_rewind incorrectly reports to the user that a rewind isn't
> > necessary because the source and target are on the same timeline.
> >
> > Specifically, this happens when the control file on the newly promoted
> > server looks like:
> >
> > Latest checkpoint's TimeLineID: 4
> > Latest checkpoint's PrevTimeLineID: 4
> > ...
> > Min recovery ending loc's timeline: 5
> >
> > Attached is a patch that detects this condition and reports it as an
> > error to the user.
> >
> > In the spirit of the new-ish "ensure shutdown" functionality I could
> > imagine extending this to automatically issue a checkpoint when this
> > situation is detected. I haven't started to code that up, however,
> > wanting to first get buy-in on that.
> >
> > 1: https://www.postgresql.org/message-id/CAAaqYe8b2DBbooTprY4v=BiZEd9qBqVLq+FD9j617eQFjk1KvQ@mail.gmail.com
>
> Thanks. I had a quick look over the issue and patch - just a thought -
> can't we let pg_rewind issue a checkpoint on the new primary instead
> of erroring out, maybe optionally? It might sound too much, but helps
> pg_rewind to be self-reliant i.e. avoiding external actor to detect
> the error and issue checkpoint the new primary to be able to
> successfully run pg_rewind on the old primary and repair it to use it
> as a new standby.
That's what I had suggested as a "further improvement" option in the
last paragraph :)

But I think agreement on this more basic solution would still be good
(even if I add the automatic checkpointing in this thread); given we
currently explicitly mis-inform the user of pg_rewind, I think this is
a bug that should be considered for backpatching, and the simpler
"fail if detected" patch is probably the only thing we could
backpatch.

Thanks for taking a look,
James Coleman
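
For illustration only, the control-file state quoted above can be spotted from pg_controldata output with a rough sketch like the one below. This is not the attached patch. The function names are hypothetical, and the comparison (min recovery timeline greater than the latest checkpoint's timeline) is an assumed heuristic based solely on the example values in this thread.

```python
def parse_controldata(text):
    """Parse 'label: value' lines from pg_controldata output into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines without a colon, such as "..."
            fields[key.strip()] = value.strip()
    return fields


def checkpoint_pending_after_promotion(text):
    """Hypothetical check: True when the control file still shows the
    pre-promotion timeline for the latest checkpoint while the min
    recovery ending location is already on a newer timeline.
    Assumed heuristic, not pg_rewind's actual logic."""
    fields = parse_controldata(text)
    checkpoint_tli = int(fields["Latest checkpoint's TimeLineID"])
    min_recovery_tli = int(fields["Min recovery ending loc's timeline"])
    return min_recovery_tli > checkpoint_tli


# Values taken from the example in the quoted report.
sample = """\
Latest checkpoint's TimeLineID:       4
Latest checkpoint's PrevTimeLineID:   4
...
Min recovery ending loc's timeline:   5
"""

if checkpoint_pending_after_promotion(sample):
    print("no checkpoint since promotion; run CHECKPOINT on the new primary first")
```

Running CHECKPOINT on the new primary and re-reading the control file should then show the new timeline as the latest checkpoint's timeline, at which point the check above no longer fires.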