Re: [HACKERS] logical replication and PANIC during shutdown checkpoint in publisher
From | Petr Jelinek |
---|---|
Subject | Re: [HACKERS] logical replication and PANIC during shutdown checkpoint in publisher |
Date | |
Msg-id | 9391d009-3fec-4255-4bbf-ff54de511c5a@2ndquadrant.com |
In reply to | Re: [HACKERS] logical replication and PANIC during shutdown checkpoint in publisher (Michael Paquier <michael.paquier@gmail.com>) |
Responses | Re: [HACKERS] logical replication and PANIC during shutdown checkpoint in publisher |
List | pgsql-hackers |
On 21/04/17 06:11, Michael Paquier wrote:
> On Fri, Apr 21, 2017 at 12:29 AM, Peter Eisentraut
> <peter.eisentraut@2ndquadrant.com> wrote:
>> On 4/20/17 07:52, Petr Jelinek wrote:
>>> On 20/04/17 05:57, Michael Paquier wrote:
>>>> 2nd thoughts here... Ah now I see your point. True that there is no
>>>> way to ensure that an unwanted command is not running when SIGUSR2 is
>>>> received as the shutdown checkpoint may have already begun. Here is an
>>>> idea: add a new state in WalSndState, say WALSNDSTATE_STOPPING, and
>>>> the shutdown checkpoint does not run as long as all WAL senders still
>>>> running do not reach such a state.
>>>
>>> +1 to this solution
>>
>> Michael, can you attempt to supply a patch?
>
> Hmm. I have been actually looking at this solution and I am having
> doubts regarding its robustness. In short this would need to be
> roughly a two-step process:
> - In PostmasterStateMachine(), SIGUSR2 is sent to the checkpoint to
> make it call ShutdownXLOG(). Prior doing that, a first signal should
> be sent to all the WAL senders with
> SignalSomeChildren(BACKEND_TYPE_WALSND). SIGUSR2 or SIGINT could be
> used.
> - At reception of this signal, all WAL senders switch to a stopping
> state, refusing commands that can generate WAL.
> - Checkpointer looks at the state of all WAL senders, looping with a
> sleep call of a couple of ms, refusing to launch the shutdown
> checkpoint as long as all WAL senders have not switched to the
> stopping state.
> - In reaper(), once checkpointer is confirmed as stopped, signal again
> the WAL senders, and tell them to perform the last loop.
>
> After that, I got a second, more simple idea.
> CheckpointerShmem->ckpt_flags holds the information about checkpoints
> currently running, so we could have the WAL senders look at this data
> and prevent any commands generating WAL. The checkpointer may be
> already stopped at the moment the WAL senders finish their loop, so we
> need also to check if the checkpointer is running or not on those code
> paths. Such safeguards may actually be enough for the problem of this
> thread. Thoughts?
>

Hmm but how do we handle statements that are already in progress by the
time ckpt_flags changes?

-- 
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services
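
For readers following the thread, the stopping-state handshake quoted above, and the in-progress-statement question it raises, can be modelled roughly as below. This is only a toy, single-process sketch and not PostgreSQL code: every `Sketch`/`sketch_` name, the `stop_requested` and `command_in_progress` flags, and the rule that a sender defers the state switch until its current command ends are assumptions made for illustration. Real WAL senders and the checkpointer are separate processes that coordinate through signals and shared memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; not PostgreSQL's actual definitions. */
typedef enum {
    SKETCH_WALSND_STREAMING,
    SKETCH_WALSND_STOPPING    /* models the proposed WALSNDSTATE_STOPPING */
} SketchWalSndState;

typedef struct {
    SketchWalSndState state;
    bool stop_requested;      /* set when the first shutdown signal arrives */
    bool command_in_progress; /* a WAL-generating command is still running */
} SketchWalSnd;

/* Postmaster side: ask every WAL sender to stop before the checkpointer
 * is told to run the shutdown checkpoint. */
void sketch_request_stop(SketchWalSnd *senders, size_t n)
{
    assert(senders != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        senders[i].stop_requested = true;
}

/* WAL sender side: try to start a command that may generate WAL.
 * Returns false when the command is refused because a stop was requested. */
bool sketch_begin_command(SketchWalSnd *s)
{
    if (s->stop_requested || s->state == SKETCH_WALSND_STOPPING)
        return false;
    s->command_in_progress = true;
    return true;
}

/* WAL sender side: the running command has finished. */
void sketch_end_command(SketchWalSnd *s)
{
    s->command_in_progress = false;
}

/* WAL sender side: one pass of its main loop. The switch to the stopping
 * state is deferred while a command is in flight (an assumption of this
 * sketch, addressing the in-progress-statement question). */
void sketch_process_signals(SketchWalSnd *s)
{
    if (s->stop_requested && !s->command_in_progress)
        s->state = SKETCH_WALSND_STOPPING;
}

/* Checkpointer side: the shutdown checkpoint may start only once every
 * WAL sender has reached the stopping state. */
bool sketch_shutdown_checkpoint_may_start(const SketchWalSnd *senders, size_t n)
{
    assert(senders != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (senders[i].state != SKETCH_WALSND_STOPPING)
            return false;
    return true;
}
```

In this model the checkpointer would poll `sketch_shutdown_checkpoint_may_start()` in its sleep loop. A sender that was already executing a command when the stop request arrived keeps the shutdown checkpoint waiting until the command ends, which is one possible answer to the question above but not necessarily the one the thread settled on.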