Re: [BUGS] Backend crash on non-exclusive backup cancel
От | Michael Paquier |
---|---|
Тема | Re: [BUGS] Backend crash on non-exclusive backup cancel |
Дата | |
Msg-id | CAB7nPqRCyoJTbgaLs9sH5Ehf0thHZp-KY8hJY6mCkaXAMW0i6w@mail.gmail.com обсуждение исходный текст |
Ответ на | [BUGS] Backend crash on non-exclusive backup cancel (David Steele <david@pgmasters.net>) |
Ответы |
Re: [BUGS] Backend crash on non-exclusive backup cancel
|
Список | pgsql-bugs |
On Tue, Feb 28, 2017 at 10:33 AM, David Steele <david@pgmasters.net> wrote: > This condition should throw "backup is not in progress" just as a > exclusive backup would, whether assertions are enabled or not. > > I believe the solution is to move the exclusive flag to xlog.c and only > decrement XLogCtl->Insert.nonExclusiveBackups when exclusive is true, > otherwise return an error. Even then, it wouldn't be clear if the > backup had completed or not. I understand by this sentence that you mean nonexclusive_backup_running, and I think that I agree with that. We need a way to reset it properly the session-level switch in case of an interrupt found, and that needs visibly to happen in pg_stop_backup_callback and pg_start_backup_callback(). At the same time I think that exclusive_backup_running had better be moved to xlog.c as well. If I look at the failure in details when issuing a cancel, I can see that XLogCtl->Insert.nonExclusiveBackups gets decremented at the end do_pg_stop_backup, but nonexclusive_backup_running never gets set back to false because of the query cancellation. > I suppose any cancelled non-exclusive > pg_stop_backup() should be considered aborted whether a stop backup > record was written or not? That's not necessarily true, I can see a stop backup able to finish as well by issuing a cancel request. It seems to me that we just need to have the shmem information updated at the same time as the session-level switches for consistency and we're good. The inconsistency in places when updating the session-level flags and the shmem-level flags is what is causing harm. > If that makes sense I'm happy to work up a patch. This is definitely an > edge case and I seriously doubt it is causing any issues in the field. Well, at least it is nothing caused directly by 974ece5, I am able to crash the server with or without that... There is actually some code that I know of that can issue a cancel request on pg_stop_backend(), this does not have assertions of course but it may become an issue. That's pretty close to the improvements I did recently for lock handling of exclusive backups, so there's nothing really new for me :) David, are you willing to write a patch or should I? Changing nonexclusive_backup_running/exclusive_backup_running to be an enum would make the code more readable as well IMO. -- Michael -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
В списке pgsql-bugs по дате отправления: