Re: SIGTERM -> elog(FATAL) -> proc_exit() is probably a bad idea
От | Tom Lane |
---|---|
Тема | Re: SIGTERM -> elog(FATAL) -> proc_exit() is probably a bad idea |
Дата | |
Msg-id | 23807.979344113@sss.pgh.pa.us обсуждение исходный текст |
Ответ на | RE: SIGTERM -> elog(FATAL) -> proc_exit() is probably a bad idea ("Mikheev, Vadim" <vmikheev@SECTORBASE.COM>) |
Список | pgsql-hackers |
>> I think we'd be lots better off to abandon the notion that we can exit >> directly from the SIGTERM interrupt handler, and instead treat SIGTERM >> the same way we treat QueryCancel: set a flag that is inspected at >> specific places where we know we are in a good state. >> >> Comments? > This will be much cleaner. OK, here's a more detailed proposal. We'll eliminate elog() directly from the signal handlers, except in the extremely limited case where the main line is waiting for a lock (the existing "cancel wait for lock" mechanism for QueryCancel can be used for SIGTERM as well, I think). Otherwise, both die() and QueryCancelHandler() will just set flags and return. handle_warn() (the SIGQUIT handler) should probably go away entirely; it does nothing that's not done better by QueryCancel. I'm inclined to make SIGQUIT invoke die() the same as SIGTERM does, unless someone has a better idea what to do with that signal. I believe we should keep the "critical section count" mechanism that Vadim already created, even though it is no longer needed to discourage the signal handlers themselves from aborting processing. With the critical section counter, it is OK to have flag checks inside subroutines that are safe to abort from in some contexts and not others --- the contexts that don't want an abort just have to do START/END_CRIT_SECTION to ensure that QueryCancel/ProcDie won't happen in routines they call. So the basic flag checks are like "if (InterruptFlagSet && CritSectionCounter == 0) elog(...);". Having done that, the $64 question is where to test the flags. Obviously we can check for interrupts at all the places where QueryCancel is tested now, which is primarily the outer loops. I suggest we also check for interrupts in s_lock's wait loop (ie, we can cancel/die if we haven't yet got the lock AND we are not in a crit section), as well as in END_CRIT_SECTION. I intend to devise a macro CHECK_FOR_INTERRUPTS() that embodies all the test code, rather than duplicating these if-tests in many places. Note I am assuming that it's always reasonable to check for QueryCancel and ProcDie at the same places. I do not see any hole in that theory, but if someone finds one, we could introduce additional flags/counters to distinguish safe-to-cancel from safe-to-die states. Comments? regards, tom lane
В списке pgsql-hackers по дате отправления: