Re: Windows crash / abort handling
От | Andres Freund |
---|---|
Тема | Re: Windows crash / abort handling |
Дата | |
Msg-id | 20220110005704.es4el6i2nxlxzwof@alap3.anarazel.de обсуждение исходный текст |
Ответ на | Re: Windows crash / abort handling (Craig Ringer <craig.ringer@enterprisedb.com>) |
Ответы |
Re: Windows crash / abort handling
|
Список | pgsql-hackers |
Hi, On 2021-10-06 14:11:51 +0800, Craig Ringer wrote: > On Wed, 6 Oct 2021, 03:30 Andres Freund, <andres@anarazel.de> wrote: > > > Hi, > > > > > > My first attempt was to try to use the existing crashdump stuff in > > pgwin32_install_crashdump_handler(). That's not really quite what I want, > > because it only handles postmaster rather than any binary, but I thought > > it'd > > be a good start. But outside of toy situations it didn't work for me. > > > > Odd. It usually has for me, and definitely not limited to the postmaster. > But it will fall down for OOM, smashed stack, and other states where > in-process self-debugging is likely to fail. I think it's a question of running debug vs optimized builds. At least in debug builds it doesn't appear to work because the debug c runtime abort preempts it. > I patch this out when working on windows because it's a real pain. > > I keep meaning to propose that we remove this functionality entirely. It's > obsolete. It was introduced back in the days where DrWatson.exe "windows > error reporting") used to launch an interactive prompt asking the user what > to do when a process crashed. This would block the crashed process from > exiting, making everything grind to a halt until the user interacted with > the > UI. Even for a service process. > Not fun on a headless or remote server. Yea, the way we do it right now is definitely not helpful. Especially because it doesn't actually prevent the "hang" issue - the CRT boxes at least cause precisely such stalls. We've had a few CI hangs due to such errors. > These days Windows handles all this a lot more sensibly, and blocking crash > reporting is quite obsolete and unhelpful. From what I've seen it didn't actually get particularly sensible, just different and more complicated. From what I've seen one needs at least: - _set_abort_behavior(_CALL_REPORTFAULT | _WRITE_ABORT_MSG) - _set_error_mode(_OUT_TO_STDERR) - _CrtSetReportMode(_CRT_ASSERT/ERROR, _CRTDBG_MODE_FILE | _CRTDBG_MODE_DEBUG) - SetErrorMode(SEM_FAILCRITICALERRORS) There's many things this hocuspocus can be called, but sensible isn't among my word choices for it. > I'd like to just remove it. I think we need to remove the SEM_NOGPFAULTERRORBOX, but I don't think we can remove the SEM_FAILCRITICALERRORS, and I think we need the rest of the above to prevent error boxes from happening. I think we ought to actually apply these to all binaries, not just postgres. One CI hung was due to psql asserting. But there's currently no easy hook point for all binaries afaict. If we were to introduce something it should probably be in pgport? But I'd defer to that to a later step. I've attached a patch implementing these changes. I also made one run with that intentionally fail (with an Assert(false)), and with the changes the debugger is invoked and creates a backtrace etc: https://cirrus-ci.com/task/5447300246929408 (click on crashlog-> at the top) Greetings, Andres Freund
Вложения
В списке pgsql-hackers по дате отправления: