Re: stats collector dies in current
From | Tom Lane |
---|---|
Subject | Re: stats collector dies in current |
Date | |
Msg-id | 19363.1092543548@sss.pgh.pa.us |
In response to | Re: stats collector dies in current (Jan Wieck <JanWieck@Yahoo.com>) |
Responses | Re: stats collector dies in current |
List | pgsql-hackers |
Jan Wieck <JanWieck@Yahoo.com> writes:
> In that context, is SIGTSTP similar to SIGSTOP in that it cannot be
> caught or ignored?

Possibly. I've reproduced the problem here on an RHL 8 system (2.4.18 kernel) and I think it's a kernel bug. Points:

1. AFAICS, the only case where the stats buffer process will exit(1) without logging a prior message is where it's gotten SIGCHLD. So, hypothesis: it is the collector process (grandchild process) that is dying.

2. Experiment one: try to strace the collector process to see what it's doing. Result: failure goes away!!!

3. Experiment two: try to strace the buffer process. Result: indeed it's getting SIGCHLD (in fact it seems to get it before SIGTSTP arrives).

So at the very least we've got a Heisenbug, but my opinion is we are seeing broken kernel behavior.

The only difference in signal handling that I can see from 7.4 is that the collector process explicitly executes pqsignal calls to re-establish all the signal handlers it should have inherited from its parent. I suspect (but haven't tested) that removing that supposedly redundant code would make the failure go away again. The handler re-establishment was put in because it is needed for the EXEC_BACKEND case, but possibly we could make it #ifndef EXEC_BACKEND to work around this problem.

regards, tom lane
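A minimal sketch of the workaround floated in the last paragraph, not the actual pgstat.c code. It assumes a hypothetical helper install_handler() standing in for pqsignal, a hypothetical start-up routine collector_setup_signals(), and an illustrative signal list (the real set of handlers is not given in the message). The guard is written as #ifdef EXEC_BACKEND, which is one reading of the suggestion: the re-establishment calls are compiled only for the case the message says needs them, and a plain fork()ed collector keeps whatever it inherited.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for pqsignal(): set the disposition of signo
 * using sigaction(). Returns 0 on success, -1 on failure.
 */
int
install_handler(int signo, void (*handler)(int))
{
    struct sigaction act;

    assert(signo > 0);
    act.sa_handler = handler;
    sigemptyset(&act.sa_mask);
    act.sa_flags = 0;
    return sigaction(signo, &act, NULL);
}

/*
 * Hypothetical collector start-up step. Returns the number of signal
 * dispositions explicitly re-established, or -1 on error.
 */
int
collector_setup_signals(void)
{
    int installed = 0;

#ifdef EXEC_BACKEND
    /* Illustrative list only; not the collector's real handler set. */
    static const int to_ignore[] = {SIGHUP, SIGINT, SIGTERM, SIGPIPE};
    size_t i;

    for (i = 0; i < sizeof(to_ignore) / sizeof(to_ignore[0]); i++)
    {
        if (install_handler(to_ignore[i], SIG_IGN) != 0)
            return -1;
        installed++;
    }
#endif

    /* Without EXEC_BACKEND nothing is touched: dispositions come from fork(). */
    return installed;
}
```

Built without -DEXEC_BACKEND, collector_setup_signals() makes no signal calls at all, which is the condition the message suspects (but has not tested) would make the failure go away. Built with -DEXEC_BACKEND, it re-establishes each listed disposition.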