Re: [HACKERS] postmaster disappears
From | Tatsuo Ishii |
---|---|
Subject | Re: [HACKERS] postmaster disappears |
Date | |
Msg-id | 199909220449.NAA26668@srapc451.sra.co.jp |
In response to | Re: [HACKERS] postmaster disappears (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: [HACKERS] postmaster disappears
|
List | pgsql-hackers |
>> Not sure. reaper() may be called while reaper() is executing if a new
>> SIGCHLD is raised. How do you handle this case?
>
>No, because the signal is disabled when the trap is taken, and then not
>re-enabled until reaper() does pqsignal() just before exiting. We don't

You are correct. I had the wrong impression about signal handling.

>>> Moreover, you're not actually checking what the select() did unless
>>> you do it that way.
>
>> Sorry, I don't understand this. Can you explain, please?
>
>If you don't have the signal routine save/restore errno, then (when this
>problem occurs) you are not seeing the errno returned by the select(),
>but one left over from reaper()'s activity. If the select() failed, you
>won't know it.

Oh, I see your point.

>>> Curious that this sort of problem is not seen more often --- I wonder
>>> if most Unixes arrange to save/restore errno around a signal handler
>>> for you?
>
>> Maybe because the situation I have pointed out is relatively rare.
>
>Well, the window for trouble is awfully tiny in this particular code of
>ours, but it might be larger in other programs.

Though it seems rare, we have certainly had reports of this kind from
users for a while. Since a disappearing postmaster is a really bad thing,
I would love to see a solution for this.

>Yet I don't think I've
>ever heard a programming recommendation to save/restore errno in signal
>handlers...

Agreed. I don't like this approach. I asked a Unix guru, and got a
suggestion that we do not need to call wait() (and CleanupProc()) inside
the signal handler. Instead we could have a null signal handler (it just
calls pqsignal()) for SIGCHLD. If select() fails with EINTR, then we just
call wait() and CleanupProc(). Moreover, this would eliminate the
sigprocmask() or sigblock() calls currently done to avoid race conditions
before going into the critical region. Of course we have to call wait()
and CleanupProc() before select() to make sure that we have no waiting
children.

Another way would be to block SIGCHLD before calling select(). In this
case an appropriate timeout setting for select() is necessary, though.
--
Tatsuo Ishii