Re: PostgreSQL crashes with Qmail-SQL
From | Tom Lane |
---|---|
Subject | Re: PostgreSQL crashes with Qmail-SQL |
Date | |
Msg-id | 21819.1011923406@sss.pgh.pa.us |
In response to | Re: PostgreSQL crashes with Qmail-SQL (Tom Lane <tgl@sss.pgh.pa.us>) |
List | pgsql-hackers |
I said:
> That still leaves us with all the defunct postmaster children to explain
> though. Hmm. I wonder exactly what the postmaster does when someone
> forcibly removes its socket file... probably system-dependent, but I
> could certainly believe getting into a busy-wait loop of select/accept.
> That doesn't look like it should prevent SIGCHLD from getting noticed,
> though.

On Linux (at least RH 7.2), the answer to what happens when the socket file is removed is: nothing. Clients can't connect anymore, but the postmaster gets no error indicating that anything is wrong. So it sits.

And that means that the 7.1-to-7.2 change I mentioned before is relevant. In 7.1, the SIGCHLD signal handler blocked signals at its beginning, and didn't think to unblock them on exit. So after servicing one SIGCHLD interrupt, the postmaster would end up sitting at its select() with signals blocked. Further SIGCHLDs would not get serviced until the next spin around the outer loop re-enabled interrupts. Normally, no big deal, but with no new connection requests coming in, the postmaster wouldn't ever get around to wait()ing for its last few children.

(7.2 re-enables signals at exit from the handler, so I don't think it will show this problem; and indeed I don't see any zombies after "rm /tmp/.s.PGSQL.5432" during a run of Michael's benchmark script with 7.2. Not incidentally, I do observe a complete lack of any complaints out of the benchmark script; it keeps flailing along without any sign that all its database connection attempts are failing.)

In short: all the reported facts can be explained by the theory that *something* removed the socket file during that long test run.

regards, tom lane
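The effect described above (an exited child stays a zombie for as long as SIGCHLD remains blocked in an idle parent) can be reproduced with a small sketch. This is not postmaster code: it is a Python illustration that assumes a single-threaded process on a POSIX system, and the names `reap_children`, `spawn_short_lived_child`, and `demo` are hypothetical.

```python
import os
import signal
import time

reaped = []  # PIDs collected by the SIGCHLD handler


def reap_children(signum, frame):
    """SIGCHLD handler: wait() for every child that has already exited."""
    while True:
        try:
            pid, _status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            return  # no children left at all
        if pid == 0:
            return  # children exist, but none has exited yet
        reaped.append(pid)


def spawn_short_lived_child():
    """Fork a child that exits at once, standing in for a finished backend."""
    pid = os.fork()
    if pid == 0:
        os._exit(0)
    return pid


def demo():
    old_handler = signal.signal(signal.SIGCHLD, reap_children)
    try:
        # Stand-in for the 7.1 situation: SIGCHLD stays blocked while the
        # parent is idle (the postmaster sitting in select()).
        signal.pthread_sigmask(signal.SIG_BLOCK, {signal.SIGCHLD})
        child = spawn_short_lived_child()

        # Wait until the kernel reports SIGCHLD as pending (5 s timeout).
        deadline = time.monotonic() + 5
        while (signal.SIGCHLD not in signal.sigpending()
               and time.monotonic() < deadline):
            time.sleep(0.01)
        pending = signal.SIGCHLD in signal.sigpending()
        reaped_while_blocked = child in reaped  # handler has not run yet

        # Stand-in for the 7.2 fix: signals are unblocked again, so the
        # pending SIGCHLD is delivered and the handler reaps the child.
        signal.pthread_sigmask(signal.SIG_UNBLOCK, {signal.SIGCHLD})
        time.sleep(0.05)  # give the interpreter a chance to run the handler
        return pending, reaped_while_blocked, child in reaped
    finally:
        signal.pthread_sigmask(signal.SIG_UNBLOCK, {signal.SIGCHLD})
        signal.signal(signal.SIGCHLD, old_handler)


if __name__ == "__main__":
    print(demo())
```

While the signal is blocked, the child has exited but nothing has called wait() for it, which is the zombie state reported in the thread; the handler only gets to run once the mask is lifted.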