Re: Postgres 7.4.7 hang in async_notify
From | Tom Lane |
---|---|
Subject | Re: Postgres 7.4.7 hang in async_notify |
Date | |
Msg-id | 26109.1117736485@sss.pgh.pa.us |
In reply to | Postgres 7.4.7 hang in async_notify (pgsql-bugs@counterstorm.com) |
List | pgsql-bugs |
pgsql-bugs@counterstorm.com writes:
> We saw the problem with async_notify again (See thread with subject
> "Postgres 7.4.6 hang in async_notify" in first message to this list
> dated "Mon, 25 Apr 2005 15:42:35 -0400") in a production setting.
> Since our last instance, we converted to compiling postgres with
> debugging, so we have a stack trace. Looking at it, the problem
> appears at first blush like it might be pretty obvious: an ill-timed
> signal which arrives during a malloc while malloc has some
> data-structure locked, and one of the extensive operations that
> Async_NotifyHandler did probably involved getting the same lock.

So it would seem. The Async_NotifyHandler mechanism was designed at a
time when ReadCommand didn't call anything of interest except read(),
and so the assumption is that it's OK for PostgresMain to do this
(oversimplified a bit):

    EnableNotifyInterrupt();
    firstchar = ReadCommand(&input_message);
    DisableNotifyInterrupt();

Clearly, if SSL is going to be messing about with malloc() then this
assumption is no longer safe. Looking at the code, I think we have
introduced some other risks of the same ilk ourselves, but SSL is
doubtless the largest variable. This probably explains a number of
other irreproducible failures besides your hangup :-(

I think we're going to have to push the enable/disable interrupt
operations down closer to the actual read(). This doesn't seem to be
any big deal for the non-SSL case, but it's not clear to me what we have
to do to get between SSL and the socket. Anyone know offhand?

> For the record, while this postgres should be (of two) generating
> notifies out of triggers, we do not believe it should be listening for
> any, and indeed examination of pg_listener suggests it does not.

Doesn't matter --- 7.4 uses the same mechanism for SI messaging catchup
interrupts. A backend that sits idle long enough *will* get one of
these interrupts. Apparently you've managed to set up a situation where
the client starts doing something after just-the-right-delay with better
than nil probability.

regards, tom lane
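For illustration only, here is a minimal C sketch of the idea of pushing the enable/disable operations down to the read() itself. It is not the PostgreSQL code: every name in it (window_open, event_pending, events_handled, handle_event, on_notify_signal, open_window, close_window, install_handler, guarded_read) is hypothetical, and the choice of SIGUSR2 is an assumption made for the example. The handler acts directly only while the window is open, which is only while the process sits in read(); at any other time it just records that an event is pending.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* All names below are hypothetical; none are PostgreSQL identifiers. */
static volatile sig_atomic_t window_open = 0;    /* may the handler act now? */
static volatile sig_atomic_t event_pending = 0;  /* arrived while window closed */
static volatile sig_atomic_t events_handled = 0; /* stand-in for the real work */

/* Placeholder for the real processing, which may need malloc() or locks. */
static void handle_event(void)
{
    event_pending = 0;
    events_handled++;
}

/* Signal handler: acts only if nothing but read() can have been interrupted. */
static void on_notify_signal(int signo)
{
    int saved_errno = errno;    /* do not disturb the interrupted read() */

    (void) signo;
    if (window_open)
        handle_event();
    else
        event_pending = 1;      /* defer until the window is next opened */
    errno = saved_errno;
}

/* Open the window, first draining anything deferred while it was closed. */
static void open_window(void)
{
    for (;;) {
        window_open = 1;
        if (!event_pending)
            break;
        window_open = 0;        /* handle deferred work with the window shut */
        handle_event();
    }
}

static void close_window(void)
{
    window_open = 0;
}

/* Install the handler for SIGUSR2 (signal choice is an assumption). */
static int install_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_notify_signal;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGUSR2, &sa, NULL);
}

/* The window covers only the read() call, not the code around it. */
static ssize_t guarded_read(int fd, void *buf, size_t len)
{
    ssize_t n;

    assert(buf != NULL);
    do {
        open_window();
        n = read(fd, buf, len);
        close_window();
    } while (n < 0 && errno == EINTR);  /* retry if the signal cut read() short */
    return n;
}
```

A caller would run install_handler() once and then use guarded_read() wherever it previously wrapped a larger routine in enable/disable calls. This does nothing for the SSL case raised above, where the read() happens inside the library.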