On Mon, Sep 29, 2014 at 3:37 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> Yea :(. Note how signals are blocked in all the signal handlers and only
> unblocked for a very short time (the sleep).
>
> (stare at random shit for far too long)
>
> Ah. DetermineSleepTime(), which is called while signals are unblocked!,
> modifies BackgroundWorkerList. Previously that only iterated the list,
> without modifying it. That's already of quite debatable safety, but
> modifying it without having blocked signals is most definitely
> broken. The modification was introduced by 7f7485a0c...

Ouch. OK, yeah, that's a bug.

> If you can manually run stuff on that machine, it'd be rather helpful if
> you could put a PG_SETMASK(&BlockSig);...PG_SETMASK(&UnBlockSig); in the
> HaveCrashedWorker() loop.

I'd do it the other way around, and adjust ServerLoop to put the
PG_SETMASK calls right around pg_usleep() and select(). But why futz
with anole? Let's just check in the fix. It'll either fix anole or
not, but we should fix the bug you found either way.
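
For illustration only, here is a rough standalone sketch of that pattern.
It is not the postmaster code: plain sigprocmask() stands in for
PG_SETMASK, nanosleep() stands in for pg_usleep()/select(), and every
name in it (block_sig, unblock_sig, shared_counter, loop_iteration, and
so on) is hypothetical. The point is only that shared state is modified
while signals are blocked, and the mask is opened just around the sleep.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif

#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical stand-ins; these are not PostgreSQL's definitions. */
static sigset_t block_sig;                  /* mask with SIGUSR1 blocked */
static sigset_t unblock_sig;                /* empty mask, nothing blocked */
static volatile sig_atomic_t handler_runs = 0;
static int shared_counter = 0;              /* plays the role of the shared list */

/*
 * The handler touches shared state. That is tolerable only because the
 * main code modifies the same state exclusively while the signal is blocked.
 */
static void
on_sigusr1(int signo)
{
    (void) signo;
    shared_counter++;
    handler_runs++;
}

/* Install the handler and start out with the signal blocked. */
static void
init_masks(void)
{
    struct sigaction sa;

    sigemptyset(&unblock_sig);
    sigemptyset(&block_sig);
    sigaddset(&block_sig, SIGUSR1);

    sa.sa_handler = on_sigusr1;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    sigaction(SIGUSR1, &sa, NULL);

    sigprocmask(SIG_SETMASK, &block_sig, NULL);
}

/* Report whether SIGUSR1 is currently blocked in this thread. */
static int
signal_is_blocked(void)
{
    sigset_t cur;

    sigprocmask(SIG_SETMASK, NULL, &cur);
    return sigismember(&cur, SIGUSR1);
}

/*
 * One loop iteration: do all bookkeeping with signals blocked, then
 * unblock only around the sleep and re-block right after it.
 */
static int
loop_iteration(void)
{
    struct timespec nap = {0, 1000000};     /* 1 ms */

    assert(signal_is_blocked());
    shared_counter += 10;                   /* modification, signals blocked */

    sigprocmask(SIG_SETMASK, &unblock_sig, NULL);   /* pending signals run here */
    nanosleep(&nap, NULL);
    sigprocmask(SIG_SETMASK, &block_sig, NULL);

    return shared_counter;
}
```

Calling init_masks() once and then loop_iteration() per pass keeps the
window in which a handler can run limited to the sleep itself, so the
handler never sees the shared state halfway through an update.
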
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company