Discussion: shm_mq fix for non-blocking mode
The shm_mq code handles blocking mode and non-blocking mode asymmetrically in a couple of places, with the unfortunate result that if you are using non-blocking mode, and your counterparty dies before attaching the queue, operations on the queue continue to return SHM_MQ_WOULD_BLOCK instead of, as they should, returning SHM_MQ_DETACHED. The attached patch fixes the problem. Thanks to my colleague Rushabh Lathia for helping track this down.

(There are some further bugs in this area outside the shm_mq code ... but I'm still trying to figure out exactly what they are and what we should do about them. This much, however, seems clear-cut.)

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Fri, Oct 16, 2015 at 5:08 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> The shm_mq code handles blocking mode and non-blocking mode
> asymmetrically in a couple of places, with the unfortunate result that
> if you are using non-blocking mode, and your counterparty dies before
> attaching the queue, operations on the queue continue to return
> SHM_MQ_WOULD_BLOCK instead of, as they should, returning
> SHM_MQ_DETACHED. The attached patch fixes the problem. Thanks to my
> colleague Rushabh Lathia for helping track this down.
>
> (There are some further bugs in this area outside the shm_mq code
> ... but I'm still trying to figure out exactly what they are and what
> we should do about them. This much, however, seems clear-cut.)

...and so I've committed it and back-patched to 9.4.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Thu, Oct 22, 2015 at 4:45 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, Oct 16, 2015 at 5:08 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> The shm_mq code handles blocking mode and non-blocking mode
>> asymmetrically in a couple of places, with the unfortunate result that
>> if you are using non-blocking mode, and your counterparty dies before
>> attaching the queue, operations on the queue continue to return
>> SHM_MQ_WOULD_BLOCK instead of, as they should, returning
>> SHM_MQ_DETACHED. The attached patch fixes the problem. Thanks to my
>> colleague Rushabh Lathia for helping track this down.
>>
>> (There are some further bugs in this area outside the shm_mq code
>> ... but I'm still trying to figure out exactly what they are and what
>> we should do about them. This much, however, seems clear-cut.)
>
> ...and so I've committed it and back-patched to 9.4.

Sigh. This was buggy; I have no idea how it survived my earlier testing.

I will go fix it. Sorry.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Thu, Oct 22, 2015 at 10:00 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> ...and so I've committed it and back-patched to 9.4.
>
> Sigh. This was buggy; I have no idea how it survived my earlier testing.
>
> I will go fix it. Sorry.

Gah! That, too, turned out to be buggy, although in a considerably more subtle way. I've pushed another fix with a detailed comment and an explanatory commit message that hopefully squashes this problem for good. Combined with the fix at http://www.postgresql.org/message-id/CA+TgmoZzv3u9trsvcAO+-OtXbsz_u+A5Q8X-_B+VZceHhtzTmA@mail.gmail.com this seems to squash occasional complaints about workers "dying unexpectedly" when they really had done no such thing.

The test code I used to find these problems is attached. I compiled and installed the parallel_dummy extension, did pgbench -i -s 100, and then ran this:

    while psql -c "select parallel_count('pgbench_accounts', 4)"; do sleep 1; done

Without these fixes, this can hang or error out, but with these fixes, it works fine.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company