Re: problems on Solaris
From | Robert Haas |
---|---|
Subject | Re: problems on Solaris |
Date | |
Msg-id | CA+TgmoZFzmBmGFORw9y2kaAbCqHybTSsW=H97k0i=Asud4UjnA@mail.gmail.com |
In response to | Re: problems on Solaris (Andres Freund <andres@anarazel.de>) |
Responses | Re: problems on Solaris |
List | pgsql-hackers |
On Wed, May 27, 2015 at 6:55 PM, Andres Freund <andres@anarazel.de> wrote:
> On 2015-05-27 15:39:14 -0400, Robert Haas wrote:
>> On Mon, May 25, 2015 at 10:05 PM, Andres Freund <andres@anarazel.de> wrote:
>> > Hm. So we have an *occasional* stack size exceeded failure and an
>> > occasional spinlock error in test_shm_mq. I'm inclined to think that
>> > this is a shm_mq problem, and not a more general locking problem - it
>> > seems likely, but not guaranteed, that that'd have materialized
>> > elsewhere.
>>
>> I think the problem might be that the spinlock-based memory barrier is
>> not re-entrant. Suppose some kind of barrier operation is in process,
>> and we've acquired the dummy spinlock but not yet released it. Just
>> then, we receive a signal. Since the shm_mq code sets
>> set_latch_on_sigusr1, procsignal_sigusr1_handler will set MyLatch.
>> SetLatch now includes barrier operations, so we'll try to acquire and
>> release the spinlock despite already holding it. Oops.
>
> Oh wow, that's bad, and could explain a couple of the problems we're
> seeing. One possible way to fix it is to replace the sequence with if
> (!TAS(spin)) S_UNLOCK();. But that'd mean TAS() has to be a barrier,
> even if the lock isn't free - which e.g. isn't the case for PowerPC's
> implementation :(

Another possibility is to make the fallback barrier implementation a
system call, like maybe kill(PostmasterPid, 0).

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
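
For readers following the thread, the re-entrancy hazard described above can be reproduced in miniature. The following is only a rough sketch, not PostgreSQL code: `dummy_spinlock`, `tas()`, `s_unlock()`, `spinlock_barrier_bounded()`, `try_once_barrier()` and `run_demo()` are hypothetical stand-ins built on C11 `atomic_flag`, and the spin is bounded so that the demonstration returns instead of hanging the way an unbounded spin would.

```c
#include <signal.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the dummy spinlock behind the fallback barrier. */
static atomic_flag dummy_spinlock = ATOMIC_FLAG_INIT;

/* Result recorded by the handler: 1 = barrier completed, 0 = gave up. */
static volatile sig_atomic_t handler_result = -1;

/* Returns true if the lock was already held (TAS-style convention). */
static bool tas(atomic_flag *lock)
{
    return atomic_flag_test_and_set(lock);
}

static void s_unlock(atomic_flag *lock)
{
    atomic_flag_clear(lock);
}

/*
 * Acquire-then-release "barrier". A real spinlock would spin without
 * limit; the bound exists only so this sketch terminates. Returns false
 * where an unbounded spin would never have returned.
 */
static bool spinlock_barrier_bounded(int max_spins)
{
    for (int i = 0; i < max_spins; i++)
    {
        if (!tas(&dummy_spinlock))
        {
            s_unlock(&dummy_spinlock);
            return true;
        }
    }
    return false;
}

/*
 * The "if (!TAS(spin)) S_UNLOCK();" variant from the thread: it never
 * spins, so it cannot self-deadlock. Returns true if it took the lock.
 * Whether a failed TAS still acts as a memory barrier is
 * platform-specific and is not modeled here.
 */
static bool try_once_barrier(void)
{
    if (!tas(&dummy_spinlock))
    {
        s_unlock(&dummy_spinlock);
        return true;
    }
    return false;
}

/* Plays the role of a signal handler that runs a barrier (e.g. via SetLatch). */
static void on_sigusr1(int signo)
{
    (void) signo;
    handler_result = spinlock_barrier_bounded(1000) ? 1 : 0;
}

/* Simulates a signal arriving while a barrier already holds the lock. */
static int run_demo(void)
{
    signal(SIGUSR1, on_sigusr1);
    tas(&dummy_spinlock);       /* barrier in progress: lock is held */
    raise(SIGUSR1);             /* handler runs before raise() returns */
    s_unlock(&dummy_spinlock);  /* interrupted barrier finishes */
    return handler_result;
}
```

Calling `run_demo()` from a `main()` shows the handler giving up because the interrupted code already holds the lock. `try_once_barrier()` returns immediately in the same situation, but this sketch says nothing about its memory-ordering guarantees, which is the PowerPC concern raised in the quoted text.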