Re: Yes, WaitLatch is vulnerable to weak-memory-ordering bugs
От | Robert Haas |
---|---|
Тема | Re: Yes, WaitLatch is vulnerable to weak-memory-ordering bugs |
Дата | |
Msg-id | CA+TgmobyqhvXtizQ7ugR7M27fdtto78LB4c-x407KVVH=SxJug@mail.gmail.com обсуждение исходный текст |
Ответ на | Yes, WaitLatch is vulnerable to weak-memory-ordering bugs (Tom Lane <tgl@sss.pgh.pa.us>) |
Ответы |
Re: Yes, WaitLatch is vulnerable to weak-memory-ordering bugs
Re: Yes, WaitLatch is vulnerable to weak-memory-ordering bugs |
Список | pgsql-hackers |
On Sun, Aug 7, 2011 at 1:47 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Any protocol of that sort has *obviously* got a race condition between > changes of the latch state and changes of the separate flag state, if run > in a weak-memory-ordering machine. > > At least on the hardware I'm testing, it seems that the key requirement > for a failure to be seen is that there's a lot of contention for the cache > line containing the flag, but not for the cache line containing the latch. > This allows the write to the flag to take longer to propagate to main > memory than the write to the latch. This is interesting research, especially because it provides sobering evidence of how easily a real problem could be overlooked in testing. If it took several minutes for this to fail on an 8-CPU system, it's easy to imagine a less egregious example sitting in our code-base for years undetected. (Indeed, I think we probably have a few of those already...) On the flip side, I'm not sure that anyone ever really expected that a latch could be safely used this way. Normally, I'd expect the flag to be some piece of state protected by an LWLock, and I think that ought to be OK provided that the lwlock is released before setting or checking the latch. Maybe we should try to document the contract here a little better; I think it's just that there must be a full memory barrier (such as LWLockRelease) in both processes involved in the interaction. > I'm not sure whether we need to do anything about this for 9.1; it may be > that none of the existing uses of latches are subject to the kind of > contention that would create a risk. But I'm entirely convinced that we > need to insert some memory-barrier instructions into the latch code before > we expand our uses of latches much more. Since we were already finding > that memory barriers will be needed for performance reasons elsewhere, > I think it's time to bite the bullet and develop that infrastructure. I've been thinking about this too and actually went so far as to do some research and put together something that I hope covers most of the interesting cases. The attached patch is pretty much entirely untested, but reflects my present belief about how things ought to work. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Вложения
В списке pgsql-hackers по дате отправления: