Re: xlc atomics
От | Noah Misch |
---|---|
Тема | Re: xlc atomics |
Дата | |
Msg-id | 20160424015407.GD2084318@tornado.leadboat.com обсуждение исходный текст |
Ответ на | Re: xlc atomics (Tom Lane <tgl@sss.pgh.pa.us>) |
Ответы |
Re: xlc atomics
|
Список | pgsql-hackers |
On Mon, Feb 15, 2016 at 02:02:41PM -0500, Tom Lane wrote: > Noah Misch <noah@leadboat.com> writes: > > That commit (0d32d2e) permitted things to compile and usually pass tests, but > > I missed the synchronization bug. Since 2015-10-01, the buildfarm has seen > > sixteen duplicate-catalog-OID failures. > > I'd been wondering about those ... > > > These suggested OidGenLock wasn't doing its job. I've seen similar symptoms > > around WALInsertLocks with "IBM XL C/C++ for Linux, V13.1.2 (5725-C73, > > 5765-J08)" for ppc64le. The problem is generic-xlc.h > > pg_atomic_compare_exchange_u32_impl() issuing __isync() before > > __compare_and_swap(). __isync() shall follow __compare_and_swap(); see our > > own s_lock.h, its references, and other projects' usage: > > Nice catch! > > > This patch's test case would have failed about half the time under today's > > generic-xlc.h. Fast machines run it in about 1s. A test that detects the bug > > 99% of the time would run far longer, hence this compromise. > > Sounds like a reasonable compromise to me, although I wonder about the > value of it if we stick it into pgbench's TAP tests. How many of the > slower buildfarm members are running the TAP tests? Certainly mine are > not. I missed a second synchronization bug in generic-xlc.h, but the pgbench test suite caught it promptly after commit 008608b. Buildfarm members mandrill and hornet failed[1] or deadlocked within two runs. The bug is that the pg_atomic_compare_exchange_*() specifications grant "full barrier semantics", but generic-xlc.h provided only the semantics of an acquire barrier. Commit 008608b exposed that older problem; its LWLockWaitListUnlock() relies on pg_atomic_compare_exchange_u32() (via pg_atomic_fetch_and_u32()) for release barrier semantics. The fix is to issue "sync" before each compare-and-swap, in addition to the "isync" after it. This is consistent with the corresponding GCC atomics. The pgbench test suite survived dozens of runs so patched. [1] http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mandrill&dt=2016-04-11%2003%3A14%3A13
Вложения
В списке pgsql-hackers по дате отправления: