Re: xlc atomics
От | Noah Misch |
---|---|
Тема | Re: xlc atomics |
Дата | |
Msg-id | 20160215171129.GA347322@tornado.leadboat.com обсуждение исходный текст |
Ответ на | Re: xlc atomics (Noah Misch <noah@leadboat.com>) |
Ответы |
Re: xlc atomics
Re: xlc atomics |
Список | pgsql-hackers |
On Sat, Jul 04, 2015 at 08:07:49PM -0400, Noah Misch wrote: > On Sun, Jul 05, 2015 at 12:54:43AM +0200, Andres Freund wrote: > > On 2015-07-04 18:40:41 -0400, Noah Misch wrote: > > > (1) "IBM XL C/C++ for AIX, V12.1 (5765-J02, 5725-C72)". Getting it working > > > required the attached patch. > > > Will you apply? Having the ability to test change seems to put you in a > > much better spot then me. > > I will. That commit (0d32d2e) permitted things to compile and usually pass tests, but I missed the synchronization bug. Since 2015-10-01, the buildfarm has seen sixteen duplicate-catalog-OID failures. The compiler has always been xlC, and the branch has been HEAD or 9.5. Examples: http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hornet&dt=2015-12-15%2004%3A12%3A49 http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hornet&dt=2015-12-08%2001%3A20%3A16 http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mandrill&dt=2015-12-11%2005%3A25%3A54 These suggested OidGenLock wasn't doing its job. I've seen similar symptoms around WALInsertLocks with "IBM XL C/C++ for Linux, V13.1.2 (5725-C73, 5765-J08)" for ppc64le. The problem is generic-xlc.h pg_atomic_compare_exchange_u32_impl() issuing __isync() before __compare_and_swap(). __isync() shall follow __compare_and_swap(); see our own s_lock.h, its references, and other projects' usage: https://github.com/boostorg/smart_ptr/blob/boost-1.60.0/include/boost/smart_ptr/detail/sp_counted_base_vacpp_ppc.hpp#L55 http://redmine.gromacs.org/projects/gromacs/repository/entry/src/external/thread_mpi/include/thread_mpi/atomic/xlc_ppc.h?rev=v5.1.2#L112 https://www-01.ibm.com/support/knowledgecenter/SSGH2K_13.1.2/com.ibm.xlc131.aix.doc/language_ref/asm_example.html This patch's test case would have failed about half the time under today's generic-xlc.h. Fast machines run it in about 1s. A test that detects the bug 99% of the time would run far longer, hence this compromise. Archaeology enthusiasts may enjoy the bug's changing presentation over time. LWLockAcquire() exposed it after commit ab5194e redesigned lwlock.c in terms of atomics. Assert builds were unaffected for commits in [ab5194e, bbfd7ed); atomics.h asserts fattened code enough to mask the bug. However, removing the AssertPointerAlignment() from pg_atomic_fetch_or_u32() would have sufficed to surface it in assert builds. All builds demonstrated the bug once bbfd7ed introduced xlC pg_attribute_noreturn(), which elicits a simpler instruction sequence around the ExceptionalCondition() call embedded in each Assert. nm
Вложения
В списке pgsql-hackers по дате отправления: