Re: Re: [PATCH] Use CC atomic builtins if available [was: Re: TAS patch for building on armel/armhf thumb]
От | Heikki Linnakangas |
---|---|
Тема | Re: Re: [PATCH] Use CC atomic builtins if available [was: Re: TAS patch for building on armel/armhf thumb] |
Дата | |
Msg-id | 4EEFABB2.5050100@enterprisedb.com обсуждение исходный текст |
Ответ на | Re: Re: [PATCH] Use CC atomic builtins if available [was: Re: TAS patch for building on armel/armhf thumb] (Tom Lane <tgl@sss.pgh.pa.us>) |
Ответы |
Re: Re: [PATCH] Use CC atomic builtins if available [was:
Re: TAS patch for building on armel/armhf thumb]
Re: Re: [PATCH] Use CC atomic builtins if available [was: Re: TAS patch for building on armel/armhf thumb] |
Список | pgsql-bugs |
On 19.12.2011 22:12, Tom Lane wrote: > Noah Misch<noah@leadboat.com> writes: >> On Mon, Dec 19, 2011 at 05:09:11PM +0200, Heikki Linnakangas wrote: >>> That is not sufficient on platforms with a weak memory model, like Itanium. > >> Other processors may observe the lock as held after its release, but there's no >> correctness problem. > > How weak is the memory model, exactly? > > A correctness problem would ensue if out-of-order stores are possible, > ie other processors could observe the lock being freed (and then acquire > it) before seeing changes to shared variables that had been made while > holding the lock. > > I'm a little dubious that this applies to Itanium, because I don't see > any memory fence instruction in the TAS macro. If we needed one in > S_UNLOCK I'd rather expect there to be one in TAS as well. There's a pretty good manual called "Implementing Spinlocks on Itanium and PA-RISC" from HP at: http://h21007.www2.hp.com/portal/download/files/unprot/itanium/spinlocks.pdf. It says, in section "3.2.1 Try to get a spinlock": > Note also, that the ‘xchg’ instruction always > has the ‘acquire’ semantics, so it enforces the correct memory ordering. But the current implementation seems to be safe, anyway. In section "3.2.3 Freeing a spinlock", that manual says: > Note, that a simple assignment statement into a volatile variable will not work, as we are > assuming that the +Ovolatile=__unordered compile option is being used. So +Ovolatile=__unordered would allow the kind of optimization that I thought was possible, and we would have a problem if we used it. But the default is more conservative, and a simple assignment does work. I compiled the attached test program on an HP Itanium box, using the same flags you get from PostgreSQL's configure on that box. The relevant assembler output is: xchg4 r14 = [r15], r14 // M [slocktest.c: 66/3] //file/line/col slocktest.c/67/3 ld1.acq r16 = [r11] // M [slocktest.c: 67/3] nop.i 0 // I //file/line/col slocktest.c/68/3 st1.rel [r11] = r10 ;; // M [slocktest.c: 68/3] //file/line/col slocktest.c/69/3 st4.rel [r15] = r0 // M [slocktest.c: 69/3] //file/line/col slocktest.c/70/1 The trick I missed is that the compiler attaches .rel to all the stores and .acq to all the loads through a volatile pointer. gcc seems to do the same. So we're safe. It would be interesting to test whether using +Ovolatile=__unordered would make a difference to performance. Tacking those memory fence modifiers to all the volatile loads and stores might be expensive. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Вложения
В списке pgsql-bugs по дате отправления: