Re: Re: [PATCH] Use CC atomic builtins if available [was: Re: TAS patch for building on armel/armhf thumb]

Поиск

Список

Период

Сортировка

От	Heikki Linnakangas
Тема	Re: Re: [PATCH] Use CC atomic builtins if available [was: Re: TAS patch for building on armel/armhf thumb]
Дата	19 декабря 2011 г. 17:25:29
Msg-id	4EEFABB2.5050100@enterprisedb.com обсуждение исходный текст
Ответ на	Re: Re: [PATCH] Use CC atomic builtins if available [was: Re: TAS patch for building on armel/armhf thumb] (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы	Re: Re: [PATCH] Use CC atomic builtins if available [was: Re: TAS patch for building on armel/armhf thumb] Re: Re: [PATCH] Use CC atomic builtins if available [was: Re: TAS patch for building on armel/armhf thumb]
Список	pgsql-bugs

Дерево обсуждения

On 19.12.2011 22:12, Tom Lane wrote:
> Noah Misch<noah@leadboat.com>  writes:
>> On Mon, Dec 19, 2011 at 05:09:11PM +0200, Heikki Linnakangas wrote:
>>> That is not sufficient on platforms with a weak memory model, like Itanium.
>
>> Other processors may observe the lock as held after its release, but there's no
>> correctness problem.
>
> How weak is the memory model, exactly?
>
> A correctness problem would ensue if out-of-order stores are possible,
> ie other processors could observe the lock being freed (and then acquire
> it) before seeing changes to shared variables that had been made while
> holding the lock.
>
> I'm a little dubious that this applies to Itanium, because I don't see
> any memory fence instruction in the TAS macro.  If we needed one in
> S_UNLOCK I'd rather expect there to be one in TAS as well.

There's a pretty good manual called "Implementing Spinlocks on Itanium
and PA-RISC" from HP at:
http://h21007.www2.hp.com/portal/download/files/unprot/itanium/spinlocks.pdf.
It says, in section "3.2.1 Try to get a spinlock":

 > Note also, that the ‘xchg’ instruction always
 > has the ‘acquire’ semantics, so it enforces the correct memory ordering.

But the current implementation seems to be safe, anyway.  In section
"3.2.3 Freeing a spinlock", that manual says:

 > Note, that a simple assignment statement into a volatile variable
will not work, as we are
 > assuming that the +Ovolatile=__unordered compile option is being used.

So +Ovolatile=__unordered would allow the kind of optimization that I
thought was possible, and we would have a problem if we used it. But the
default is more conservative, and a simple assignment does work.

I compiled the attached test program on an HP Itanium box, using the
same flags you get from PostgreSQL's configure on that box. The relevant
assembler output is:

         xchg4           r14 = [r15], r14           // M [slocktest.c: 66/3]
//file/line/col slocktest.c/67/3
         ld1.acq         r16 = [r11]                // M [slocktest.c: 67/3]
         nop.i           0                          // I
//file/line/col slocktest.c/68/3
         st1.rel         [r11] = r10             ;; // M [slocktest.c: 68/3]
//file/line/col slocktest.c/69/3
         st4.rel         [r15] = r0                 // M [slocktest.c: 69/3]
//file/line/col slocktest.c/70/1

The trick I missed is that the compiler attaches .rel to all the stores
and .acq to all the loads through a volatile pointer. gcc seems to do
the same. So we're safe.

It would be interesting to test whether using +Ovolatile=__unordered
would make a difference to performance. Tacking those memory fence
modifiers to all the volatile loads and stores might be expensive.

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

Вложения

slocktest.c

В списке pgsql-bugs по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: Re: [PATCH] Use CC atomic builtins if available [was: Re: TAS patch for building on armel/armhf thumb]

Вложения