Good day, hackers.
13.05.2025 19:22, Yura Sokolov wrote:
> Good day.
>
> During conversation about other patch, Andres pointed out SpinLockAcquire
> have semantic of full memory barrier [1] .
>
> spin.h also states:
>
>> Load and stores operation in calling code are guaranteed not to be
> reordered with respect to these operations, because they include a compiler
> barrier.
>
> But on ARM/ARM64 platform tas in s_lock.h is implemented as simple call to
> __sync_lock_test_and_set, and . And GCC's documentation states it has only
> Acquire semantic, and __sync_lock_release (used as implementation of
> SpinLockRelease) has only Release semantic [2].
>
> Compiling and disassembling postgresql with -O2 optimization level on arm64
> we can see that SpinLockAquire is compiled down to call to
> __aarch64_swp4_sync (in optimistic case) which has disassemble:
>
> 0000000000662cc0 <__aarch64_swp4_sync>:
> 662cc0: d503245f bti c
> 662cc4: 90001bf0 adrp x16, 9de000 <hist_start+0x3d18>
> 662cc8: 39582210 ldrb w16, [x16, #1544]
> 662ccc: 34000070 cbz w16, 662cd8
> 662cd0: b8a08020 swpa w0, w0, [x1]
> 662cd4: d65f03c0 ret
> 662cd8: 2a0003f0 mov w16, w0
> 662cdc: 885f7c20 ldxr w0, [x1]
> 662ce0: 88117c30 stxr w17, w16, [x1]
> 662ce4: 35ffffd1 cbnz w17, 662cdc
> 662ce8: d5033bbf dmb ish
> 662cec: d65f03c0 ret
>
> Here we see 'swpa' instruction which has only acquire semantic [3].
>
> And SpinLockRelease generates 'stlr' instruction, which has release
> semantic, iirc.
>
> Godbolt shows noticable difference between __sync_lock_test_and_set vs
> __atomic_exchange_n(ATOMIC_SEQ_CST) (and __sync_lock_release vs
> __atomic_store_n(ATOMIC_SEQ_CST)) using GCC even with march=armv7 [4]:
> - __atomic_exchange_n and __atomic_store_n has 'dmb ish' instruction both
> before and after load/store operations,
> - but __sync_lock_test_and_set has 'dmb ish' only after load/store
> operations and __sync_lock_release only before store operation. (You can
> see same pattern in __aarch64_swp4_sync above in case 'swpa' instruction is
> not present).
>
> Therefore I think neither SpinLockAcquire nor SpinLockRelease have no full
> memory barrier semantic on ARM/ARM64 at the moment when compiled with GCC.
>
> Surprisingly, Clang puts 'dmb ish' both before and after load/store at
> __sync_lock_test_and_set, but does not the same for __sync_lock_release.
>
> As a simple fix I suggest to put __sync_synchronize before
> __sync_lock_test_and_set and after __sync_lock_release.
>
> Initially I wanted to use __atomic_exchange_n and __atomic_store_n. But in
> absence of support for atomic instructions, their inner loop doesn't look
> to be safe for reordering since it uses ldar+stlr loop:
> - looks like previous operations could be reordered after 'ldar'.
> - similarly following operations could be reordered before 'stlr'.
> There's very relevant discussion at GCC's bugzilla [5].
> Probably this hypothesis is not valid. I believe we have to ask someone
> closely familiar with ARM internals to be sure.
>
> # Alternative.
>
> As an alternative, we may fix comments about
> SpinLockAcquire/SpinLockRelease to state they provide acquire and release
> barrier semantics only. And then check all their uses and set appropriate
> memory barriers where their usage as full memory barriers is detected (as
> in [1]).
>
> [1]
> https://postgr.es/m/5ccgykypol3azijw2chqnpg3rhuwjtwsmbfs3pgcqm7fu6laus%40wppo6zcfszay
> [2]
>
https://gcc.gnu.org/onlinedocs/gcc-15.1.0/gcc/_005f_005fsync-Builtins.html#index-_005f_005fsync_005flock_005ftest_005fand_005fset
> [3]
>
https://developer.arm.com/documentation/100069/0606/Data-Transfer-Instructions/SWPA--SWPAL--SWP--SWPL--SWPAL--SWP--SWPL
> [4] https://godbolt.org/z/h5P9fjzWd
> [5] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697
I've recognized my mistake about SpinLockAcquire/SpinLockRelease
requirements. In [1] I've describe my thoughts and suggested to introduce
read-write spin lock for the cases when there are a lot of readers and few
(or one) writers.
[1] https://postgr.es/m/450dafae-b620-4726-abc2-53347c419394%40postgrespro.ru
--
regards
Yura Sokolov aka funny-falcon