Thread: Re: use a non-locking initial test in TAS_SPIN on AArch64

Re: use a non-locking initial test in TAS_SPIN on AArch64

From
Jingtang Zhang
Date:
Hi~

> Upon closer inspection, I noticed that we don't implement a custom
> TAS_SPIN() for this architecture, so I quickly hacked together the attached
> patch and ran a couple of benchmarks that stressed the spinlock code.  I
> found no discussion about TAS_SPIN() on ARM in the archives, but I did
> notice that the initial AArch64 support was added [0] before x86_64 started
> using a non-locking test [1].

This reminds me of a 2020 discussion about improving spinlock performance on
ARM [0], though that discussion was about CAS and TAS, not TAS_SPIN() itself.
 
>     tps = 74135.100891 (without initial connection time)
>     tps = 549462.785554 (without initial connection time)

The result looks great, but the discussion in [0] shows that the result may
vary among different ARM chips. Could you provide the chip model of this
test? So that we can do a cross validation of this patch. Not sure if compiler
version is necessary too. I'm willing to test it on Alibaba Cloud Yitian 710
if I have time.
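
For readers following the thread, the idea of a "non-locking initial test" can be sketched as follows. This is a minimal illustration written with C11 atomics, not the patch attached to the original message: the names demo_lock_t, demo_tas, demo_tas_spin, demo_lock_acquire and demo_lock_release are hypothetical, and PostgreSQL's real TAS()/TAS_SPIN() macros and slock_t type differ. The point it shows is that the spin path reads the lock word with a plain load first and only issues the atomic swap when the lock looks free, so waiters mostly read a shared cache line instead of repeatedly writing to it.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical lock type for illustration only; not PostgreSQL's slock_t. */
typedef atomic_int demo_lock_t;

/*
 * Plain test-and-set: always performs an atomic swap (a write).
 * Returns 0 if the lock was acquired, nonzero if it was already held.
 */
static inline int
demo_tas(demo_lock_t *lock)
{
    return atomic_exchange_explicit(lock, 1, memory_order_acquire);
}

/*
 * Test-and-test-and-set: do a non-locking read first and only attempt
 * the atomic swap when the lock appears to be free.
 */
static inline int
demo_tas_spin(demo_lock_t *lock)
{
    if (atomic_load_explicit(lock, memory_order_relaxed) != 0)
        return 1;               /* looks held; skip the expensive swap */
    return demo_tas(lock);
}

/* Acquire: one direct attempt, then spin using the cheaper variant. */
static inline void
demo_lock_acquire(demo_lock_t *lock)
{
    if (demo_tas(lock) == 0)
        return;
    while (demo_tas_spin(lock) != 0)
        ;                       /* a real implementation would back off here */
}

/* Release: the caller must hold the lock. */
static inline void
demo_lock_release(demo_lock_t *lock)
{
    assert(atomic_load_explicit(lock, memory_order_relaxed) != 0);
    atomic_store_explicit(lock, 0, memory_order_release);
}
```

Whether the extra load helps depends on how contended the lock is and on the chip, which is why the thread asks for measurements on several ARM machines.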


Re: use a non-locking initial test in TAS_SPIN on AArch64

From
Nathan Bossart
Date:
On Wed, Oct 23, 2024 at 11:01:05AM +0800, Jingtang Zhang wrote:
> The result looks great, but the discussion in [0] shows that the result may
> vary among different ARM chips. Could you provide the chip model of this
> test? So that we can do a cross validation of this patch.

This is on a c8g.24xlarge, which is using Neoverse-V2 and Armv9.0-a [0].

> I'm willing to test it on Alibaba Cloud Yitian 710 if I have time.

That would be great.  I have a couple of Apple M-series machines I can
test, too.

[0] https://github.com/aws/aws-graviton-getting-started/blob/main/README.md#building-for-graviton
-- 
nathan



Re: use a non-locking initial test in TAS_SPIN on AArch64

From
Nathan Bossart
Date:
On Wed, Oct 23, 2024 at 09:46:56AM -0500, Nathan Bossart wrote:
> I have a couple of Apple M-series machines I can test, too.

After some preliminary tests on an M3, I'm not seeing any gains outside the
noise range.  That's not too surprising because it's likely more difficult
to create a lot of spinlock contention on these smaller machines.  But, at
the very least, I'm not seeing a regression.

-- 
nathan



Re: use a non-locking initial test in TAS_SPIN on AArch64

From
Jingtang Zhang
Date:
Hi, Nathan.

I just realized that I almost forgot about this thread :)

> The result looks great, but the discussion in [0] shows that the result may
> vary among different ARM chips. Could you provide the chip model of this
> test? So that we can do a cross validation of this patch. Not sure if compiler
> version is necessary too. I'm willing to test it on Alibaba Cloud Yitian 710
> if I have time.

I ran some benchmarks on Yitian 710.

On c8y.16xlarge (64 cores):

Without the patch:
  80.31%  postgres               [.] __aarch64_swp4_acq
   1.77%  postgres               [.] __aarch64_ldadd4_acq_rel
   1.13%  postgres               [.] hash_search_with_hash_value
   0.87%  pg_stat_statements.so  [.] __aarch64_swp4_acq
   0.72%  postgres               [.] perform_spin_delay
   0.44%  postgres               [.] _bt_compare

tps = 295272.628421 (including connections establishing)
tps = 295335.660323 (excluding connections establishing)

Patched:
   9.94%  postgres               [.] s_lock
   6.07%  postgres               [.] __aarch64_swp4_acq
   5.73%  postgres               [.] hash_search_with_hash_value
   2.81%  postgres               [.] perform_spin_delay
   2.29%  postgres               [.] _bt_compare
   2.15%  postgres               [.] PinBuffer

tps = 864519.764125 (including connections establishing)
tps = 864638.244443 (excluding connections establishing)


Seems that great performance could be gained if s_lock contention is severe.
This may be more likely to happen on bigger machines.

On c8y.2xlarge (8 cores), I failed to make s_lock contended severely, and
as a result this patch didn’t bring any difference outside the noise.


Regards,
Jingtang





Re: use a non-locking initial test in TAS_SPIN on AArch64

From
Nathan Bossart
Date:
On Wed, Jan 15, 2025 at 07:50:38PM +0800, Jingtang Zhang wrote:
> Seems that great performance could be gained if s_lock contention is severe.
> This may be more likely to happen on bigger machines.
> 
> On c8y.2xlarge (8 cores), I failed to make s_lock contended severely, and
> as a result this patch didn’t bring any difference outside the noise.

Thanks for sharing.

-- 
nathan