Обсуждение: [V2] Adding new CRC32C implementation for IBM S390X

Поиск
Список
Период
Сортировка

[V2] Adding new CRC32C implementation for IBM S390X

От
Eduard Stefes
Дата:
Hi,

So here is V2 of the crc32c_s390x patch. Changes from V1 are:

- added gcc 14-14.2 as known broken compiler (bug was fixed with 14.3)
- moved broken compiler check to vx extension compile&link check
- made variables global in the extension check
- create dependency to getauxval in configure, so we don't compile the
code if we won't be able to detect the cpu extension at runtime
- moved buffer length check into macro
- changed minimal buffer length for crc32c_s390x from 64 to 16 byte
- added CFLAGS_CRC to all crc32c_s390x* artifacts
- fixed formatting with pgindent 
- fixed typos in email address


-- 
Eduard Stefes <eduard.stefes@ibm.com>

Вложения

Re: [V2] Adding new CRC32C implementation for IBM S390X

От
John Naylor
Дата:
On Thu, Jun 5, 2025 at 2:15 PM Eduard Stefes <Eduard.Stefes@ibm.com> wrote:
>
> Hi,
>
> So here is V2 of the crc32c_s390x patch. Changes from V1 are:

Hi Eduard, thanks for the update. Note, it's not necessary to change
the thread subject, and in fact I came very close to missing this
email entirely.

Secondly, I haven't seen a response about the copyright. I'm referring
to this part in particular:

+ * Copyright (c) 2025, International Business Machines (IBM)

I shared this link in my first reply:

https://wiki.postgresql.org/wiki/Developer_FAQ#Do_I_need_to_sign_a_copyright_assignment.3F

It says, in part:

"May I add my own copyright notice where appropriate?

No, please don't. We like to keep the legal information short and
crisp. Additionally, we've heard that could possibly pose problems for
corporate users."

> - added gcc 14-14.2 as known broken compiler (bug was fixed with 14.3)

Why do we need to mark this as broken? In my research, it caused
compiler errors with those versions -- that itself should cause
fallback to sb8.

On that note, I'm now wondering if clang compiles and links
successfully but the program is broken? Or does it fail to compile? If
the latter, we should treat them the same: there is no need for "known
broken versions" in the configure checks, as it's just a matter of
documentation.

> - create dependency to getauxval in configure, so we don't compile the
> code if we won't be able to detect the cpu extension at runtime

That's just unnecessary clutter, and we don't do that anywhere else.
We already have the HAVE_GETAUXVAL symbol to guard the runtime check.

As I alluded to before, I'm not in favor of having both direct-call
and runtime-check paths here. The reason x86 and Arm have it is
because their hardware support works on any length input. Is there any
reason to expect auxv.h to be unavailable?

Also, lately we have been moving away from having separate *choose.c
files, since they add complexity to the build system. I'd suggest
looking at src/port/pg_popcount_aarch64.c -- the file is compiled
unconditionally but inside it has the appropriate #ifdef's as well as
the choice function. See how it handles auxv.h as well.

> - moved buffer length check into macro
> - changed minimal buffer length for crc32c_s390x from 64 to 16 byte

+#define COMP_CRC32C(crc, data, len) \
+ ((crc) = (len) < 16 ? pg_comp_crc32c_sb8((crc),(data),(len)) :
pg_comp_crc32c((crc), (data), (len)))

Your tests demonstrated improvement with 32 bytes and above, and
nothing less than 31 makes sense as a minimum because of the 16-byte
alignment requirement. I've mentioned that we don't want the 20-byte
WAL header computation to have any more overhead than it does now.

--
John Naylor
Amazon Web Services



RE: [V2] Adding new CRC32C implementation for IBM S390X

От
Eduard Stefes
Дата:
Hi,


On Wed, 2025-06-11 at 13:48 +0700, John Naylor wrote:
> Hi Eduard, thanks for the update. Note, it's not necessary to change
> the thread subject, and in fact I came very close to missing this
> email entirely.

Sorry for the confusion. 


> Secondly, I haven't seen a response about the copyright. I'm
> referring
> to this part in particular:
> 
> + * Copyright (c) 2025, International Business Machines (IBM)
> 
> I shared this link in my first reply:
> 
> 
> It says, in part:
> 
> "May I add my own copyright notice where appropriate?
> 
> No, please don't. We like to keep the legal information short and
> crisp. Additionally, we've heard that could possibly pose problems
> for
> corporate users."
> 

I had to talk to some people about this and get their ok. I'll remove
the copyright text.


> > - added gcc 14-14.2 as known broken compiler (bug was fixed with
> > 14.3)
> 
> Why do we need to mark this as broken? In my research, it caused
> compiler errors with those versions -- that itself should cause
> fallback to sb8.
> 
> On that note, I'm now wondering if clang compiles and links
> successfully but the program is broken? Or does it fail to compile?
> If
> the latter, we should treat them the same: there is no need for
> "known
> broken versions" in the configure checks, as it's just a matter of
> documentation.
> 

Yes in all cases the compilation will fail and we will fall back to
sb8. However personally I think the user should get feedback of *why*
something fails. I'll remove the check and document the broken compiler
versions. 

> > - create dependency to getauxval in configure, so we don't compile
> > the
> > code if we won't be able to detect the cpu extension at runtime
> 
> That's just unnecessary clutter, and we don't do that anywhere else.
> We already have the HAVE_GETAUXVAL symbol to guard the runtime check.

Noted. I'll remove it.

> As I alluded to before, I'm not in favor of having both direct-call
> and runtime-check paths here. The reason x86 and Arm have it is
> because their hardware support works on any length input. Is there
> any
> reason to expect auxv.h to be unavailable?

I tried to find a reason but did not find any. So I'll remove it.

> Also, lately we have been moving away from having separate *choose.c
> files, since they add complexity to the build system. I'd suggest
> looking at src/port/pg_popcount_aarch64.c -- the file is compiled
> unconditionally but inside it has the appropriate #ifdef's as well as
> the choice function. See how it handles auxv.h as well.

I'll change that. 

> Your tests demonstrated improvement with 32 bytes and above, and
> nothing less than 31 makes sense as a minimum because of the 16-byte
> alignment requirement. I've mentioned that we don't want the 20-byte
> WAL header computation to have any more overhead than it does now.

Sorry, yes that's true I'll change that back. 


-- 
Eduard Stefes <eduard.stefes@ibm.com>




Re: [V2] Adding new CRC32C implementation for IBM S390X

От
Eduard Stefes
Дата:
Hi,

here is V3 of the patch. Changes from V2:

- removed IBM copyright
- removed GETAUXVAL check in favor of the already provided check
- moved runtime selection code from pg_crc32c_s390x_choose.c to
pg_crc32c_s390x.c and removed _choose.c file
- removed the config time compiler check and let the buildsystem fall
back to sb8
- changed buffer limit back to 32 bytes before calling s390x specific
implementation 


-- 
Eduard Stefes <eduard.stefes@ibm.com>






Вложения

Re: [V2] Adding new CRC32C implementation for IBM S390X

От
John Naylor
Дата:
On Tue, Jul 8, 2025 at 6:46 PM Eduard Stefes <Eduard.Stefes@ibm.com> wrote:
>
> Hi,
>
> here is V3 of the patch. Changes from V2:
>
> - removed IBM copyright
> - removed GETAUXVAL check in favor of the already provided check
> - moved runtime selection code from pg_crc32c_s390x_choose.c to
> pg_crc32c_s390x.c and removed _choose.c file
> - removed the config time compiler check and let the buildsystem fall
> back to sb8
> - changed buffer limit back to 32 bytes before calling s390x specific
> implementation

Hi Eduard, I look a brief look at v3 and it seems mostly okay at a
glance. There is just one major thing that got left out:

On Wed, Jul 2, 2025 at 3:27 PM Eduard Stefes <Eduard.Stefes@ibm.com> wrote:
> On Wed, 2025-06-11 at 13:48 +0700, John Naylor wrote:
> > As I alluded to before, I'm not in favor of having both direct-call
> > and runtime-check paths here. The reason x86 and Arm have it is
> > because their hardware support works on any length input. Is there
> > any
> > reason to expect auxv.h to be unavailable?
>
> I tried to find a reason but did not find any. So I'll remove it.

v3 still has direct-call and runtime-check paths. Let's keep only
USE_S390X_CRC32C_WITH_RUNTIME_CHECK and discard the direct call
configure checks. Once that's done I'll take a closer look and test as
well. The rest should be small details.

--
John Naylor
Amazon Web Services



Re: [V2] Adding new CRC32C implementation for IBM S390X

От
Eduard Stefes
Дата:
Hi,


On Wed, 2025-07-09 at 13:53 +0700, John Naylor wrote:
> v3 still has direct-call and runtime-check paths. Let's keep only
> USE_S390X_CRC32C_WITH_RUNTIME_CHECK and discard the direct call
> configure checks. Once that's done I'll take a closer look and test
> as
> well. The rest should be small details.

done :) here is V5 with only runtime-check.


--
Eduard Stefes <eddy@linux.ibm.com>
IBM

Вложения

Re: [V2] Adding new CRC32C implementation for IBM S390X

От
John Naylor
Дата:
On Fri, Jul 11, 2025 at 7:01 PM Eduard Stefes <eddy@linux.ibm.com> wrote:
>
> On Wed, 2025-07-09 at 13:53 +0700, John Naylor wrote:
> > v3 still has direct-call and runtime-check paths. Let's keep only
> > USE_S390X_CRC32C_WITH_RUNTIME_CHECK and discard the direct call
> > configure checks. Once that's done I'll take a closer look and test
> > as
> > well. The rest should be small details.
>
> done :) here is V5 with only runtime-check.

Great, thanks! I took this (v4, for the archives) for a spin on one of
the LinuxONE instances in the buildfarm. Since godbolt.org doesn't
have clang for s390x, I tested on a machine with clang 18. With that,
I found that the configure check successfully compiles and links
broken code. :-( This is different from gcc 14 which "harmlessly"
failed to compile (until 14.3 fixed that). So, it seems for the clang
versions that are broken we actually do need to guard within the test
programs after all. Eduard, which compiler versions have you tested
the patch on?

I found the patch passes tests with gcc 13.3 on this machine, then
looked at the configuration outputs.

configure:

```
checking for __attribute__((vector_size(16))), vec_gfmsum_accum_128
with CFLAGS=-fzvector -march=z13... no
checking for __attribute__((vector_size(16))), vec_gfmsum_accum_128
with CFLAGS=-mzarch -march=z13... yes
checking which CRC-32C implementation to use... S390X Vector
instructions for CRC with runtime check
checking for vectorized CRC-32C... none
```

This looks strange. I think we want "CRC-32C implementation" to report
"slicing-by-8" and "vectorized CRC-32C" to report "S390X Vector
instructions".

+# Check for S390X Vector intrinsics to do CRC calculations.
+#
+# First check for the host cpu
+if test x"$host_cpu" = x"s390x" && test x"$ac_cv_func_getauxval" = x"yes"; then

I believe ac_cv_func_getauxval is a leftover from the previous patch.

meson:

```
Checking for function "getauxval" : YES
Checking if "s390x_vector_vx+march=z13" : links: YES
```

This looks fine except for the "getauxval", which is due to the duplicate check:

+elif host_cpu == 's390x' and cc.has_function('getauxval')

This check already happened at the beginning of the file within
"func_checks" and defined the corresponding HAVE_ symbol, which v4
seems to use appropriately in the runtime check.

--
John Naylor
Amazon Web Services