Re: Improve CRC32C performance on SSE4.2
From | John Naylor |
---|---|
Subject | Re: Improve CRC32C performance on SSE4.2 |
Date | |
Msg-id | CANWCAZY1Le1tpTZauY-JzbLpk=VSerP8=GZs36Cza9iJfRnn-A@mail.gmail.com |
In reply to | Re: Improve CRC32C performance on SSE4.2 (John Naylor <johncnaylorls@gmail.com>) |
List | pgsql-hackers |
On Mon, Mar 24, 2025 at 6:37 PM John Naylor <johncnaylorls@gmail.com> wrote:
>
> I'll take a look at the configure
> checks soon, since I had some questions there.

One other thing I forgot to mention: The previous test function had
local constants that the compiler was able to fold, resulting in no
actual vector instructions being emitted:

    movabs  rdx, 12884901891
    xor     eax, eax
    crc32   rax, rdx
    crc32   rax, rdx
    ret

That may be okay for practical purposes, but in the spirit of commit
fdb5dd6331e30 I changed it in v15 to use global variables and made
sure it emits what the function attributes are intended for:

    vmovdqu64       zmm3, ZMMWORD PTR x[rip]
    xor             eax, eax
    vpclmulqdq      zmm0, zmm3, ZMMWORD PTR y[rip], 0
    vextracti32x4   xmm2, zmm0, 1
    vmovdqa64       xmm1, xmm0
    vmovdqu64       ZMMWORD PTR y[rip], zmm0
    vextracti32x4   xmm0, zmm0, 2
    vpternlogq      xmm1, xmm2, xmm0, 150
    vmovq           rdx, xmm1
    crc32           rax, rdx
    vzeroupper
    ret

--
John Naylor
Amazon Web Services
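For readers following along, here is a minimal sketch of the difference described above. It is not the actual configure test from the patch: the names probe_x, probe_y, probe_with_locals and probe_with_globals are hypothetical, and it uses 128-bit PCLMUL instead of the AVX-512 instructions shown in the listing so that it runs on more x86-64 machines. It assumes GCC or Clang on x86-64.

```c
#include <stdint.h>
#include <immintrin.h>

/*
 * Hypothetical probe inputs. They are non-const globals with external
 * linkage, so the compiler generally cannot assume their values at
 * compile time and has to load them at run time.
 */
uint64_t probe_x[2] = {3, 3};
uint64_t probe_y[2] = {3, 3};

/*
 * Variant with local constants: an optimizing compiler may evaluate the
 * carryless multiply at compile time, leaving no vector instruction in
 * the generated code.
 */
__attribute__((target("sse4.2,pclmul")))
uint64_t
probe_with_locals(void)
{
    __m128i x = _mm_set_epi64x(3, 3);
    __m128i y = _mm_set_epi64x(3, 3);
    __m128i z = _mm_clmulepi64_si128(x, y, 0);

    return _mm_crc32_u64(0, (uint64_t) _mm_cvtsi128_si64(z));
}

/*
 * Variant with global inputs: the operands are only known at run time,
 * so the carryless multiply is expected to survive optimization.
 */
__attribute__((target("sse4.2,pclmul")))
uint64_t
probe_with_globals(void)
{
    __m128i x = _mm_loadu_si128((const __m128i *) probe_x);
    __m128i y = _mm_loadu_si128((const __m128i *) probe_y);
    __m128i z = _mm_clmulepi64_si128(x, y, 0);

    return _mm_crc32_u64(0, (uint64_t) _mm_cvtsi128_si64(z));
}
```

Both functions return the same value; the difference only shows up in the generated assembly (for example with gcc -O2 -S), where the second variant should contain a pclmulqdq instruction. Whether the first variant is actually folded depends on the compiler and its version.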