Re: Change GUC hashtable to use simplehash?

From Ants Aasma
Subject Re: Change GUC hashtable to use simplehash?
Date
Msg-id CANwKhkO364C3moZk_m+MEy+ryTB8ehh-Sh-EqLg3Uc94y2P3ow@mail.gmail.com
In response to Re: Change GUC hashtable to use simplehash?  (John Naylor <johncnaylorls@gmail.com>)
Responses Re: Change GUC hashtable to use simplehash?  (John Naylor <johncnaylorls@gmail.com>)
List pgsql-hackers
On Tue, 30 Jan 2024 at 12:04, John Naylor <johncnaylorls@gmail.com> wrote:
>
> On Tue, Jan 30, 2024 at 4:13 AM Ants Aasma <ants.aasma@cybertec.at> wrote:
> > But given that we know the data length and we have it in a register
> > already, it's easy enough to just mask out data past the end with a
> > shift. See patch 1. Performance benefit is about 1.5x. Measured on a
> > small test harness that just hashes and finalizes an array of strings,
> > with a data dependency between consecutive hashes (next address
> > depends on the previous hash output).
>
> Interesting work! I've taken this idea and (I'm guessing, haven't
> tested) improved it by re-using an intermediate step for the
> conditional, simplifying the creation of the mask, and moving the
> bitscan out of the longest dependency chain. Since you didn't attach
> the test harness, would you like to run this and see how it fares?
> (v16-0001 is same as your 0001, and v16-0002 builds upon it.) I plan
> to test myself as well, but since your test tries to model true
> latency, I'm more interested in that one.

It didn't calculate the same result because the if (mask) condition
was incorrect. Changed it to if (chunk & 0xFF) and removed the right
shift from the mask. It seems to be half a nanosecond faster, but as I
don't have a machine set up for microbenchmarking it's quite close to
measurement noise.
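
For readers following along, here is a minimal sketch of the general
idea of masking out data past the end with a shift. It is not the code
from the attached patches: the names mask_tail and load_masked are
hypothetical, it assumes a little-endian machine, and it assumes the
caller guarantees that 8 bytes are readable at the load address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: keep the low-order `len` bytes of an 8-byte chunk
 * and zero the rest. On a little-endian machine the low-order bytes are
 * the ones that come first in memory, so this discards data past the end.
 * len must be in 1..8 so that the shift count stays below 64.
 */
static inline uint64_t
mask_tail(uint64_t chunk, size_t len)
{
    assert(len >= 1 && len <= 8);
    return chunk & (~UINT64_C(0) >> (8 * (8 - len)));
}

/*
 * Hypothetical helper: load a full 8-byte chunk and mask off everything
 * beyond `len`. Assumption: the caller guarantees that 8 bytes starting
 * at buf are readable, even if only the first `len` are valid data.
 */
static inline uint64_t
load_masked(const char *buf, size_t len)
{
    uint64_t chunk;

    memcpy(&chunk, buf, sizeof(chunk));
    return mask_tail(chunk, len);
}
```

Because the length is already known, the mask costs one shift and one
AND, with no per-byte loop over the tail.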

I didn't post the harness as it's currently so messy as to be nearly
useless to others. But if you'd like to play around, I can tidy it up
a bit and post it.
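
Until then, the shape of such a latency test can be sketched roughly as
follows. This is not the actual harness: toy_hash is a stand-in (a
seeded FNV-1a), chase_hashes is a hypothetical name, and the timing code
is left out. The point it illustrates is the data dependency: the next
string to hash is chosen from the previous hash output, so the CPU
cannot overlap consecutive hashes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in hash (seeded FNV-1a); any hash function could be swapped in. */
static uint64_t
toy_hash(const char *s, size_t len, uint64_t seed)
{
    uint64_t h = UINT64_C(0xcbf29ce484222325) ^ seed;

    for (size_t i = 0; i < len; i++)
    {
        h ^= (unsigned char) s[i];
        h *= UINT64_C(0x100000001b3);
    }
    return h;
}

/*
 * Hash `iterations` strings in a dependent chain: the index of the next
 * string, and the seed, both come from the previous hash output.
 * Returns the final hash so the compiler cannot discard the work.
 */
static uint64_t
chase_hashes(const char *const *strings, size_t nstrings, size_t iterations)
{
    uint64_t h = 0;

    assert(nstrings > 0);
    for (size_t i = 0; i < iterations; i++)
    {
        const char *s = strings[h % nstrings];

        h = toy_hash(s, strlen(s), h);
    }
    return h;
}
```

Timing a call to chase_hashes and dividing by the iteration count gives
an approximate per-hash latency rather than throughput.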

> > Not sure if the second one is worth the extra code.
>
> I'd say it's not worth optimizing the case we think won't be taken
> anyway. I also like having a simple path to assert against.

Agreed.

As an addendum, I couldn't resist trying out 256-bit vectors with two
parallel AES hashes running, using unaligned loads and special-casing
loads that straddle a page boundary. It requires -march=x86-64-v3 -maes.
It is about 20% faster than fasthash on short strings and 2.2x faster on
4k strings. Right now it requires 4-byte alignment (it uses vpmaskmovd),
but it could be made to work with any alignment.
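
As an illustration of the page-boundary special case only (not the AES
code itself), a check along these lines decides whether a wide unaligned
load would touch the next page. The function name is hypothetical and
the 4096-byte page size is an assumption; real code would need the
platform's actual page size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 4 KiB pages, as is typical on x86-64. */
#define ASSUMED_PAGE_SIZE ((uintptr_t) 4096)

/*
 * Hypothetical helper: true if a load of `load_size` bytes starting at
 * `addr` would extend past the end of the page containing `addr`.
 * Reading past the valid data is harmless within a page, but the next
 * page might be unmapped, so that case needs a slower, careful path.
 */
static inline bool
load_crosses_page(uintptr_t addr, size_t load_size)
{
    uintptr_t offset = addr & (ASSUMED_PAGE_SIZE - 1);

    assert(load_size >= 1 && load_size <= ASSUMED_PAGE_SIZE);
    return offset > ASSUMED_PAGE_SIZE - load_size;
}
```

For a 32-byte (256-bit) load, only the last 31 starting offsets in each
page take the special-case path, so the fast path covers almost all
loads.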

Regards,
Ants Aasma
