Re: Change GUC hashtable to use simplehash?
From | Ants Aasma |
---|---|
Subject | Re: Change GUC hashtable to use simplehash? |
Date | |
Msg-id | CANwKhkO364C3moZk_m+MEy+ryTB8ehh-Sh-EqLg3Uc94y2P3ow@mail.gmail.com |
In response to | Re: Change GUC hashtable to use simplehash? (John Naylor <johncnaylorls@gmail.com>) |
Responses | Re: Change GUC hashtable to use simplehash? |
List | pgsql-hackers |
On Tue, 30 Jan 2024 at 12:04, John Naylor <johncnaylorls@gmail.com> wrote:
>
> On Tue, Jan 30, 2024 at 4:13 AM Ants Aasma <ants.aasma@cybertec.at> wrote:
> > But given that we know the data length and we have it in a register
> > already, it's easy enough to just mask out data past the end with a
> > shift. See patch 1. Performance benefit is about 1.5x. Measured on a
> > small test harness that just hashes and finalizes an array of strings,
> > with a data dependency between consecutive hashes (next address
> > depends on the previous hash output).
>
> Interesting work! I've taken this idea and (I'm guessing, haven't
> tested) improved it by re-using an intermediate step for the
> conditional, simplifying the creation of the mask, and moving the
> bitscan out of the longest dependency chain. Since you didn't attach
> the test harness, would you like to run this and see how it fares?
> (v16-0001 is same as your 0001, and v16-0002 builds upon it.) I plan
> to test myself as well, but since your test tries to model true
> latency, I'm more interested in that one.

It didn't calculate the same result because the if (mask) condition
was incorrect. I changed it to if (chunk & 0xFF) and removed the right
shift from the mask. It seems to be about half a nanosecond faster, but
as I don't have a machine set up for microbenchmarking, that is quite
close to measurement noise.

I didn't post the harness because it's currently so messy as to be
nearly useless to others. But if you'd like to play around, I can tidy
it up a bit and post it.

> > Not sure if the second one is worth the extra code.
>
> I'd say it's not worth optimizing the case we think won't be taken
> anyway. I also like having a simple path to assert against.

Agreed.

As an addendum, I couldn't resist trying out 256-bit vectors with two
parallel AES hashes running, using unaligned loads and special-casing
loads that straddle a page boundary. Requires -march=x86-64-v3 -maes.

About 20% faster than fasthash on short strings, 2.2x faster on 4k
strings. Right now it requires 4-byte alignment (uses vpmaskmovd), but
it could be made to work with any alignment.

Regards,
Ants Aasma
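
For readers following along without the patches, here is a minimal sketch of the "mask out data past the end with a shift" idea quoted above. It is not the code from patch 1 or v16: the names mask_past_end and load_masked are hypothetical, and it assumes little-endian byte order and that the caller guarantees all 8 bytes at the pointer are readable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper (not from the patch): zero every byte of an 8-byte
 * chunk that lies at or beyond `len`. Assumes little-endian byte order,
 * so the first byte in memory is the lowest byte of the integer.
 */
static inline uint64_t
mask_past_end(uint64_t chunk, size_t len)
{
	assert(len <= 8);
	if (len == 0)
		return 0;		/* avoid an undefined shift by 64 bits */
	/* Shifting all-ones right leaves exactly `len` low bytes set. */
	return chunk & (UINT64_MAX >> (8 * (8 - len)));
}

/*
 * Hypothetical loader: read 8 bytes at p and keep only the first `len`.
 * The caller must guarantee that all 8 bytes at p are readable.
 */
static inline uint64_t
load_masked(const char *p, size_t len)
{
	uint64_t	chunk;

	memcpy(&chunk, p, sizeof(chunk));	/* safe unaligned load */
	return mask_past_end(chunk, len);
}
```

The point of the shift is that the tail needs no byte-at-a-time loop: once the length is in a register, the mask costs a subtract, a shift and an AND.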