Re: Change GUC hashtable to use simplehash?

Поиск

Список

Период

Сортировка

От	John Naylor
Тема	Re: Change GUC hashtable to use simplehash?
Дата	29 ноября 2023 г. 13:31:21
Msg-id	CANWCAZb_17xdPP7v1_mXXg5TVbWkDzkP_WDgbCWpqUvz_VZ+OQ@mail.gmail.com обсуждение исходный текст
Ответ на	Re: Change GUC hashtable to use simplehash? (John Naylor <johncnaylorls@gmail.com>)
Ответы	Re: Change GUC hashtable to use simplehash? Re: Change GUC hashtable to use simplehash? Re: Change GUC hashtable to use simplehash? Re: Change GUC hashtable to use simplehash?
Список	pgsql-hackers

Дерево обсуждения

Attached is a rough start with Andres's earlier ideas, to get
something concrete out there.

I took a look around at other implementations a bit. Many modern hash
functions use MUM-style hashing, which typically uses 128-bit
arithmetic. Even if they already have an incremental interface and
have a compatible license, it seems a bit too much work to adopt just
for a couple string use cases. Might be useful elsewhere, though, but
that's off topic.

However, I did find a couple hash functions that are much simpler to
adapt to a bytewise interface, pass SMHasher, and are decently fast on
short inputs:

- fast-hash, MIT licensed, and apparently has some use in software [1]
- MX3, CC0 license (looking around, seems controversial for a code
license, so didn't go further). [2] Seems to be a for-fun project, but
the accompanying articles are very informative on how to develop these
things.

After wacking fast-hash around, it doesn't really resemble the
original much, and if for some reason we went as far as switching out
the mixing/final functions, it may as well be called completely
original work. I thought it best to start with something whose mixing
behavior passes SMHasher, and hopefully preserve that property.

Note that the combining and final steps share most of their arithmetic
operations. This may have been done on purpose to minimize binary
size, but I didn't check. Also, it incorporates input length into the
calculation. Since we don't know the length of C strings up front, I
threw that out for now. It'd be possible to track the length as we go
and incorporate something into the final step. The hard part is
verifying it hasn't lost any quality.

v5-0001 puts fash-hash as-is into a new header, named in a way to
convey in-memory use e.g. hash tables.

v5-0002 does the minimal to allow dynash to use this for string_hash,
inlined but still calling strlen.

v5-0003 shows one way to do a incremental interface. It might be okay
for simplehash with fixed length keys, but seems awkward for strings.

v5-0004 shows a bytewise incremental interface, with implementations
for dynahash (getting rid of strlen) and guc hash.

[1] https://code.google.com/archive/p/fast-hash/
[2] https://github.com/jonmaiga/mx3

Вложения

В списке pgsql-hackers по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: Change GUC hashtable to use simplehash?

Вложения