Re: Change GUC hashtable to use simplehash?
От | John Naylor |
---|---|
Тема | Re: Change GUC hashtable to use simplehash? |
Дата | |
Msg-id | CANWCAZYDNjTh=8FxgZKmS8HEE=wcmLMoJx+Vvg3+xgAaGY=2oQ@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: Change GUC hashtable to use simplehash? (Andres Freund <andres@anarazel.de>) |
Ответы |
Re: Change GUC hashtable to use simplehash?
|
Список | pgsql-hackers |
On Thu, Nov 23, 2023 at 5:34 AM Andres Freund <andres@anarazel.de> wrote: > It's worth noting that the limited range of the input values means that > there's a lot of bias toward some bits being set ('a' to 'z' all start with > 0b011). We can take advantage of the limited range with a single additional instruction: After "ch |= 0x20", do "ch -= ('a' - 1)". That'll shrink letters and underscores to the range [1,31], which fits in 5 bits. (Other characters are much less common in a guc name). That increases randomness and allows 12 chars to be xor'd in before the first bits rotate around. > If, which I mildly doubt, we can't afford to call murmurhash32 for every > character, we could just call it for 32/5 input characters together. Or we > could just load up to 8 characters into an 64bit integer, can call > murmurhash64. I'll play around with this idea, as well. > The fact that string_hash() is slow due to the strlen(), which causes us to > process the input twice and which is optimized to also handle very long > strings which typically string_hash() doesn't encounter, seems problematic far > beyond this case. We use string_hash() in a *lot* of places, and that strlen() > does regularly show up in profiles. We should fix that. +1 > I think we ought to adjust our APIs around this: > 1) The accumulator state of the hash functions should be exposed, so one can > accumulate values into the hash state, without reducing the internal state > to a single 32/64 bit variable. If so, it might make sense to vendor a small, suitably licensed hash function that already has these APIs. While on the subject, it'd be good to have a clear separation between in-memory and on-disk usage, so we can make breaking changes in the former.
В списке pgsql-hackers по дате отправления: