Re: Change GUC hashtable to use simplehash?
От | John Naylor |
---|---|
Тема | Re: Change GUC hashtable to use simplehash? |
Дата | |
Msg-id | CANWCAZb_17xdPP7v1_mXXg5TVbWkDzkP_WDgbCWpqUvz_VZ+OQ@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: Change GUC hashtable to use simplehash? (John Naylor <johncnaylorls@gmail.com>) |
Ответы |
Re: Change GUC hashtable to use simplehash?
Re: Change GUC hashtable to use simplehash? Re: Change GUC hashtable to use simplehash? Re: Change GUC hashtable to use simplehash? |
Список | pgsql-hackers |
Attached is a rough start with Andres's earlier ideas, to get something concrete out there. I took a look around at other implementations a bit. Many modern hash functions use MUM-style hashing, which typically uses 128-bit arithmetic. Even if they already have an incremental interface and have a compatible license, it seems a bit too much work to adopt just for a couple string use cases. Might be useful elsewhere, though, but that's off topic. However, I did find a couple hash functions that are much simpler to adapt to a bytewise interface, pass SMHasher, and are decently fast on short inputs: - fast-hash, MIT licensed, and apparently has some use in software [1] - MX3, CC0 license (looking around, seems controversial for a code license, so didn't go further). [2] Seems to be a for-fun project, but the accompanying articles are very informative on how to develop these things. After wacking fast-hash around, it doesn't really resemble the original much, and if for some reason we went as far as switching out the mixing/final functions, it may as well be called completely original work. I thought it best to start with something whose mixing behavior passes SMHasher, and hopefully preserve that property. Note that the combining and final steps share most of their arithmetic operations. This may have been done on purpose to minimize binary size, but I didn't check. Also, it incorporates input length into the calculation. Since we don't know the length of C strings up front, I threw that out for now. It'd be possible to track the length as we go and incorporate something into the final step. The hard part is verifying it hasn't lost any quality. v5-0001 puts fash-hash as-is into a new header, named in a way to convey in-memory use e.g. hash tables. v5-0002 does the minimal to allow dynash to use this for string_hash, inlined but still calling strlen. v5-0003 shows one way to do a incremental interface. It might be okay for simplehash with fixed length keys, but seems awkward for strings. v5-0004 shows a bytewise incremental interface, with implementations for dynahash (getting rid of strlen) and guc hash. [1] https://code.google.com/archive/p/fast-hash/ [2] https://github.com/jonmaiga/mx3
Вложения
В списке pgsql-hackers по дате отправления: