Re: Hash support for arrays
От | Ron Mayer |
---|---|
Тема | Re: Hash support for arrays |
Дата | |
Msg-id | 4CD0E390.4080708@cheapcomplexdevices.com обсуждение исходный текст |
Ответ на | Re: Hash support for arrays (Tom Lane <tgl@sss.pgh.pa.us>) |
Список | pgsql-hackers |
Tom Lane wrote: > It's possible that the multiply-by-31 method is as good as the > rotate-and-xor method by that measure, or even better; but it's far from > obvious that it's better. And I'm not convinced that the multiply > method has a pedigree that should encourage me to take it on faith. Short summary: * some history of it's trivia follows * (nothing here suggests it's better - just old and common and cheap) Longer - some trivia regarding its pedigree: It (or at least a *31 variant) seems to have a history of advocacy going back to Chris Torek in 1990: http://groups.google.com/group/comp.lang.c/browse_thread/thread/28c2095282f0c1b5/193be99e9836791b?q=#193be99e9836791b X#define HASH(str, h, p) \ X for (p = str, h = 0; *p;) h = (h << 5) - h + *p++ and gets referred to in Usenet papers in the early 90's as well: http://www.usenix.com/publications/library/proceedings/sa92/salz.pdf Regarding "why the magic number 31 [or 33 which also often comes up]" apparently the only thing magic about it is that it's an odd number != 1. The rest of the odd numbers work about as well according to this guy who tried to explain it: http://svn.eu.apache.org/repos/asf/apr/apr/trunk/tables/apr_hash.c * The magic of number 33, i.e. why it works betterthan many other * constants, prime or not, has never been adequately explained by * anyone. So I try an explanation:if one experimentally tests all * multipliers between 1 and 256 (as I did while writing a low-level * datastructure library some time ago) one detects that even * numbers are not useable at all. The remaining 128 odd numbers * (except for the number 1) work more or less all equally well. * They all distribute in an acceptable way andthis way fill a hash * table with an average percent of approx. 86%. * * If one compares the chi^2 values ofthe variants (see * Bob Jenkins ``Hashing Frequently Asked Questions'' at * http://burtleburtle.net/bob/hash/hashfaq.htmlfor a description * of chi^2), the number 33 not even has the best value.But the * number 33 and a few other equally good numbers like 17, 31, 63, * 127 and 129 have nevertheless a greatadvantage to the remaining * numbers in the large set of possible multipliers: their multiply * operation canbe replaced by a faster operation based on just one * shift plus either a single addition or subtraction operation.And * because a hash function has to both distribute good _and_ has to * be very fast to compute, those fewnumbers should be preferred. ...
В списке pgsql-hackers по дате отправления: