Re: Hash index todo list item
От | Brian Hurt |
---|---|
Тема | Re: Hash index todo list item |
Дата | |
Msg-id | 46E1695D.5000909@janestcapital.com обсуждение исходный текст |
Ответ на | Re: Hash index todo list item (Kenneth Marshall <ktm@rice.edu>) |
Ответы |
Re: Hash index todo list item
|
Список | pgsql-hackers |
Kenneth Marshall wrote:<br /><blockquote cite="mid20070907145507.GJ19403@it.is.rice.edu" type="cite"><blockquote type="cite"><blockquotetype="cite"><pre wrap=""> </pre></blockquote><pre wrap="">How likely is it that you will geta hash collision, two strings that are different that will hash to the same value? To avoid this requires a very large hash key (128 bits, minimum)- otherwise you get into birthday attack problems. With a 32-bit hash, the likelyhood is greater than 50% that two strings in a collection of 100,000 will hash to the same value. With a 64-bit hash, the likelyhood is greater than 50% that two strings in a collection of 10 billion will has to same value. 10 billion is a large number, but not an unreasonable number, of strings to want to put into a hash table- and it's exactly this case where the O(1) cost of hashtables starts being a real win. Brian </pre></blockquote><pre wrap="">Yes, there is a non-negligible chance of collision (In a DB is there any chance that is non-negligible? :) ) and the values must be checked against the actual. The win is the collapse of the index size and only needed to check a small fraction of the actual tuples. </pre></blockquote><br /> Ah, OK- I misunderstood you. I thought you were saying that the hash values would need to beunique, and you wouldn't check the original values at all. My bad.<br /><br /> Brian<br /><br />
В списке pgsql-hackers по дате отправления: