Re: Fast, stable, portable hash function producing 4-byte or 8-byte values?

Поиск
Список
Период
Сортировка
От George Neuner
Тема Re: Fast, stable, portable hash function producing 4-byte or 8-byte values?
Дата
Msg-id rv8dveln3t9l4st5su4q2p9b8fir2l88bt@4ax.com
обсуждение исходный текст
Ответ на Fast, stable, portable hash function producing 4-byte or 8-byte values?  (Erwin Brandstetter <brsaweda@gmail.com>)
Ответы Re: Fast, stable, portable hash function producing 4-byte or 8-bytevalues?  (Ron <ronljohnsonjr@gmail.com>)
Список pgsql-general
On Tue, 10 Dec 2019 18:00:02 -0600, Ron <ronljohnsonjr@gmail.com>
wrote:

>On 12/10/19 3:11 PM, Erwin Brandstetter wrote:
>> I am looking for stable hash functions producing 8-byte or 4-byte hashes 
>> from long text values in Postgres 10 or later.
>>
>> There is md5(), the result of which can be cast to uuid. This reliably 
>> produces practically unique, stable 16-byte values. I have usecases where 
>> an 8-byte or even 4-byte hash would be good enough to make collisions 
>> reasonably unlikely. (I can recheck on the full string) - and expression 
>> indexes substantially smaller. I could truncate md5 and cast back and 
>> forth, but that seems like a lot of wasted computation. Are there 
>> suggestions for text hash functions that are
>> - fast
>> - keep collisions to a minimum
>> - stable across major Postgres versions (so expression indexes don't break)
>> - croptographic aspect is not needed (acceptable, but no benefit)
>
>What about a CRC32 function?  It's fast, and an SSE4 instruction has been in 
>Intel CPUs for about 10 years.

On long text CRC will not be as discriminating as a real cryptohash,
but it may be considerably faster to compute depending on the length
of the text.  Given that the OP can check the actual string in the
event of a collision, it may work well enough.

One way to make it more discriminating is to compute a pair of CRCs
using different polynomials.  Unfortunately the SSE4.2 CRC32
instruction uses a fixed polynomial.  You can compute it twice using
different initial values, but the result won't be as good as actually
using different polynomials.


I used to create 4-byte hashes by concatenating two 16-bit CRCs - one
using the X-modem polynomial and the other using the CCITT polynomial.
Since these gave very different results, it worked quite well for my
use case where all the strings were guaranteed < 16KB.

YMMV,
George




В списке pgsql-general по дате отправления:

Предыдущее
От: Adrian Klaver
Дата:
Сообщение: Re: Is there any tool that provides Physical Backup plus PITR for asingle database ( Not the whole PG instance ) ?
Следующее
От: Erik Aronesty
Дата:
Сообщение: Re: Fast, stable, portable hash function producing 4-byte or 8-byte values?