Fast, stable, portable hash function producing 4-byte or 8-byte values?

Поиск
Список
Период
Сортировка
От Erwin Brandstetter
Тема Fast, stable, portable hash function producing 4-byte or 8-byte values?
Дата
Msg-id CAGHENJ5MnkKeymEVFO6daga6qzTruG91u9yeuwkBuTZ_xRSPRg@mail.gmail.com
обсуждение исходный текст
Ответы Re: Fast, stable, portable hash function producing 4-byte or 8-bytevalues?  (Laurenz Albe <laurenz.albe@cybertec.at>)
Re: Fast, stable, portable hash function producing 4-byte or 8-byte values?  (Miles Elam <miles.elam@productops.com>)
Re: Fast, stable, portable hash function producing 4-byte or 8-bytevalues?  (Ron <ronljohnsonjr@gmail.com>)
Список pgsql-general
I am looking for stable hash functions producing 8-byte or 4-byte hashes from long text values in Postgres 10 or later.

There is md5(), the result of which can be cast to uuid. This reliably produces practically unique, stable 16-byte values. I have usecases where an 8-byte or even 4-byte hash would be good enough to make collisions reasonably unlikely. (I can recheck on the full string) - and expression indexes substantially smaller. I could truncate md5 and cast back and forth, but that seems like a lot of wasted computation. Are there suggestions for text hash functions that are
- fast
- keep collisions to a minimum
- stable across major Postgres versions (so expression indexes don't break)
- croptographic aspect is not needed (acceptable, but no benefit)

There is an old post from 2012 by Tom Lane suggesting that hashtext() and friends are not for users:


Postgres 11 added hashtextextended() and friends to generate bigint hashes. In a more recent post from 3 months ago, Tom suggests to use it in user-land - if portability is not needed:

https://www.postgresql.org/message-id/9434.1568839177%40sss.pgh.pa.us

Is pghashlib by Marko Kreen my best option?


Or the "version-independent hash functions for PostgreSQL" from Peter Eisentraut:


Neither received updates for a couple of years. Unmaintained? Or obsolete?
And neither is available on most hosted services like RDS or Heroku (which would be required in come cases).

So what are my best options?

Regards
Erwin

В списке pgsql-general по дате отправления:

Предыдущее
От: Adrian Klaver
Дата:
Сообщение: Re: A question about upgrading on Debian/Ubuntu
Следующее
От: Laurenz Albe
Дата:
Сообщение: Re: Fast, stable, portable hash function producing 4-byte or 8-bytevalues?