Re: Fast, stable, portable hash function producing 4-byte or 8-byte values?

From Erwin Brandstetter
Subject Re: Fast, stable, portable hash function producing 4-byte or 8-byte values?
Msg-id CAGHENJ5qN8fNoBnPSvFg9-EjKteZ30TQe24XLH6Req+7sSL4Qg@mail.gmail.com
In reply to Re: Fast, stable, portable hash function producing 4-byte or 8-byte values?  (Miles Elam <miles.elam@productops.com>)
List pgsql-general
On Tue, Dec 10, 2019 at 11:34 PM Miles Elam <miles.elam@productops.com> wrote:

In terms of "wasted computation", MD5, SHA1, and the others always compute the full digest before the result is converted to a UUID, int, or whatever. It's a sunk cost. It's also a minor cost considering many hash algorithms are hardware-accelerated on current CPUs. All that's left is the truncation and cast, which you can't easily avoid.


Sure, you could reimplement Java's .hashCode() method by iterating through the characters and processing the character codes:

s[0]*31^(n - 1) + s[1]*31^(n - 2) + ... + s[n - 1]


I don't see how that would beat the CPU-based hashes though unless you wrote a C-based extension. Maybe it's better just to embrace the user-defined function first and then decide if performance is insufficient for your use cases.
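For illustration only, the polynomial above could be sketched in PL/pgSQL roughly as follows. The function name java_hashcode is hypothetical and not from this thread. The sketch assumes a UTF8 database and characters in the Basic Multilingual Plane, because ascii() returns code points while Java hashes UTF-16 code units. It emulates Java's 32-bit overflow with bigint arithmetic, and it is not expected to be fast.

```sql
-- Hypothetical helper (not from the thread): Java-style String.hashCode() for text.
-- Assumption: UTF8 database and BMP-only input, so that code points equal UTF-16 code units.
CREATE OR REPLACE FUNCTION java_hashcode(p_data text) RETURNS int4 AS $$
DECLARE
  h  int8 := 0;
  ch text;
BEGIN
  -- string_to_array() with a NULL delimiter yields one element per character.
  FOREACH ch IN ARRAY string_to_array(p_data, NULL) LOOP
    -- Horner form of s[0]*31^(n-1) + ... + s[n-1], kept to the low 32 bits
    -- to mimic Java's int overflow.
    h := (h * 31 + ascii(ch)) & 4294967295;
  END LOOP;
  -- Reinterpret the unsigned 32-bit value as a signed int4.
  IF h >= 2147483648 THEN
    h := h - 4294967296;
  END IF;
  RETURN h::int4;
END;
$$ LANGUAGE plpgsql IMMUTABLE STRICT PARALLEL SAFE;

SELECT java_hashcode('hello');  -- 99162322, the value of "hello".hashCode() in Java
```

The per-character loop runs in the PL/pgSQL interpreter, which is why a version like this is unlikely to compete with a digest computed in C.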


CREATE EXTENSION IF NOT EXISTS pgcrypto;

-- First 8 bytes of the digest (16 hex digits) as a signed 64-bit integer.
CREATE OR REPLACE FUNCTION hash8 (p_data text, p_algo text = 'md5') RETURNS int8 AS $$
  SELECT ('x' || encode(substring(digest(p_data, p_algo) FROM 1 FOR 8), 'hex'))::bit(64)::int8
$$ LANGUAGE sql IMMUTABLE PARALLEL SAFE;

-- First 4 bytes of the digest (8 hex digits) as a signed 32-bit integer.
CREATE OR REPLACE FUNCTION hash4 (p_data text, p_algo text = 'md5') RETURNS int4 AS $$
  SELECT ('x' || encode(substring(digest(p_data, p_algo) FROM 1 FOR 4), 'hex'))::bit(32)::int4
$$ LANGUAGE sql IMMUTABLE PARALLEL SAFE;


SELECT
  hash4('something something something'),
  hash4('something something something', 'sha1'),
  hash8('something something something'),
  hash8('something something something', 'sha1');



Cheers,


Miles

 


Thanks for the custom functions! They may be useful as a fallback.

But I am really looking for standard functions in Postgres first. Those should be faster and more reliable than writing my own.
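As a rough sketch of how far built-in functions alone can go, the same truncate-and-cast idea can be written with the core md5() function, so pgcrypto is not required. The names md5_hash8 and md5_hash4 are hypothetical and not from this thread, and the cast from text to bit(n) is the same one used in the functions quoted above.

```sql
-- Hypothetical names (not from the thread); uses only the built-in md5().
-- md5(text) returns 32 hex digits; the leading 16 cover 64 bits, the leading 8 cover 32 bits.
CREATE OR REPLACE FUNCTION md5_hash8(p_data text) RETURNS int8 AS $$
  SELECT ('x' || left(md5(p_data), 16))::bit(64)::int8
$$ LANGUAGE sql IMMUTABLE STRICT PARALLEL SAFE;

CREATE OR REPLACE FUNCTION md5_hash4(p_data text) RETURNS int4 AS $$
  SELECT ('x' || left(md5(p_data), 8))::bit(32)::int4
$$ LANGUAGE sql IMMUTABLE STRICT PARALLEL SAFE;

SELECT
  md5_hash4('something something something'),
  md5_hash8('something something something');
```

Because both approaches take the leading bits of the MD5 digest, these should return the same values as hash4 and hash8 with the default 'md5' algorithm.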


Regards

Erwin
