Re: Re: custom hash-based COUNT(DISTINCT) aggregate - unexpectedly high memory consumption

From	ktm@rice.edu
Subject	Re: Re: custom hash-based COUNT(DISTINCT) aggregate - unexpectedly high memory consumption
Date	October 7, 2013, 12:49:19
Msg-id	20131007125023.GQ16128@aart.rice.edu
In reply to	Re: Re: custom hash-based COUNT(DISTINCT) aggregate - unexpectedly high memory consumption (Tomas Vondra <tv@fuzzy.cz>)
Responses	Re: Re: custom hash-based COUNT(DISTINCT) aggregate - unexpectedly high memory consumption
List	pgsql-hackers


On Mon, Oct 07, 2013 at 12:41:58AM +0200, Tomas Vondra wrote:
> > 2. Consider using a simpler/faster hash function, like FNV[1] or Jenkins[2].
> >    For fun, try not hashing those ints at all and see how that performs (that,
> >    I think, is what you get from HashSet<int> in Java/C#).
> 
> I've used crc32 mostly because it's easily available in the code (i.e.
> I'm lazy), but I've done some quick research / primitive benchmarking
> too. For example hashing 2e9 integers takes this much time:
> 
> FNV-1   = 11.9
> FNV-1a  = 11.9
> jenkins = 38.8
> crc32   = 32.0
> 
> So it's not really "slow" and the uniformity seems to be rather good.
> 
> I'll try FNV in the future, however I don't think that's the main issue
> right now.
> 
Hi Tomas,

If you are going to use a function that is not currently in the code,
please consider xxhash:

http://code.google.com/p/xxhash/

Here are some benchmarks for some of the faster hash functions:

Name            Speed       Q.Score   Author
xxHash          5.4 GB/s     10
MurmurHash 3a   2.7 GB/s     10       Austin Appleby
SpookyHash      2.0 GB/s     10       Bob Jenkins
SBox            1.4 GB/s      9       Bret Mulvey
Lookup3         1.2 GB/s      9       Bob Jenkins
CityHash64      1.05 GB/s    10       Pike & Alakuijala
FNV             0.55 GB/s     5       Fowler, Noll, Vo
CRC32           0.43 GB/s     9
MD5-32          0.33 GB/s    10       Ronald L. Rivest
SHA1-32         0.28 GB/s    10

Regards,
Ken
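
For readers comparing the candidates above, here is a minimal sketch of
32-bit FNV-1a, the variant from the quoted timings, applied to integer
keys. It is not code from this thread or from PostgreSQL: the names
fnv1a_32 and fnv1a_32_u32 are hypothetical, and feeding the key low byte
first is an assumption made so the result does not depend on host
endianness. The offset basis and prime are the standard 32-bit FNV
constants.

```c
#include <stddef.h>
#include <stdint.h>

/* Standard 32-bit FNV parameters. */
#define FNV32_OFFSET_BASIS 2166136261u
#define FNV32_PRIME        16777619u

/*
 * FNV-1a over an arbitrary byte buffer: for each byte, XOR it into the
 * hash and then multiply by the FNV prime (FNV-1 does these two steps
 * in the opposite order).
 */
static uint32_t
fnv1a_32(const void *data, size_t len)
{
    const unsigned char *p = data;
    uint32_t    hash = FNV32_OFFSET_BASIS;
    size_t      i;

    for (i = 0; i < len; i++)
    {
        hash ^= p[i];
        hash *= FNV32_PRIME;    /* wraps modulo 2^32 by design */
    }
    return hash;
}

/*
 * Hypothetical helper: hash a single 32-bit key. The bytes are fed low
 * byte first (an assumption of this sketch) so that every platform
 * produces the same value for the same key.
 */
static uint32_t
fnv1a_32_u32(uint32_t key)
{
    unsigned char bytes[4];

    bytes[0] = (unsigned char) (key & 0xff);
    bytes[1] = (unsigned char) ((key >> 8) & 0xff);
    bytes[2] = (unsigned char) ((key >> 16) & 0xff);
    bytes[3] = (unsigned char) ((key >> 24) & 0xff);
    return fnv1a_32(bytes, sizeof(bytes));
}
```

A hash table would typically reduce the result to a bucket index, for
example fnv1a_32_u32(key) & (nbuckets - 1) when nbuckets is a power of
two. The loop does one XOR and one multiply per input byte, so a 4-byte
key costs four such steps.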
