Re: custom hash-based COUNT(DISTINCT) aggregate - unexpectedly high memory consumption
From | Huchev
Subject | Re: custom hash-based COUNT(DISTINCT) aggregate - unexpectedly high memory consumption
Date |
Msg-id | 1381491764000-5774264.post@n5.nabble.com
In reply to | Re: Re: custom hash-based COUNT(DISTINCT) aggregate - unexpectedly high memory consumption (Tomas Vondra <tv@fuzzy.cz>)
Responses | Re: Re: custom hash-based COUNT(DISTINCT) aggregate - unexpectedly high memory consumption
List | pgsql-hackers
gettimeofday(&start, NULL);
for (i = 0; i < VALUES; i++)
{
    state = XXH32_init(result);
    XXH32_update(state, &i, 4);
    XXH32_digest(state);
}
gettimeofday(&end, NULL);

This code uses the "update" variant, which is only useful when dealing with very large amounts of data that cannot fit into a single block of memory. That is obviously overkill for a 4-byte-only test: three function calls, a malloc, intermediate data bookkeeping, etc.

To hash a single block of data, it is better to use the simpler (and faster) variant XXH32():

gettimeofday(&start, NULL);
for (i = 0; i < VALUES; i++)
{
    XXH32(&i, 4, result);
}
gettimeofday(&end, NULL);

You'll probably get results that are better by an order of magnitude. For even better results, you could inline it (yes, for such a short loop with almost no work to do, inlining makes a very noticeable difference).

That being said, it's true that these advanced hash algorithms only shine with a "big enough" amount of data to hash. Hashing a 4-byte value into a 4-byte hash is a rather limited exercise; there is no "pigeonhole" issue. A simple multiplication by a 32-bit prime would fare well enough and result in zero collisions.
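To make that last point concrete, here is a minimal, self-contained sketch of such a multiplicative hash, reusing the same gettimeofday() timing pattern as above. The constant 2654435761 (a 32-bit prime near 2^32/phi), the VALUES count, and the hash_u32() name are illustrative choices, not anything taken from the thread:

#include <stdio.h>
#include <stdint.h>
#include <sys/time.h>

#define VALUES 10000000        /* arbitrary iteration count for this sketch */

static inline uint32_t
hash_u32(uint32_t x)
{
    /* 2654435761 = 0x9E3779B1, a 32-bit prime near 2^32/phi (illustrative choice) */
    return x * 2654435761u;
}

int
main(void)
{
    struct timeval start, end;
    uint32_t sink = 0;          /* consume the hashes so the loop is not optimized away */
    uint32_t i;

    gettimeofday(&start, NULL);
    for (i = 0; i < VALUES; i++)
        sink ^= hash_u32(i);
    gettimeofday(&end, NULL);

    printf("sink = %u, elapsed = %ld us\n",
           (unsigned) sink,
           (long) ((end.tv_sec - start.tv_sec) * 1000000L
                   + (end.tv_usec - start.tv_usec)));
    return 0;
}

Because multiplication by an odd constant is invertible modulo 2^32, distinct 32-bit inputs always map to distinct 32-bit outputs, which is the "zero collisions" property mentioned above.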