Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop
From | Tomas Vondra |
---|---|
Subject | Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop |
Date | |
Msg-id | b337cd3c-7091-2342-8e66-36919d91f70c@2ndquadrant.com |
In reply to | Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop ("Todd A. Cook" <tcook@blackducksoftware.com>) |
Responses | Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop |
List | pgsql-bugs |
On 01/25/2018 11:31 PM, Todd A. Cook wrote:
> On 11/27/17 14:17, Tomas Vondra wrote:
>> Hi,
>>
>> On 11/27/2017 07:57 PM, tcook@blackducksoftware.com wrote:
>>> The following bug has been logged on the website:
>>>
>>> Bug reference:      14932
>>> Logged by:          Todd Cook
>>> Email address:      tcook@blackducksoftware.com
>>> PostgreSQL version: 10.1
>>> Operating system:   CentOS Linux release 7.4.1708 (Core)
>>> Description:
>>>
>>> It hangs on a table with 167834 rows, though it works fine with only
>>> 167833 rows. When it hangs, CTRL-C does not interrupt it, and the
>>> backend has to be killed to stop it.
>>>
>>
>> Can you share the query and data, so that we can reproduce the issue?
>>
>> Based on the stack traces this smells like a bug in the simplehash,
>> introduced in PostgreSQL 10. Perhaps somewhere in tuplehash_grow(),
>> which gets triggered for 167834 rows (but not for 167833).
>
> FWIW, changing the guts of hashint8() to
>
> +    if (val >= INT32_MIN && val <= INT32_MAX)
> +        return hash_uint32((uint32) val);
> +    else
> +        return hash_any((unsigned char *) &val, sizeof(val));
>
> allows us to process a full-sized data set of around 900 million rows.
> However, memory usage seemed to be rather excessive (we can only run 7
> of these jobs in parallel on a 128GB system before the OOM killer
> kicked in, rather than the usual 24); if there's any interest, I can
> try to measure exactly how excessive.
>

I suspect you're right that the hash is biased to the lohalf bits, as you
wrote in the 19/12 message. In fact, I think it's a direct consequence
of the requirement that hashint8() needs to produce the same hash for
logically equivalent int2 and int4 values.

Out of curiosity, could you try replacing the hash_any call in hashint8
with a hash function like murmur3, and see if it improves the behavior?
That obviously breaks hashint8 for cross-type hash joins, but it
would be an interesting bit of information, I think.
regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
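
The bias discussed in the message can be illustrated with a small standalone sketch. It assumes the folding step of hashint8() looks the way it does in the PostgreSQL 10 sources as I recall them (the high half XORed into the low half before hash_uint32() is called); the names fold_int8 and murmur3_fmix64 are made up for this illustration and are not PostgreSQL functions. The second function is the 64-bit finalizer from MurmurHash3, shown only as one possible shape of the "hash function like murmur3" experiment, not as a proposed patch.

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for the folding step of hashint8(): the high
 * 32 bits are XORed into the low 32 bits (complemented for negative
 * values), so any value that fits in int32 folds to its own low half
 * and can hash the same as the equivalent int2/int4 value. The real
 * function would pass the result on to hash_uint32(), omitted here.
 */
static uint32_t
fold_int8(int64_t val)
{
	uint32_t	lohalf = (uint32_t) val;
	uint32_t	hihalf = (uint32_t) ((uint64_t) val >> 32);

	lohalf ^= (val >= 0) ? hihalf : ~hihalf;
	return lohalf;
}

/*
 * MurmurHash3 64-bit finalizer (fmix64). Every step is invertible, so
 * distinct 64-bit inputs always give distinct 64-bit outputs. It knows
 * nothing about int2/int4 compatibility, which is why using it would
 * break cross-type hash joins.
 */
static uint64_t
murmur3_fmix64(uint64_t k)
{
	k ^= k >> 33;
	k *= UINT64_C(0xff51afd7ed558ccd);
	k ^= k >> 33;
	k *= UINT64_C(0xc4ceb9fe1a85ec53);
	k ^= k >> 33;
	return k;
}
```

With this folding, every non-negative value whose high and low halves are equal (0x0000000100000001, 0x0000000200000002, and so on) folds to 0 and therefore collides before hash_uint32() even runs, while murmur3_fmix64 keeps those inputs distinct. Whether the data set in this bug report has such a shape is not stated in the thread; the example only shows why folding 64 bits into 32 can concentrate collisions.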