Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop
From | Tomas Vondra |
---|---|
Subject | Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop |
Date | |
Msg-id | 43825a04-d36c-2632-91fd-332e0fa2bd72@2ndquadrant.com |
In reply to | Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop (Tom Lane <tgl@sss.pgh.pa.us>) |
List | pgsql-bugs |
On 01/27/2018 12:22 AM, Tom Lane wrote:
> Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
>> I suspect you're right the hash is biased to lohalf bits, as you wrote
>> in the 19/12 message.
>
> I don't see any bias in what it's doing, which is basically xoring the
> two halves and hashing the result. It's possible though that Todd's
> data set contains values in which corresponding bits of the high and
> low halves are correlated somehow, in which case the xor would produce
> a lot of cancellation and a relatively small number of distinct outputs.
>

Hmm, that makes more sense than what I wrote. Probably time to get some sleep or drink more coffee, I guess.

BTW what do you think about the fact that we only really generate ~63% of the possible hash values (see my message from 11/12)? That seems a bit unfortunate, although not unexpected for a simple hash function.

> If we weren't bound by backwards compatibility, we could consider
> changing to logic more like "if the value is within the int4 range,
> apply int4hash, otherwise hash all 8 bytes normally". But I don't see
> how we can change that now that hash indexes are first-class
> citizens.
>

Yeah, I've been thinking about that too. But I think it's an issue only for pg_upgraded clusters, which may have hash indexes (and also hash-partitioned tables). So couldn't we use new hash functions in fresh clusters and use the backwards-compatible ones in pg_upgraded ones?

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
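
For readers following the thread, the cancellation effect Tom describes can be illustrated with a small sketch. It is a simplified model only, not PostgreSQL source code: `fold_int8` is a hypothetical helper that xors the two 32-bit halves of a 64-bit value and skips the final 32-bit hash step. The `expected_coverage` helper is likewise an assumption: it shows that a uniformly random function from n inputs to n outputs is expected to hit about 1 - 1/e (roughly 63%) of the outputs, which may or may not be the reason behind the ~63% figure mentioned above.

```python
import math

MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def fold_int8(val: int) -> int:
    """Fold a 64-bit integer to 32 bits by xoring its high and low halves.

    Simplified model of "xoring the two halves"; the real function would
    go on to hash the folded 32-bit value, which is omitted here.
    """
    u = val & MASK64          # treat the value as an unsigned 64-bit pattern
    lo = u & MASK32           # low 32 bits
    hi = (u >> 32) & MASK32   # high 32 bits
    return lo ^ hi


def expected_coverage(n: int) -> float:
    """Expected fraction of n outputs hit by a random function on n inputs.

    Computes 1 - (1 - 1/n)**n in a numerically stable way.
    """
    return -math.expm1(n * math.log1p(-1.0 / n))


# Small values: the high half is zero, so every input folds to itself.
plain = {fold_int8(i) for i in range(1000)}

# Correlated halves (high half equals low half): the xor cancels completely.
correlated = {fold_int8((i << 32) | i) for i in range(1000)}

print(len(plain), len(correlated))
print(round(expected_coverage(2**32), 4))
```

With correlated halves, all 1000 distinct inputs collapse to a single folded value, so no 32-bit hash applied afterwards can tell them apart.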