Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in aninfinite loop
От | Thomas Munro |
---|---|
Тема | Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in aninfinite loop |
Дата | |
Msg-id | CAEepm=10VqZm7JF2YSC9CV2uYzVvVeVUB2yGtXYf5TZGmCsSmw@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop (Tom Lane <tgl@sss.pgh.pa.us>) |
Список | pgsql-bugs |
On Tue, Nov 28, 2017 at 5:03 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > You beat me to it --- after looking at simplehash.h I'd guessed that > either the SH_GROW_MAX_DIB or SH_GROW_MAX_MOVE code path was causing > an infinite loop, but I'd not gotten to determining which one yet. > > I'd ask what's the theory behind SH_GROW_MAX_MOVE, as well. Neither > of them are obviously loop-proof. > > Note that the sample data has a lot of collisions: > > regression=# select hashint8(val), count(*) from reproducer group by 1 order by 2 desc; > hashint8 | count > -------------+------- > 441526644 | 2337 > -1117776826 | 1221 > -1202007016 | 935 > -2068831050 | 620 > 1156644653 | 538 > 553783815 | 510 > 259780770 | 444 > 371047036 | 394 > 915722575 | 359 > ... etc etc ... > > It's evidently more complicated than just that the code fails with > more than SH_GROW_MAX_DIB duplicate hashcodes, but I suspect not > by a lot. There needs to be a safety valve that prevents letting > the hash fill factor approach zero, which is what's happening in > this test case. Yeah. If you count distinct hashint8(val) of *distinct* values, you get: postgres=# select hashint8(val), count(*) from (select distinct val from reproducer) ss group by 1 order by 2 desc limit 1; hashint8 | count ------------+-------1288181237 | 26 (1 row) It doesn't matter how many bits of that you use, you'll get at least 26 collisions, so our loop can't terminate just by increasing the mask size. My guess is that you'd either need a defence based on something like a secondary hash function, or at least a dynamic SH_GROW_MAX_DIB that wouldn't allow the fill factor to plummet as you say. A dataset could be found that would exceed any static value of SH_GROW_MAX_DIB by brute force. In this case, considering the definition of hashint8, there are more straightforward ways to find distinct bigint values that hash to the same value... just swap some bits between the high and low words, or something like that. -- Thomas Munro http://www.enterprisedb.com
В списке pgsql-bugs по дате отправления: