Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in aninfinite loop
От | Andres Freund |
---|---|
Тема | Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in aninfinite loop |
Дата | |
Msg-id | 20180129193219.fk5vnliembhacuys@alap3.anarazel.de обсуждение исходный текст |
Ответ на | Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop (Tom Lane <tgl@sss.pgh.pa.us>) |
Список | pgsql-bugs |
Hi, On 2018-01-29 13:45:10 -0500, Tom Lane wrote: > Also, the fact that you needed one more ORDER BY in testing on a > single machine is already proof of my worry about the regression > tests, and it'll only get worse when testing across a variety > of machines. That's an entirely unconvincing argument. A lot of changes to hashtable code will result in ordering changes, and has done so in the past. Sure there occasionally was need for a followup ORDER BY addition, but that's not that bad. The fact that only a single order by was needed when entirely changing the hash function output (by randomizing it) doesn't at all seem to suggest that it's a huge problem. On 2018-01-29 14:11:26 -0500, Tom Lane wrote: > One other point here is that it's not really clear to me what a randomly > varying IV is supposed to accomplish. Surely we're not intending that > it prevents somebody from crafting a data set that causes bad hash > performance. I do think we should make that harder. > If a user with DB access wants to cause a performance > problem, there are and always will be plenty of other avenues to making > that happen. Sure, but that's different from somebody triggering, e.g. by website input, such behaviour. > If the idea is that for a data set that otherwise would have > bad hash performance, choosing a different IV would (almost always) fix > it, that sounds good but you're ignoring the inverse case: for a data set > that works fine, there would be some choices of IV that create a problem > where there was none before. I see no reason to think that the > probability of the former kind of situation is higher than the latter. It's easily intentionally triggerable vs. not. > So I'm on board with using the extended hash functions when available, > but I'm not convinced that a varying IV buys us anything but trouble. I think we really need both. A constant IV of e.g. 0 for a single-column hashtable doesn't really fix much, even though it does improve hash quality a bit for multi-column indexes. A random IV does address issues due to a lot of tuples falling into the same bucket, but does not address actual collisions. Both together are pretty convincing tho. Greetings, Andres Freund
В списке pgsql-bugs по дате отправления: