Re: Really bad blowups with hash outer join and nulls
From: Andrew Gierth
Subject: Re: Really bad blowups with hash outer join and nulls
Date:
Msg-id: 87egpqlqnh.fsf@news-spur.riddles.org.uk
In reply to: Re: Really bad blowups with hash outer join and nulls (Tom Lane <tgl@sss.pgh.pa.us>)
Responses: Re: Really bad blowups with hash outer join and nulls
List: pgsql-hackers
>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes: >> A quick test suggests that initializing the hash value to ~0 rather >> than 0 has a curious effect: the number of batches still explodes, >> but the performance does not suffer the same way. (I think because >> almost all the batches end up empty.) I think this is worth doing >> even in the absence of a more general solution; nulls are common >> enough and important enough that they shouldn't be the worst-case >> value if it can be avoided. Tom> I think that's unlikely to be a path to a good solution. It wasn't really intended to be. Tom> At least part of the problem here is that Tom> estimate_hash_bucketsize() supposes that nulls can be ignored --- Tom> which is normally true, and invalidates your claim that they're Tom> common. But in a RIGHT JOIN situation, they need to be considered Tom> as if they were regular keys. That would probably be sufficient Tom> to dissuade the planner from choosing a hash join in this example. I've now tried the attached patch to correct the bucketsize estimates, and it does indeed stop the planner from considering the offending path (in this case it just does the join the other way round). One thing I couldn't immediately see how to do was account for the case where there are a lot of nulls in the table but a strict qual (or an IS NOT NULL) filters them out; this patch will be overly pessimistic about such cases. Do estimates normally try and take things like this into account? I didn't find any other relevant examples. Tom> There may also be something we can do in the executor, but it Tom> would take closer analysis to figure out what's going wrong. I Tom> don't think kluging the behavior for NULL in particular is the Tom> answer. The point with nulls is that a hash value of 0 is currently special in two distinct ways: it's always in batch 0 and bucket 0 regardless of how many batches and buckets there are, and it's the result of hashing a null. These two special cases interact in a worst-case manner, so it seems worthwhile to avoid that. -- Andrew (irc:RhodiumToad)
Attachments