Re: Really bad blowups with hash outer join and nulls
From: Andrew Gierth
Subject: Re: Really bad blowups with hash outer join and nulls
Date:
Msg-id: 87egpqlqnh.fsf@news-spur.riddles.org.uk
In reply to: Re: Really bad blowups with hash outer join and nulls (Tom Lane <tgl@sss.pgh.pa.us>)
Responses: Re: Really bad blowups with hash outer join and nulls
List: pgsql-hackers
>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes: >> A quick test suggests that initializing the hash value to ~0 rather >> than 0 has a curious effect: the number of batches still explodes, >> but the performance does not suffer the same way. (I think because >> almost all the batches end up empty.) I think this is worth doing >> even in the absence of a more general solution; nulls are common >> enough and important enough that they shouldn't be the worst-case >> value if it can be avoided. Tom> I think that's unlikely to be a path to a good solution. It wasn't really intended to be. Tom> At least part of the problem here is that Tom> estimate_hash_bucketsize() supposes that nulls can be ignored --- Tom> which is normally true, and invalidates your claim that they're Tom> common. But in a RIGHT JOIN situation, they need to be considered Tom> as if they were regular keys. That would probably be sufficient Tom> to dissuade the planner from choosing a hash join in this example. I've now tried the attached patch to correct the bucketsize estimates, and it does indeed stop the planner from considering the offending path (in this case it just does the join the other way round). One thing I couldn't immediately see how to do was account for the case where there are a lot of nulls in the table but a strict qual (or an IS NOT NULL) filters them out; this patch will be overly pessimistic about such cases. Do estimates normally try and take things like this into account? I didn't find any other relevant examples. Tom> There may also be something we can do in the executor, but it Tom> would take closer analysis to figure out what's going wrong. I Tom> don't think kluging the behavior for NULL in particular is the Tom> answer. The point with nulls is that a hash value of 0 is currently special in two distinct ways: it's always in batch 0 and bucket 0 regardless of how many batches and buckets there are, and it's the result of hashing a null. These two special cases interact in a worst-case manner, so it seems worthwhile to avoid that. -- Andrew (irc:RhodiumToad)
Attachments