Improving worst-case merge join performance with often-null foreign key

Поиск

Список

Период

Сортировка

От	Steinar Kaldager
Тема	Improving worst-case merge join performance with often-null foreign key
Дата	21 апреля 2023 г. 16:13:33
Msg-id	CANcDffdhmnt+6mB0G+=oF_1tGbr+2RkRokjjORxG67BYjiKTkw@mail.gmail.com обсуждение исходный текст
Ответы	Re: Improving worst-case merge join performance with often-null foreign key
Список	pgsql-hackers

Дерево обсуждения

Hi,

First-time potential contributor here. We recently had an incident due
to a sudden 1000x slowdown of a Postgres query (from ~10ms to ~10s)
due to a join with a foreign key that was often null. We found that it
was caused by a merge join with an index scan on one join path --
whenever the non-null data happened to be such that the merge join
couldn't be terminated early, the index would proceed to scan all of
the null rows and filter each one out individually. Since this was an
inner join, this was pointless; the nulls would never have matched the
join clause anyway.

Test script to reproduce + example explain output:
https://paste.depesz.com/s/VUj

Once you're aware of it for a given index, this is a solvable issue.
We solved it by adding a partial index. Including an IS NOT NULL
clause in the query also seems to solve it.

However, I've gotten the notion that it should be possible to solve
this on the Postgres side, with no user action required. When doing an
inner-join merge join with a normal equality operator, we should
already "know" that we don't have to look at rows where the join keys
are null, since any such comparison involving NULL will not return
true. If we can just push that knowledge down to the index scan node,
then we should be able to avoid this entire problem, leaving one less
performance trap to trip people up.

Proof-of-concept patch (please ignore style):
https://github.com/steinarvk-oda/postgres/pull/1/files

I have a few questions:

(1) Does this approach seem reasonable and worthwhile?
(2) Can I determine programmatically at query planning time whether
the binary operator in an OpExpr has the property that all comparisons
involving nulls will be non-true? Or, failing that, can I at least
somehow hardcode and identify the built-in equality operators (which
have this property)? (Integer equality and UUID equality probably
covers a lot when it comes to joins.)
(3) Is there a systematic way to check whether adding an "IS NOT NULL"
condition would be redundant? E.g. if there is such a check in the
query already, adding another is just messy.
(4) Should this sort of thing be made conditional on a high null_frac?
Or is that unnecessary complexity, and the simplicity of just always
doing it would be better?

Thanks!
Steinar Kaldager

В списке pgsql-hackers по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Improving worst-case merge join performance with often-null foreign key