Improving worst-case merge join performance with often-null foreign key
От | Steinar Kaldager |
---|---|
Тема | Improving worst-case merge join performance with often-null foreign key |
Дата | |
Msg-id | CANcDffdhmnt+6mB0G+=oF_1tGbr+2RkRokjjORxG67BYjiKTkw@mail.gmail.com обсуждение исходный текст |
Ответы |
Re: Improving worst-case merge join performance with often-null foreign key
|
Список | pgsql-hackers |
Hi, First-time potential contributor here. We recently had an incident due to a sudden 1000x slowdown of a Postgres query (from ~10ms to ~10s) due to a join with a foreign key that was often null. We found that it was caused by a merge join with an index scan on one join path -- whenever the non-null data happened to be such that the merge join couldn't be terminated early, the index would proceed to scan all of the null rows and filter each one out individually. Since this was an inner join, this was pointless; the nulls would never have matched the join clause anyway. Test script to reproduce + example explain output: https://paste.depesz.com/s/VUj Once you're aware of it for a given index, this is a solvable issue. We solved it by adding a partial index. Including an IS NOT NULL clause in the query also seems to solve it. However, I've gotten the notion that it should be possible to solve this on the Postgres side, with no user action required. When doing an inner-join merge join with a normal equality operator, we should already "know" that we don't have to look at rows where the join keys are null, since any such comparison involving NULL will not return true. If we can just push that knowledge down to the index scan node, then we should be able to avoid this entire problem, leaving one less performance trap to trip people up. Proof-of-concept patch (please ignore style): https://github.com/steinarvk-oda/postgres/pull/1/files I have a few questions: (1) Does this approach seem reasonable and worthwhile? (2) Can I determine programmatically at query planning time whether the binary operator in an OpExpr has the property that all comparisons involving nulls will be non-true? Or, failing that, can I at least somehow hardcode and identify the built-in equality operators (which have this property)? (Integer equality and UUID equality probably covers a lot when it comes to joins.) (3) Is there a systematic way to check whether adding an "IS NOT NULL" condition would be redundant? E.g. if there is such a check in the query already, adding another is just messy. (4) Should this sort of thing be made conditional on a high null_frac? Or is that unnecessary complexity, and the simplicity of just always doing it would be better? Thanks! Steinar Kaldager
В списке pgsql-hackers по дате отправления: