Re: disfavoring unparameterized nested loops

From: Benjamin Coutu
Subject: Re: disfavoring unparameterized nested loops
Date:
Msg-id: 0d3f9971e82d2d969a5f@zeyos.com
In reply to: disfavoring unparameterized nested loops (Robert Haas <robertmhaas@gmail.com>)
Responses: Re: disfavoring unparameterized nested loops (Peter Geoghegan <pg@bowt.ie>)
Re: disfavoring unparameterized nested loops (Andrey Lepikhov <a.lepikhov@postgrespro.ru>)
List: pgsql-hackers
I'd like to revive this important discussion.

As is well described in this fairly recent paper, https://www.vldb.org/pvldb/vol9/p204-leis.pdf (which also looks
at Postgres), "estimation errors quickly grow as the number of joins increases, and ... these errors are usually the
reason for bad plans" - I think we can all get behind that statement.

While nested loop joins work great when cardinality estimates are correct, they are notoriously bad when the optimizer
underestimates, and they degrade very fast in such cases - the good vs. bad here is very asymmetric. On the other hand,
hash joins degrade much more gracefully - they are considered very robust against underestimation. The above-mentioned
paper illustrates that all major DBMSs (including Postgres) tend to underestimate, and usually that underestimation
increases drastically with the number of joins (see Figures 3+4 of the paper).

Now, a simple approach to guarding against bad plans that arise from underestimation could be to use what I would call
a nested-loop-conviction-multiplier based on the current depth of the join tree. For a base table that multiplier
would obviously be 1, but it would then grow (e.g.) quadratically with depth. That conviction-multiplier would *NOT* be
used to skew the cardinality estimates themselves, but rather be applied to the overall nested loop join cost at each
particular stage of the plan when comparing it to other, more robust join strategies like hash or sort-merge joins.
That way, when we can be sure to have a good estimate at the bottom of the join tree, we treat all strategies equally,
but favor nested loops less and less as we move up the join tree, for the sake of robustness.
Also, we can expand the multiplier whenever we fall back to using the default cardinality constant, as surely all bets
are off at that point - we should definitely treat nested loop joins as out of favor in this instance, and that could
easily be incorporated by simply increasing the conviction-multiplier.
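To make the idea concrete, here is a minimal sketch of the proposed heuristic in Python pseudocode. The function names, the quadratic growth policy, and the extra factor for default-estimate fallback are all illustrative assumptions of mine, not actual PostgreSQL planner code:

```python
def conviction_multiplier(join_depth: int, used_default_estimate: bool = False) -> float:
    """Risk penalty for a nested loop at a given depth of the join tree.

    Depth 0 (a scan over a base table) gets multiplier 1.0, so nothing
    changes at the bottom of the tree; the penalty grows quadratically
    with depth (an assumed policy, per the proposal's example).
    """
    multiplier = float((join_depth + 1) ** 2)
    if used_default_estimate:
        # Fell back to the default selectivity constant: all bets are
        # off, so penalize nested loops further (factor is illustrative).
        multiplier *= 4.0
    return multiplier


def risk_adjusted_nestloop_cost(raw_cost: float,
                                join_depth: int,
                                used_default_estimate: bool = False) -> float:
    """Apply the multiplier to the nested loop's cost only.

    Hash and merge join costs are left untouched, so the comparison
    increasingly favors the robust strategies higher up the join tree
    without ever skewing the cardinality estimates themselves.
    """
    return raw_cost * conviction_multiplier(join_depth, used_default_estimate)
```

For example, a nested loop at depth 2 would have its cost compared against a hash join at 9x its raw estimate, so the hash join wins unless the nested loop is dramatically cheaper on paper.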

What are your thoughts on this simple idea - is it perhaps too simple?

Cheers, Ben


