Re: Columns correlation and adaptive query optimization
От | Konstantin Knizhnik |
---|---|
Тема | Re: Columns correlation and adaptive query optimization |
Дата | |
Msg-id | e0474c11-cafe-1535-813e-15dae6b48365@postgrespro.ru обсуждение исходный текст |
Ответ на | Re: Columns correlation and adaptive query optimization (Rafia Sabih <rafia.pghackers@gmail.com>) |
Список | pgsql-hackers |
Thank you very much for review. On 25.03.2020 20:04, Rafia Sabih wrote: > > +static void > +AddMultiColumnStatisticsForNode(PlanState *planstate, ExplainState *es); > + > > This doesn't look like the right place for it, you might want to > declare it with other functions in the starting of the file. > > Also, there is no description about any of the functions here, > wouldn’t hurt having some more comments there. Sorry, I will fix it. Actually this patch contains of two independent parts: first allows to use auto_explain extension to generate mutlicolumn statistic for variables used in clauses for which selectivity estimation gives wrong result. It affects only auto_explain extension. Second part allows to use multicolumn statistic for join selectivity estimation. As far as I know extended statistic is now actively improved: https://www.postgresql.org/message-id/flat/20200309000157.ig5tcrynvaqu4ixd%40development#bfbdf9c41c31ef92819dfc5ecde4a67c I think that using extended statistic for join selectivity is very important and should also be addressed. If my approach is on so good, I will be pleased for other suggestions. > > A few of more questions that cross my mind at this point, > > - have you tried measuring the extra cost we have to pay for this > mores statistics , and also compare it with the benefit it gives in > terms of accuracy. Adding statistic not always leads to performance improvement but I never observed any performance degradation caused by presence of extended statistic. Definitely we can manually create too many extended statistic entries for different subsets of columns. And it certainly increase planning time because optimizer has to consider more alternatives. But in practice I never noticed such slowdown. > - I would also be interested in understanding if there are cases when > adding this extra step doesn’t help and have you excluded them already > or if some of them are easily identifiable at this stage...? Unfortunately there are many cases when extended statistic can not help. Either because optimizer is not able to use it (for example my patch consider only cases with strict equality comparison, but if you use predicate like "a.x=b.x and a.y in (1,2,3)" then extended statistic for <x,y> can not be used. Either because collected statistic itself is not precise enough , especially in case of data skews. > - is there any limit on the number of columns for which this will > work, or should there be any such limit...? > Right now there is limit for maximal number of columns used in extended statistic: 8 columns. But in practice I rarely see join predicates involving more than 3 columns. -- Konstantin Knizhnik Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
В списке pgsql-hackers по дате отправления: