Re: Correlation in cost_index()
От | Tom Lane |
---|---|
Тема | Re: Correlation in cost_index() |
Дата | |
Msg-id | 1418.1060381541@sss.pgh.pa.us обсуждение исходный текст |
Ответ на | Re: Correlation in cost_index() (Sean Chittenden <sean@chittenden.org>) |
Ответы |
Re: Correlation in cost_index()
|
Список | pgsql-hackers |
Sean Chittenden <sean@chittenden.org> writes: > indexCorrelation is 1.0 for the 1st key in a multi-column index. ... only if it's perfectly correlated. > As things stand, however, if a multi-column key is > used, the indexCorrelation is penalized by the size of the number of > keys found in the multi-column index. As things stand the qual > user_id = 42, on a CLUSTER'ed multi-column index (user_id,utc_date) > has an indexCorrelation of 0.5, when in fact the correlation is 1.0. Right, in the perfectly-correlated case this calculation is clearly wrong. However, what of cases where the first column shows good correlation with the physical ordering, but the second does not? The nasty part of this is that the correlation stat that ANALYZE computed for the second column is of no value to us. Two examples: X Y X Y A A A BA B A CA C A AB A B AB B B CB C B BC A C CC B C AC C C B In both cases ANALYZE will calculate correlation 1.0 for column X, and something near zero for column Y. We would like to come out with index correlation 1.0 for the left-hand case and something much less (but, perhaps, not zero) for the right-hand case. I don't really see a way to do this without actually examining the multi-column ordering relationship during ANALYZE. > I tossed a different index on my test table to see how well things > fare with a low correlation, and this was a bit disturbing: Seems like most of the error in that estimate has to do with the poor rowcount estimation. There's very little percentage in trying to analyze the effect of index correlation in examples where we don't have the first-order stats correct ... regards, tom lane
В списке pgsql-hackers по дате отправления: