Re: Correlation in cost_index()
От | Zeugswetter Andreas SB SD |
---|---|
Тема | Re: Correlation in cost_index() |
Дата | |
Msg-id | 46C15C39FEB2C44BA555E356FBCD6FA4961FEC@m0114.s-mxs.net обсуждение исходный текст |
Ответ на | Correlation in cost_index() (Manfred Koizar <mkoi-pg@aon.at>) |
Список | pgsql-hackers |
> In both cases ANALYZE will calculate correlation 1.0 for column X, > and something near zero for column Y. We would like to come out with > index correlation 1.0 for the left-hand case and something much less > (but, perhaps, not zero) for the right-hand case. I don't really see > a way to do this without actually examining the multi-column ordering > relationship during ANALYZE. The only way the second column correlation will be irrelevant is if the first column is already (nearly) unique (enough so, that the second column wont scatter fetches enough to fill the buffer before seeing cache hits). Thus I think when merging correlations you could take nunique into account. corr = corr_1 * (corr_2 * ( 1 - nunique_1 / nrows)) But, I think one (new) correlation metric for the whole index (whole key) and the data pages would actually be sufficient. This metric could imho always be used instead of the per column correlations to calculate index cost. This holds true as long as you walk an index range, and that is what it is all about, no ? ??? Andreas
В списке pgsql-hackers по дате отправления: