Re: Does "correlation" mislead the optimizer on large tables?
От | Tom Lane |
---|---|
Тема | Re: Does "correlation" mislead the optimizer on large tables? |
Дата | |
Msg-id | 11025.1043390899@sss.pgh.pa.us обсуждение исходный текст |
Ответ на | Does "correlation" mislead the optimizer on large tables? (Ron Mayer <ron@intervideo.com>) |
Ответы |
Re: Does "correlation" mislead the optimizer on large
Re: Does "correlation" mislead the optimizer on large |
Список | pgsql-performance |
Ron Mayer <ron@intervideo.com> writes: > On a large tables, I think the "correlation" pg_stats field as calculated > by "vacuum analyze" or "analyze" can mislead the optimizer. If you look in the pghackers archives, you will find some discussion about changing the equation that cost_index() uses to estimate the impact of correlation on indexscan cost. The existing equation is ad-hoc and surely wrong, but so far no one's proposed a replacement that can be justified any better. If you've got such a replacement then we're all ears... > In particular, if I have a large table t with columns 'a','b','c', etc, > and I cluster the table as follows: > create table t_ordered as select * from t order by a,b; > vacuum analyze t_ordered; > Column "b" will (correctly) get a very low "correlation" in > the pg_stats table -- but I think the optimizer would do better > assuming a high correlation because similar 'b' values are still > grouped closely on the same disk pages. How would that be? They'll be separated by the stride of 'a'. It seems likely to me that a one-dimensional correlation statistic may be inadequate, but I haven't seen any proposals for better stats. regards, tom lane
В списке pgsql-performance по дате отправления: