Re: Does "correlation" mislead the optimizer on large tables?
From | Stephan Szabo |
---|---|
Subject | Re: Does "correlation" mislead the optimizer on large tables? |
Date | |
Msg-id | 20030124081405.W30842-100000@megazone23.bigpanda.com |
In reply to | Re: Does "correlation" mislead the optimizer on large tables? (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses | Re: Does "correlation" mislead the optimizer on large tables? |
List | pgsql-performance |
On Fri, 24 Jan 2003, Tom Lane wrote:

> Ron Mayer <ron@intervideo.com> writes:
> > In particular, if I have a large table t with columns 'a','b','c', etc,
> > and I cluster the table as follows:
> >   create table t_ordered as select * from t order by a,b;
> >   vacuum analyze t_ordered;
> > Column "b" will (correctly) get a very low "correlation" in
> > the pg_stats table -- but I think the optimizer would do better
> > assuming a high correlation because similar 'b' values are still
> > grouped closely on the same disk pages.
>
> How would that be?  They'll be separated by the stride of 'a'.

I think it's a clumping effect. For example, I made a table (ordered) with 20 values of a, 50 values of b (each showing up in each a), and 100 values of c (not used; it just means 100 rows for each (a,b) combination). It's got 541 pages, it looks like. Analyze sets the correlation to about 0.08 on the table, and so a query like:

  select * from test1 where b=1;

prefers a sequential scan (1791 vs 2231), while the index scan actually performs about 5 times better. I guess the reason is that, in general, the index scan *really* is reading something on the order of 40 pages rather than the much larger estimate (I'd guess something on the order of 300-400? I'm not sure how to find that except by trying to reverse engineer the estimate number), because pretty much each value of a will probably have 1 or 2 pages with b=1. I'm not really sure how to measure that, however.
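The back-of-envelope arithmetic behind that "order of 40 pages" can be sketched as follows. This is my own illustration, not from the thread; the only figures taken from the message are the 20/50/100 value counts and the 541-page table size.

```python
import math

# Table shape from the example: rows ordered by (a, b),
# 100 rows for every (a, b) combination.
n_a, n_b, n_c = 20, 50, 100       # distinct values of a, b, c
total_rows = n_a * n_b * n_c      # 100,000 rows
total_pages = 541                 # reported table size in pages
rows_per_page = total_rows / total_pages  # roughly 185

matching_rows = n_a * n_c         # rows satisfying b=1: 2,000

# Clumped view: within each 'a' group the 100 rows with b=1 are
# contiguous, so they fit on one page, plus one more if the run
# straddles a page boundary.
pages_per_group = math.ceil(n_c / rows_per_page) + 1
clumped_pages = n_a * pages_per_group

# Scattered view (what a near-zero correlation suggests): each
# matching row could land on a different page, capped at the
# table size.
scattered_pages = min(matching_rows, total_pages)

print(f"rows/page   ~ {rows_per_page:.0f}")
print(f"clumped     ~ {clumped_pages} pages")
print(f"scattered  <= {scattered_pages} pages")
```

Under the clumped assumption this gives about 40 page fetches (20 groups of 1-2 pages each), while the scattered assumption allows up to the full 541 pages, which is consistent with the planner preferring the sequential scan.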