Re: Cross-column statistics revisited
From | Josh Berkus
Subject | Re: Cross-column statistics revisited
Date |
Msg-id | 200810161512.18869.josh@agliodbs.com
In reply to | Re: Cross-column statistics revisited ("Robert Haas" <robertmhaas@gmail.com>)
Responses | Re: Cross-column statistics revisited
| Re: Cross-column statistics revisited
List | pgsql-hackers
> Yes, or to phrase that another way: What kinds of queries are being
> poorly optimized now and why?

Well, we have two different correlation problems. One is the problem of dependent correlation, such as the common case of a 1.0 correlation between ZIP and CITY fields. This could in fact be fixed, I believe, via a linear math calculation based on the sampled level of correlation, assuming we have enough samples. And it's really only an issue if the correlation is > 0.5.

The second type of correlation issue we have is correlating values in a parent table with *rows* in a child table (i.e. FK joins). Currently, the planner assumes that all rows in the child table are evenly distributed against keys in the parent table. But many real-world databases have this kind of problem:

A          B
1          10000 rows
2          10000 rows
3          1000 rows
4 .. 1000  0 to 1 rows

For queries which cover values between 4..1000 on A, the misestimate won't be much of a real execution problem. But for values 1, 2, 3, the query will bomb.

> The other half of this is that bad selectivity estimates only matter
> if they're bad enough to change the plan, and I'm not sure whether
> cases like this are actually a problem in practice.

My experience is that any estimate which is more than 5x wrong (i.e. < 0.2 or > 5.0) usually causes problems, and 3x sometimes causes problems.

--
--Josh

Josh Berkus
PostgreSQL
San Francisco
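[A small arithmetic sketch of the skew in the example above, assuming the hypothetical row counts from the table and treating each key in 4..1000 as having exactly 1 row. This is not planner code; it just shows how a flat per-key estimate crosses the 5x error threshold mentioned.]

```python
# Hypothetical child-table distribution from the example:
# keys 1 and 2 have 10000 rows each, key 3 has 1000,
# and keys 4..1000 are assumed to have 1 row each.
child_rows_per_key = {1: 10000, 2: 10000, 3: 1000}
for k in range(4, 1001):
    child_rows_per_key[k] = 1

total_rows = sum(child_rows_per_key.values())      # 21997
n_keys = len(child_rows_per_key)                   # 1000

# The flat assumption: rows spread evenly across parent keys.
uniform_estimate = total_rows / n_keys             # ~22 rows per key

def estimate_error(key):
    """Ratio of actual rows to the uniform estimate for one key."""
    return child_rows_per_key[key] / uniform_estimate

# Keys 1..3 are badly underestimated (>5x off), so the plan chosen
# for them is likely wrong; keys 4..1000 are overestimated instead.
```

Under these assumed numbers, key 1 is off by roughly 450x while a key in the 4..1000 range is off by about 20x in the other direction, so both sides of the distribution fall outside the < 0.2 / > 5.0 band described above.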