Re: correlation in pg_stats
От | pgsql@mohawksoft.com |
---|---|
Тема | Re: correlation in pg_stats |
Дата | |
Msg-id | 16503.24.91.171.78.1107885892.squirrel@mail.mohawksoft.com обсуждение исходный текст |
Ответ на | correlation in pg_stats (Ron Mayer <rm_pg@cheapcomplexdevices.com>) |
Ответы |
Re: correlation in pg_stats
Re: correlation in pg_stats |
Список | pgsql-hackers |
> > Short summary: > > * It looks to me like the planner vastly overestimates > the # of pages read by index scan in quite a few of my > tables even though stats collected by ANALYZE are correct. > > * The problem happens any time you have multiple columns > that have a number of repeated values in them, and > you CLUSTER the table by a sort using both columns > (like "city,state,zip,phone#" or "firstname,lastname"). > > * I think this is the problem that Mark Kirkwood is seeing > in his threads Query optimizer 8.0.1 and "One Big trend > vs multiple smaller trends" in hackers. actually pgsql@mohawksoft.com, is Mark Woodward. Pleased to meet you. :) (I hate using my name on lists like this because of spammers) > > * A test script demonstrating the issue also follows. > > * I think keeping one more stat per attribute in > pg_stastic that could describe this behavior. > > > Longer: > > > If I understand the optimizer correctly, correlation is used > to both guess how much random disk access will be required in > a query; as well as estimate how many pages will be read. > > Unfortunately, many tables in my larger databases have > columns with values that are tightly packed on a few pages; > even though there is no total-ordering across the whole table. > Stephan Szabo described this as a "clumping effect": > http://archives.postgresql.org/pgsql-performance/2003-01/msg00286.php Yes. I think we are describing the exact same issue.
В списке pgsql-hackers по дате отправления: