Re: Incorrect estimates on correlated filters
| From | Decibel! |
|---|---|
| Subject | Re: Incorrect estimates on correlated filters |
| Date | |
| Msg-id | 12707C41-48C3-4072-A152-46FAB1FEE1E2@decibel.org |
| In reply to | Incorrect estimates on correlated filters ("Chris Kratz" <chris.kratz@vistashare.com>) |
| Responses | Re: Incorrect estimates on correlated filters; Re: Incorrect estimates on correlated filters |
| List | pgsql-performance |
On Aug 12, 2008, at 4:59 PM, Chris Kratz wrote:

> Ran into a recurring performance problem with some report queries again today. In a nutshell, we have filters on either multiple joined tables, or multiple columns on a single table, that are highly correlated. So the estimates come out grossly incorrect (the planner has no way to know they are correlated): 2000:1 for one I'm looking at right now. Generally this doesn't matter, except in complex reporting queries like these, where this is the first join of 40 other joins. Because the estimate is wrong at the lowest level, it snowballs up through the rest of the joins, causing the query to run very, very slowly. In many of these cases, forcing nested loops off for the duration of the query fixes the problem. But I have a couple that are still painfully slow and shouldn't be.
>
> I've been reading through the archives with others having similar problems (including myself a year ago). Am I right in assuming that at this point there is still little we can do in Postgres to speed up this kind of query? Right now the planner has no way to know the correlation between different columns in the same table, let alone columns in different tables. So it just assumes no correlation and returns incorrectly low estimates in cases like these.
>
> The only solution I've come up with so far is to materialize portions of the larger query into subqueries with these correlated filters, which are indexed and analyzed before joining into the larger query. This keeps the incorrect estimates from snowballing up through the chain of joins.
>
> Are there any other solutions to this problem?

Well... you could try and convince certain members of the community that we actually do need some kind of a query hint mechanism... ;)

I did make a suggestion a few months ago that involved sorting a table on different columns and recording the correlation of other columns. The scheme isn't perfect, but it would help detect cases like a field populated by a sequence and another field that's an insert timestamp; those two fields would correlate highly, and you should even be able to correlate the two histograms. That would allow you to infer that most of the insert times for _id's between 100 and 200 will be between 2008-01-01 00:10 and 2008-01-01 00:20, for example.

--
Decibel!, aka Jim C. Nasby, Database Architect  decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828
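For the archives, here is a minimal sketch of the materialize-and-analyze workaround described above. All table and column names (orders, customers, order_status, order_region) are hypothetical stand-ins for whatever correlated filters are involved; the point is only the shape: evaluate the correlated predicates into a temporary table, ANALYZE it so the planner sees an accurate row count, and then join that pre-filtered set into the larger reporting query.

-- Materialize the correlated filter; status and region are assumed to be
-- highly correlated, so the planner's independence assumption would
-- otherwise underestimate this row count badly.
CREATE TEMP TABLE shipped_eu_orders AS
SELECT o.order_id, o.customer_id
FROM   orders o
WHERE  o.order_status = 'shipped'
  AND  o.order_region = 'EU';

-- Index the join key the larger query will use, then give the planner
-- real statistics (and an accurate row count) for the subset.
CREATE INDEX shipped_eu_orders_customer_idx ON shipped_eu_orders (customer_id);
ANALYZE shipped_eu_orders;

-- Join the pre-filtered, pre-analyzed subset into the reporting query so
-- the bad estimate no longer snowballs through the remaining joins.
SELECT c.customer_name, count(*)
FROM   shipped_eu_orders s
JOIN   customers c ON c.customer_id = s.customer_id
GROUP BY c.customer_name;

The session-level alternative mentioned above is simply SET enable_nestloop = off; before running the report (and SET enable_nestloop = on; afterwards), which steers the planner away from nested loops for every join in the query rather than fixing the underlying estimate.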