Re: Query optimizer 8.0.1 (and 8.0)
From | Ron Mayer
---|---
Subject | Re: Query optimizer 8.0.1 (and 8.0)
Date |
Msg-id | 42082EE6.3070401@cheapcomplexdevices.com
In reply to | Re: Query optimizer 8.0.1 (and 8.0) (pgsql@mohawksoft.com)
Responses | Re: Query optimizer 8.0.1 (and 8.0)
List | pgsql-hackers
pgsql@mohawksoft.com wrote:
>
> In this case, the behavior observed could be changed by altering the
> sample size for a table. I submit that an arbitrary fixed sample size is
> not a good base for the analyzer, but that the sample size should be based
> on the size of the table or some calculation of its deviation.

Mark,

Do you have any evidence that the sample size had anything to do with
the performance problem you're seeing?

I also do a lot with the complete Census/TIGER database. Every problem
I have with the optimizer comes down to the fact that the data is loaded
(and ordered on disk) by State/County FIPS codes, and then queried by
zip code or by city name. Like this:

    Alabama   36101   [hundreds of pages with zips in 36***]
    Alaska    99686   [hundreds of pages with zips in 9****]
    Arizona   85701   [hundreds of pages with zips in 855**]

Note that the zip codes are *NOT* sequential. The "correlation"
statistic sees that the zip codes do not follow the physical row
order, so it makes the *HORRIBLE* assumption that they are scattered
randomly across the disk. In reality, even though there is no total
ordering of the zip codes, any given zip code exists on only a couple
of disk pages, so index scans would be the right choice. But the
single correlation parameter is not sufficient to let the optimizer
know this.

No matter how large a sample size you choose, ANALYZE will correctly
see that zip codes are uncorrelated with the (FIPS-ordered) physical
layout, and the optimizer will overestimate the number of pages an
index scan will need.

    Ron

PS: I pointed out workarounds in my earlier posting in this thread.
Yes, I'm using the same TIGER data you are.
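[Editor's note: the statistics discussed above can be inspected directly. A minimal sketch of how to view the planner's per-column correlation estimate and raise the per-column sample size, which is the knob the quoted text alludes to; `pg_stats` and `SET STATISTICS` are standard PostgreSQL, but the table and column names here are hypothetical:]

```sql
-- Show what ANALYZE computed for each column.  A correlation near
-- +/-1.0 means the column values follow the physical row order;
-- a value near 0 is treated by the planner as random placement.
SELECT attname, n_distinct, correlation
FROM   pg_stats
WHERE  tablename = 'tiger_places';   -- hypothetical table name

-- Enlarge the sample for one column, then re-gather statistics.
-- As Ron argues, even a much larger sample will not change the
-- verdict that zip codes are uncorrelated with physical order.
ALTER TABLE tiger_places ALTER COLUMN zipcode SET STATISTICS 1000;
ANALYZE tiger_places;
```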
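[Editor's note: Ron's clustering claim itself (that any given zip code lives on only a few disk pages despite the lack of a global ordering) can also be checked empirically. A rough sketch that counts the heap pages each zip code touches; the `ctid`-through-`text`-to-`point` cast is a common trick on recent PostgreSQL versions, and the names are again hypothetical:]

```sql
-- For each zip code, count the distinct heap pages its rows occupy.
-- The block number is the first component of a row's ctid; casting
-- the tid through text to point exposes it as point coordinate [0].
SELECT zipcode,
       count(DISTINCT (ctid::text::point)[0]) AS heap_pages
FROM   tiger_places                 -- hypothetical table name
GROUP  BY zipcode
ORDER  BY heap_pages DESC
LIMIT  20;
```

If each zip code really spans only a handful of pages, an index scan is cheap even though the single correlation statistic is near zero.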