Re: Query optimizer 8.0.1 (and 8.0)
From: Mark Kirkwood
Subject: Re: Query optimizer 8.0.1 (and 8.0)
Date:
Msg-id: 4207DC01.9060102@coretech.co.nz
In reply to: Re: Query optimizer 8.0.1 (and 8.0) (pgsql@mohawksoft.com)
Responses: Re: Query optimizer 8.0.1 (and 8.0)
List: pgsql-hackers
Maybe I am missing something - ISTM that you can increase your statistics
target for those larger tables to obtain a larger (i.e. better) sample.

regards

Mark

pgsql@mohawksoft.com wrote:
>> pgsql@mohawksoft.com writes:
>
> Any and all random sampling assumes a degree of uniform distribution. This
> is the basis of the model. It assumes that chunks of the whole will be
> representative of the whole (to some degree). This works when normal
> variations are more or less distributed uniformly. As variations and
> trends become less uniformly distributed, more samples are required to
> characterize them.
>
> Douglas Adams had a great device called the "Total Perspective Vortex"
> which inferred the whole of the universe from a piece of fairy cake. It
> was a subtle play on the absurd notion that a very small sample could
> lead to an understanding of an infinitely larger whole.
>
> On a very basic level, why bother sampling the whole table at all? Why
> not check one block and infer all information from that? Because we know
> that isn't enough data. In a table of 4.6 million rows, can you say with
> any mathematical certainty that a sample of 100 points is in any way
> representative?
>
> Another problem with random sampling is trend analysis. Often there are
> minor trends in data. Ron pointed out the lastname/firstname trend.
> Although there seems to be no correlation between firstnames in the
> table, there are clearly groups or clusters of ordered data - an ordering
> that is missed by too small a sample.
>
> I understand why you chose the Vitter algorithm, because it provides a
> basically sound methodology for sampling without knowledge of the size of
> the whole, but I think we can do better. I would suggest using the
> current algorithm the first time through, then adjusting the number of
> samples [n] based on the previous estimate of the size of the table [N].
> Each successive ANALYZE will become more accurate. The Vitter algorithm
> is still useful as [N] will always be an estimate.
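For example, a larger per-column sample in the poster's scenario could be
requested like this (the table and column names here are hypothetical):

    -- Raise the statistics target so ANALYZE keeps more sample rows
    -- for this column (the default is 10; 8.0 allows up to 1000):
    ALTER TABLE names ALTER COLUMN lastname SET STATISTICS 1000;

    -- Re-gather statistics using the larger sample:
    ANALYZE names;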
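For readers following the sampling sub-thread, below is a minimal sketch of
reservoir sampling ("Algorithm R"), the simple one-pass ancestor of Vitter's
skip-based algorithm referred to above. It is illustrative C only, not
PostgreSQL's actual acquire_sample_rows() code; the point is that a
fixed-size sample of n items can be drawn in a single pass without knowing
the total length N in advance:

    /*
     * reservoir_sample: keep a uniform random sample of n items from a
     * stream whose total length is not known in advance.  Every item
     * ends up in the sample with equal probability n / stream_len.
     */
    #include <stdio.h>
    #include <stdlib.h>

    static void
    reservoir_sample(const int *stream, int stream_len, int *sample, int n)
    {
        int i;

        /* The first n items seed the reservoir. */
        for (i = 0; i < n && i < stream_len; i++)
            sample[i] = stream[i];

        /* Item i replaces a random slot with probability n / (i + 1). */
        for (; i < stream_len; i++)
        {
            int j = rand() % (i + 1);

            if (j < n)
                sample[j] = stream[i];
        }
    }

    int
    main(void)
    {
        int stream[1000], sample[100], i;

        for (i = 0; i < 1000; i++)
            stream[i] = i;
        reservoir_sample(stream, 1000, sample, 100);
        for (i = 0; i < 100; i++)
            printf("%d\n", sample[i]);
        return 0;
    }

The adaptive idea in the quoted message amounts to feeding the [N] estimated
by one ANALYZE back in as a hint when choosing n for the next run, so that
successive passes work with a better-sized sample while keeping the one-pass
property.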