Re: Multi-pass planner
From | Claudio Freire |
---|---|
Subject | Re: Multi-pass planner |
Date | |
Msg-id | CAGTBQpYyaHPeQzZnkseRxcn5Xm4zRuvB7bes8xJBgsQq-EPYpg@mail.gmail.com |
In reply to | Re: Multi-pass planner (Jeff Janes <jeff.janes@gmail.com>) |
Responses | Re: Multi-pass planner |
List | pgsql-hackers |
On Fri, Apr 19, 2013 at 6:19 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
> On Wed, Apr 3, 2013 at 6:40 PM, Greg Stark <stark@mit.edu> wrote:
>>
>> On Fri, Aug 21, 2009 at 6:54 PM, decibel <decibel@decibel.org> wrote:
>>> Would it? Risk seems like it would just be something along the lines of
>>> the high end of our estimate. I don't think confidence should be that hard
>>> either. IE: hard-coded guesses have a low confidence. Something pulled right
>>> out of most_common_vals has a high confidence.
>
> I wouldn't be so sure of that. I've run into cases where all of the
> frequencies pulled out of most_common_vals are off by orders of magnitude.
> The problem is that if ANALYZE only samples 1/1000th of the table and it
> sees a value twice, it assumes the value is present 2000 times in the table,
> even when it was only in the table twice. Now, for any given value that
> occurs twice in the table, it is very unlikely for both of those rows to end
> up in the sample. But when you have millions of distinct values which each
> occur twice (or some other low number of times), it is a near certainty that
> several of them are going to end up with both instances in the sample. Those
> few that get "lucky" are of course going to end up in the
> most_common_vals list.

Especially if there's some locality of occurrence, since ANALYZE samples pages, not rows.
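
To make the "lucky pairs" effect concrete, here is a minimal back-of-the-envelope sketch (not PostgreSQL's actual estimator; it assumes uniform row sampling, whereas ANALYZE samples pages and uses a more involved MCV cutoff, and the n_pairs and sample-fraction numbers are purely illustrative):

# Rough model: n_pairs distinct values each occur exactly twice in the table,
# and a fraction f of the rows is sampled uniformly at random.
# The chance that both rows of a given pair land in the sample is ~f**2,
# so about n_pairs * f**2 pairs look "common", and a naive scale-up then
# credits each of them with ~2/f rows instead of 2.

def expected_lucky_pairs(n_pairs: int, sample_fraction: float) -> float:
    """Expected number of twice-occurring values that are sampled twice."""
    return n_pairs * sample_fraction ** 2

def naive_extrapolated_count(seen_in_sample: int, sample_fraction: float) -> float:
    """Scale a sample count up to the whole table (the 2 -> 2000 effect)."""
    return seen_in_sample / sample_fraction

if __name__ == "__main__":
    n_pairs = 1_000_000   # a million values occurring twice each (illustrative)
    f = 0.01              # e.g. a 30k-row sample of a 3M-row table (illustrative)
    lucky = expected_lucky_pairs(n_pairs, f)
    estimate = naive_extrapolated_count(2, f)
    print(f"~{lucky:.0f} pairs fully sampled; each estimated at ~{estimate:.0f} rows instead of 2")

With those illustrative numbers, on the order of a hundred such values would be overestimated by a factor of about 100, which is roughly the "orders of magnitude" error described above; page-level sampling with locality of occurrence only makes it easier for both rows of a pair to be picked together.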