Re: benchmarking the query planner
From | Tom Lane
---|---
Subject | Re: benchmarking the query planner
Date | Fri, 12 Dec 2008
Msg-id | 5301.1229092509@sss.pgh.pa.us
In reply to | Re: benchmarking the query planner ("Robert Haas" <robertmhaas@gmail.com>)
Responses | Re: benchmarking the query planner; Re: benchmarking the query planner
List | pgsql-hackers
"Robert Haas" <robertmhaas@gmail.com> writes: > On Fri, Dec 12, 2008 at 4:04 AM, Simon Riggs <simon@2ndquadrant.com> wrote: >>> The existing sampling mechanism is tied to solid statistics. >> >> Sounds great, but its not true. The sample size is not linked to data >> volume, so how can it possibly give a consistent confidence range? > It is a pretty well-known mathematical fact that for something like an > opinion poll your margin of error does not depend on the size of the > population but only on the size of your sample. Right. The solid math that Greg referred to concerns how big a sample we need in order to have good confidence in the histogram results. It doesn't speak to whether we get good results for ndistinct (or for most-common-values, though in practice that seems to work fairly well). AFAICS, marginal enlargements in the sample size aren't going to help much for ndistinct --- you really need to look at most or all of the table to be guaranteed anything about that. But having said that, I have wondered whether we should consider allowing the sample to grow to fill maintenance_work_mem, rather than making it a predetermined number of rows. One difficulty is that the random-sampling code assumes it has a predetermined rowcount target; I haven't looked at whether that'd be easy to change or whether we'd need a whole new sampling algorithm. regards, tom lane