Re: A query planner that learns
From | AgentM
---|---
Subject | Re: A query planner that learns
Date |
Msg-id | 1B7FAD85-20E9-40AE-8018-724F94E132F8@themactionfaction.com
In reply to | Re: A query planner that learns ("John D. Burger" <john@mitre.org>)
Responses | Re: A query planner that learns
List | pgsql-general
On Oct 13, 2006, at 11:47 , John D. Burger wrote:

> Erik Jones wrote:
>
>> Forgive me if I'm way off here as I'm not all that familiar with
>> the internals of postgres, but isn't this what the genetic query
>> optimizer discussed in one of the manual's appendixes is supposed
>> to do?
>
> No - it's not an "optimizer" in that sense. When there is a small
> enough set of tables involved, the planner uses a dynamic
> programming algorithm to explore the entire space of all possible
> plans. But the space grows exponentially (I think) with the number
> of tables - when this would take too long, the planner switches to
> a genetic algorithm approach, which explores a small fraction of
> the plan space, in a guided manner.
>
> But with both approaches, the planner is just using the static
> statistics gathered by ANALYZE to estimate the cost of each
> candidate plan, and these statistics are based on sampling your
> data - they may be wrong, or at least misleading. (In particular,
> the statistic for the total number of unique values is frequently
> =way= off, per a recent thread here. I have been reading about
> this, idly thinking about how to improve the estimate.)
>
> The idea of a learning planner, I suppose, would be one that
> examines cases where these statistics lead to very misguided
> expectations. The simplest version of a "learning" planner could
> simply bump up the statistics targets on certain columns. A
> slightly more sophisticated idea would be for some of the
> statistics to optionally use parametric modeling (this column is
> Gaussian, let's estimate the mean and variance; this one is a Beta
> distribution ...). Then the smarter planner could spend some
> cycles applying more sophisticated statistical modeling to
> problematic tables/columns.

One simple first step would be to run an ANALYZE whenever a
sequential scan is executed. Is there a reason not to do this? It
could be controlled by a GUC variable in case someone wants
repeatable plans.

Further down the line, statistics could be collected during the
execution of any query, updating histograms on deletes and updates
as well.

-M
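For reference, the exhaustive-vs-genetic switch John describes is
controlled by a pair of GUCs; assuming a stock configuration, the
crossover point can be inspected or moved per session:

    -- The planner abandons exhaustive (dynamic programming) search in
    -- favor of the genetic optimizer once a query joins at least
    -- geqo_threshold FROM items (default 12):
    SHOW geqo_threshold;

    -- Raise the threshold to force exhaustive planning on wider joins,
    -- trading planning time for plan quality:
    SET geqo_threshold = 16;

    -- Or disable the genetic optimizer for the session entirely:
    SET geqo = off;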
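The "bump up the statistics targets" idea is already available as a
manual knob. A sketch of checking and widening the sample for one
column - the table and column names here ("orders", "customer_id")
are made up:

    -- Compare the planner's distinct-count estimate with reality
    -- (negative n_distinct values are ratios of the row count):
    SELECT n_distinct FROM pg_stats
    WHERE tablename = 'orders' AND attname = 'customer_id';

    SELECT count(DISTINCT customer_id) FROM orders;

    -- If the estimate is badly off, widen the sample for that column
    -- and re-gather statistics:
    ALTER TABLE orders ALTER COLUMN customer_id SET STATISTICS 100;
    ANALYZE orders;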
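To make the parametric-modeling idea concrete: under a Gaussian
assumption, two aggregates would stand in for an entire histogram.
Illustrative only, again with made-up names:

    -- Fit mean and standard deviation for a column assumed Gaussian:
    SELECT avg(amount) AS mu, stddev(amount) AS sigma FROM orders;

    -- The selectivity of "amount BETWEEN a AND b" could then be
    -- estimated as Phi((b - mu)/sigma) - Phi((a - mu)/sigma), where
    -- Phi is the standard normal CDF.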
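And a sketch of what the GUC for the ANALYZE-on-seqscan proposal
might look like - a hypothetical name; no such setting exists in any
PostgreSQL release:

    -- Hypothetical knob for the proposal above (NOT a real GUC):
    SET analyze_on_seqscan = on;   -- re-ANALYZE a table after it is
                                   -- sequentially scanned
    -- Turned off when repeatable plans are wanted, e.g. benchmarking:
    SET analyze_on_seqscan = off;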