Re: The science of optimization in practical terms?
From: Robert Haas
Subject: Re: The science of optimization in practical terms?
Date:
Msg-id: 603c8f070902131320n47904f8cr2f95a5f01e496e85@mail.gmail.com
In reply to: Re: The science of optimization in practical terms? ("Joshua D. Drake" <jd@commandprompt.com>)
Responses: Re: The science of optimization in practical terms?
List: pgsql-hackers
On Fri, Feb 13, 2009 at 3:27 PM, Joshua D. Drake <jd@commandprompt.com> wrote:
> On Fri, 2009-02-13 at 20:10 +0000, Grzegorz Jaskiewicz wrote:
>> yet more arguments, to let postgresql estimate those automatically.
>
> Well I haven't seen any arguments actually. Which was the point of my
> original question. I don't think anyone actually knows what these knobs
> change, in practice.

Well, in broad strokes, it seems to me that what they do is fairly obvious: they affect the planner's willingness to choose plans that touch more pages versus plans that involve more CPU overhead (e.g. qual evaluation). If the database is mostly or entirely in shared buffers or the system buffer cache, and CPU consumption is a problem, then raising the CPU costs is apt to help.

I think the root of this problem is that we can't model caching effects. random_page_cost > seq_page_cost models the cost of seeks, while min(random_page_cost, seq_page_cost) >> max(cpu_tuple_cost, cpu_index_tuple_cost, cpu_operator_cost) models the fact that reading from disk, even sequentially, is always slow. Unfortunately, if the whole database is likely already in memory, which seems to be a pretty common scenario even for relatively large databases (because people buy more memory to make them fit), then that assumption is just wrong. If we had good knowledge of which pages were apt to be cached, we could add a GUC cached_page_cost with a default value of maybe 0.2, and presumably we'd get better plans that way.

The bad news is that it's pretty difficult to get that knowledge (and of course it could change after the fact if the usage pattern of the database shifts dramatically). The good news is that experimentation is possible. For example, we could:

- Assume that small relations are more likely to be cached, and so derate page costs when accessing them.
- Allow users to override the page cost on a per-relation basis using a reloption.
- Gather statistics on relation access patterns and use them to estimate the fraction of a relation likely to be in cache.

If your whole database stays in memory all the time, I would guess that you could either raise the CPU costs or drop the page costs quite substantially, and that would probably work out fine. What's tougher is to still be able to generate good plans when only part of the database fits in memory, or when there's other activity on the system that is periodically purging portions of the system cache.

...Robert
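
[Editor's note: a minimal sketch of the "drop the page costs" tuning described above, for a database assumed to fit entirely in RAM. It uses only GUCs that already exist (seq_page_cost, random_page_cost); the values and the table name are illustrative assumptions, not recommendations, and the proposed cached_page_cost is omitted because it is not an existing setting.]

    -- Assumes the whole database is cached; values are illustrative only.
    SET seq_page_cost = 0.1;      -- default 1.0
    SET random_page_cost = 0.1;   -- default 4.0; seeks cost nothing extra when nothing hits disk
    -- CPU costs are left at their defaults (cpu_tuple_cost = 0.01, etc.),
    -- which raises them relative to the now-cheaper page costs.

    -- Compare plans before and after the change (hypothetical table):
    EXPLAIN SELECT * FROM some_table WHERE some_column = 42;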