Re: [HACKERS] Optimizer speed and GEQO (was: nested loops in joins)
От | Bruce Momjian |
---|---|
Тема | Re: [HACKERS] Optimizer speed and GEQO (was: nested loops in joins) |
Дата | |
Msg-id | 199902061751.MAA03085@candle.pha.pa.us обсуждение исходный текст |
Ответ на | Re: [HACKERS] Optimizer speed and GEQO (was: nested loops in joins) (Tom Lane <tgl@sss.pgh.pa.us>) |
Список | pgsql-hackers |
> >> Why are we maintaining this huge Path List for every RelOptInfo > >> structure if we only need the cheapest? > > I think Bruce is onto something here... keep only the best-so-far > not everything ever generated. That'd get rid of the O(N^2) > comparison behavior. > > The only thing I can think of that we might possibly *need* the > whole list for is if it is used as a recursion stopper. > ("Oh, I already investigated that alternative, no need to go down > that path again.") It did not look to me like the optimizer does > such a thing, but I don't claim to understand the code. > > It seems to me that the search space of possible paths is > well-structured and can be enumerated without generation of duplicates. > You've got a known set of tables involved in a query, a fixed set of > possible access methods for each table, and only so many ways to > combine them. So the real question here is why does the optimizer > even need to check for duplicates --- should it not never generate > any to begin with? The profiles I ran a few days ago show that indeed > most of the generated paths are not duplicates, but a small fraction > are duplicates. Why is that? I'd feel a lot closer to understanding > what's going on if we could explain where the duplicates come from. Here is my guess, and I think it is valid. There are cases where a join of two tables may be more expensive than another plan, but the order of the result may be cheaper to use for a later join than the cheaper plan. That is why I suspect they are doing in better_path(): if (samekeys(path->keys, new_path->keys) && equal_path_ordering(&path->p_ordering, &new_path->p_ordering)) { old_path = path; break; } So I don't think we can grab the cheapest path right away, but I think the system is keeping around non-cheapest paths with the same result ordering. I also would like to throw away plans that are so expensive that sorting another plan to the desired ordering would be cheaper than using the plan. I am only guessing on this. How often does this 'keep non-cheapest path around' really help? -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
В списке pgsql-hackers по дате отправления: