Re: Bunch o' dead code in GEQO
От | Tom Lane |
---|---|
Тема | Re: Bunch o' dead code in GEQO |
Дата | |
Msg-id | 12735.1074790315@sss.pgh.pa.us обсуждение исходный текст |
Ответ на | Re: Bunch o' dead code in GEQO ("scott.marlowe" <scott.marlowe@ihs.com>) |
Список | pgsql-hackers |
"scott.marlowe" <scott.marlowe@ihs.com> writes: > On Thu, 22 Jan 2004, Tom Lane wrote: >> I'm assuming that the original author of the GEQO code already did that >> testing ... > Hmmm. I was figuring he wasn't sure so he left them in for other people > to test. Is this a part of the code that eats up much time, or something > simple and fast that isn't part of the "GEQO takes 8 seconds to plan" > problem? Well, the basic plan of the GEQO code is Step 1: generate a bunch of possible join paths at random. Step 2: randomly select a pair of paths from the current population, generate a new path that is some combination of these, and push it back into the population, dropping the worst path from the population. Repeat for a bunch of generations. Step 3: take the best path in the final population. The different recombination algorithms simply are different ways of generating a "child" path given two "parent" paths in step 2. Changing them wouldn't affect the runtime noticeably at all --- the primary cost is in evaluating each generated path, which is why the runtime is approximately the sum of the population size (step 1) and the number of generations (step 2). Possibly a different recombiner would give a better chance of finding a good plan, but I'm unconvinced. Arguably the recombiners that are there are all wrong anyway, since they were all invented to solve Traveling Salesman problems, which this is not quite. The only way we can do much about the runtime is to reduce the default population size. With the current default parameters, a large query will have population size 1024 and 400 generations, so about two-thirds of the runtime is in generating the initial random population; if we can't make a dent in that then we're not going to gain much. The question is what will this do to the average quality of the selected plans. One thing I'm thinking about is trying to improve the quality of the initial population by immediately discarding any really bad plans (for instance, use a heuristic that pays attention to which relations are linked by WHERE clauses). regards, tom lane
В списке pgsql-hackers по дате отправления: