Re: join ordering via Simulated Annealing
От | bin wang |
---|---|
Тема | Re: join ordering via Simulated Annealing |
Дата | |
Msg-id | 323338680912221757p3028180co317b923e71950f98@mail.gmail.com обсуждение исходный текст |
Ответ на | join ordering via Simulated Annealing (Jan Urbański <wulczer@wulczer.org>) |
Список | pgsql-hackers |
I will follow it.<br />Thank you.<br /><br /><div class="gmail_quote">2009/12/23 Jan Urba��ski <span dir="ltr"><<a href="mailto:wulczer@wulczer.org">wulczer@wulczer.org</a>></span><br/><blockquote class="gmail_quote" style="border-left:1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"> Hi,<br /><br /> I've beenplaying with using a Simulated Annealing-type algorithm for<br /> determinig join ordering for relations. To get intocontext see<br /><a href="http://archives.postgresql.org/pgsql-hackers/2009-05/msg00098.php" target="_blank">http://archives.postgresql.org/pgsql-hackers/2009-05/msg00098.php</a><br/> (there's also a TODO in the wiki).There's a nice paper on that in<br /><a href="http://reference.kfupm.edu.sa/content/h/e/heuristic_and_randomized_optimization_fo_87585.pdf" target="_blank">http://reference.kfupm.edu.sa/content/h/e/heuristic_and_randomized_optimization_fo_87585.pdf</a><br/> (alsolinked from that thread) and someone even wrote a patch:<br /><a href="http://archives.postgresql.org/pgsql-hackers/2009-05/msg00736.php" target="_blank">http://archives.postgresql.org/pgsql-hackers/2009-05/msg00736.php</a><br/><br /> This generally aims at beingable to replace GEQO.<br /><br /> I have some rough prototype code, but I'm not even asking people to look<br /> atit. It's stuffed with silly things and writes three Graphviz-style<br /> .dot files in /tmp for each algorithm step. ButI'd like to at least<br /> present the idea.<br /><br /> There are three main problems that have to be solved to get agood<br /> Simulated Annealing implementation:<br /> o) choosing the starting point for the algorithm<br /> o) generatingthe next point<br /> o) computing the cost in the current point<br /><br /> The code I have now creates the initialplan by doing something similar<br /> to what gimme_tree does in GEQO, but without any kind of heuristic to<br />try to favour joins with join restrictions (the idea is that it doesn't<br /> matter, since we only want to get *any* planand we only do it once),<br /> but ideally it would be somehow chosen randonly from the space of all<br /> possible joinorderings.<br /><br /> I keep a binary tree of relations in memory, where leaves are<br /> RelOptInfos with 1 relid andthe root is a RelOptInfo with all relids.<br /> Each iteration of the algorithm picks two nodes at random (with some<br/> restrictions, but that's not important) and tries to swap them around,<br /> meaning that a tree like (use a monospacefont for best results):<br /><br /> (1 2 3 4)<br /> *(1 2) (3 4)<br /> (1) (2) *(3)(4)<br /><br /> where the parenthesised things are the two chosen nodes would get<br /> transformed into<br /><br /> (1 2 3 4)<br /> (3) (1 2 4)<br /> (1 2) (4)<br /> (1) (2)<br /><br/> that is, the (1 2) node and its subtree gets swapped with the (3) node<br /> and the upper-level nodes get changedaccordingly. Sometimes a swap like<br /> that will produce an invalid join ordering - then swap is then reverted.<br/> If the join order given by the tree after the swap is legal, the paths<br /> are recomputed, much like ingeqo_eval, and if the root node's cheapest<br /> path is not cheaper that before the swap, the swap gets reverted.<br />Simulated Annealing algorithms permit uphill moves in terms of cost,<br /> with some probability that's decreasing as timepasses, but that's not<br /> implemented yet. After a fixed amount of moves, the algorithm stops and<br /> returns theroot node of the tree as the single RelOptInfo.<br /><br /> The issues with the approach are:<br /><br /> o) the initialstate is not really a random plan, it's usualy a<br /> left-deep tree (and is very inefficient) and this might skewresults.<br /><br /> o) is swapping random nodes like that a good way of exploring the<br /> solution space? The SApaper suggests something much simpler, but some<br /> of the moves proposed there don't really make sense when using the<br/> make_join_rel machinery:<br /> *) changing the inner and outer relation of a join doesn't make<br /> sense, becausemake_join_rel is symmetrical<br /> *) changing the join executing strategy (nested loop, merge join,<br /> etc.)doesn't make sense, because make_join_rel considers all possible<br /> paths for a join<br /><br /> o) each swap needsa full recalcualtion of the whole solution<br /> (geqo_eval does that, for instance), maybe it's possible to leave the<br/> subtrees of swapped nodes intact and only recalculate the nodes above<br /> the two swapped nodes?<br /><br /> o)because make_join_rel scribbles on the PlannerInfo, the same hack as<br /> in geqo_eval is necessary: truncating join_rel_listafter building all<br /> joinrels and nulling out the join_rel_hash<br /><br /> o) I use make_join_rel to determinewhether a join is legal, which does<br /> lots of other things. That looks like wasted effort, although in the end<br/> each time I need to build the final rel to assess the resulting path's<br /> cost. To follow the SA algorithm spiritmore closely it would be<br /> necessary to only consider one, random path for each relation and make<br /> using differentpaths a move that the algorithm can choose while<br /> exploring the solution space. A cheaper way of determiningthe current<br /> solution's cost would be nice, too - fully rebuilding the final<br /> RelOptInfo after eachmove is annoying.<br /><br /> Lastly, I'm lacking good testcases or even a testing approach: I'm<br /> generating sillyqueries and looking at how they get optimised, but if<br /> someone has a real dataset and examples of joins that cannotbe planned<br /> with the standard planner, I would be interested to compare the results<br /> my prototype gets withthose produced by GEQO.<br /><br /> The code, a module that you can LOAD into a backend, is here (if you<br /> reallywant to see it):<br /><a href="http://git.wulczer.org/?p=saio.git" target="_blank">http://git.wulczer.org/?p=saio.git</a><br/> Set the saio GUC to true and saio_cutoff to the number of iterationsyou<br /> want.<br /><br /> Cheers,<br /> Jan<br /><font color="#888888"><br /> --<br /> Sent via pgsql-hackersmailing list (<a href="mailto:pgsql-hackers@postgresql.org">pgsql-hackers@postgresql.org</a>)<br /> To makechanges to your subscription:<br /><a href="http://www.postgresql.org/mailpref/pgsql-hackers" target="_blank">http://www.postgresql.org/mailpref/pgsql-hackers</a><br/></font></blockquote></div><br />
В списке pgsql-hackers по дате отправления: