Re: Proposed Patch to Improve Performance of Multi-BatchHash Join for Skewed Data Sets
From | Robert Haas |
---|---|
Subject | Re: Proposed Patch to Improve Performance of Multi-BatchHash Join for Skewed Data Sets |
Date | |
Msg-id | 603c8f070903201735x351271c4y4ce298b422367a25@mail.gmail.com |
In response to | Re: Proposed Patch to Improve Performance of Multi-BatchHash Join for Skewed Data Sets (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses | Re: Proposed Patch to Improve Performance of Multi-BatchHash Join for Skewed Data Sets |
List | pgsql-hackers |
On Fri, Mar 20, 2009 at 8:14 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Bryce Cutt <pandasuit@gmail.com> writes:
>> Here is the new patch.
>
> Applied with revisions. I undid some of the "optimizations" that
> cluttered the code in order to save a cycle or two per tuple --- as per
> previous discussion, that's not what the performance questions were
> about. Also, I did not like the terminology "in-memory"/"IM"; it seemed
> confusing since the main hash table is in-memory too. I revised the
> code to consistently refer to the additional hash table as a "skew"
> hashtable and the optimization in general as skew optimization. Hope
> that seems reasonable to you --- we could search-and-replace it to
> something else if you'd prefer.
>
> For the moment, I didn't really do anything about teaching the planner
> to account for this optimization in its cost estimates. The initial
> estimate of the number of MCVs that will be specially treated seems to
> me to be too high (it's only accurate if the inner relation is unique),
> but getting a more accurate estimate seems pretty hard, and it's not
> clear it's worth the trouble. Without that, though, you can't tell
> what fraction of outer tuples will get the short-circuit treatment.

If the inner relation isn't fairly close to unique you shouldn't be
using this optimization in the first place.

...Robert
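
To make the estimation point concrete, here is a rough sketch of why the MCV count is only accurate for a unique inner relation. It is not taken from the patch or from PostgreSQL's source: the function name `skew_coverage`, the slot-based memory budget, and all the numbers are assumptions made up for illustration. It walks the outer side's MCV list in frequency order and keeps a value only while its matching inner rows still fit in the skew hashtable.

```python
def skew_coverage(mcv_freqs, inner_rows_per_value, skew_slots):
    """Hypothetical model of the skew hashtable's coverage.

    mcv_freqs            -- outer-side MCV frequencies, most common first
    inner_rows_per_value -- assumed count of inner rows matching each MCV
    skew_slots           -- assumed capacity of the skew hashtable, in tuples

    Returns (number of MCVs kept, fraction of outer tuples short-circuited).
    """
    used = 0        # inner tuples placed in the skew hashtable so far
    kept = 0        # MCVs that received special treatment
    covered = 0.0   # summed outer frequency of the kept MCVs
    for freq, rows in zip(mcv_freqs, inner_rows_per_value):
        if used + rows > skew_slots:
            break   # this value's inner rows no longer fit
        used += rows
        kept += 1
        covered += freq
    return kept, covered


if __name__ == "__main__":
    freqs = [0.30, 0.20, 0.10, 0.05]   # made-up outer MCV frequencies
    # Unique inner relation: one inner row per MCV value.
    print(skew_coverage(freqs, [1, 1, 1, 1], skew_slots=4))
    # Non-unique inner relation: three inner rows per MCV value.
    print(skew_coverage(freqs, [3, 3, 3, 3], skew_slots=4))
```

With these invented numbers, the unique case keeps all four MCVs, while three inner rows per value leaves room for only the first one, so an estimate that assumes one inner row per MCV overstates how many MCVs get the special treatment.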