Re: Proposed Patch to Improve Performance of Multi-Batch Hash Join for Skewed Data Sets
From | Lawrence, Ramon |
---|---|
Subject | Re: Proposed Patch to Improve Performance of Multi-Batch Hash Join for Skewed Data Sets |
Date | |
Msg-id | 6EEA43D22289484890D119821101B1DF28B35F@exchange20.mercury.ad.ubc.ca |
In reply to | Re: Proposed Patch to Improve Performance of Multi-Batch Hash Join for Skewed Data Sets ("Bryce Cutt" <pandasuit@gmail.com>) |
Responses | Re: Proposed Patch to Improve Performance of Multi-Batch Hash Join for Skewed Data Sets |
List | pgsql-hackers |
________________________________
From: pgsql-hackers-owner@postgresql.org on behalf of Robert Haas

> I think what we need here is some very simple testing to demonstrate that this patch demonstrates a speed-up even when the inner side of the join is a joinrel rather than a baserel. Can you suggest a single query against the skewed TPCH dataset that will result in two or more multi-batch hash joins? If so, it should be a simple matter to run that query with and without the patch and verify that the former is faster than the latter.

This query will have the outer relation be a joinrel rather than a baserel:

    select count(*) from supplier, part, lineitem where l_partkey = p_partkey and s_suppkey = l_suppkey;

The approach collects statistics on the outer relation (not the inner relation), so the code had to be able to determine a stats tuple on a joinrel in addition to a baserel.

Joshua sent us some preliminary data with this query and others and indicated that we could post it. He wanted time to clean it up and re-run some experiments, but the data is generally good and the algorithm performs as expected. I have attached this data to the post. Note that the last set of data (although labelled as Z7) is actually an almost zero-skew database and represents the worst case for the algorithm (for most queries the optimization is not even used).

--
Ramon Lawrence
Attachments