Re: Proposed Patch to Improve Performance of Multi-BatchHash Join for Skewed Data Sets
От | Bryce Cutt |
---|---|
Тема | Re: Proposed Patch to Improve Performance of Multi-BatchHash Join for Skewed Data Sets |
Дата | |
Msg-id | 1924d1180812292129v2d973b05rf319f3dc4049a5be@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: Proposed Patch to Improve Performance of Multi-BatchHash Join for Skewed Data Sets ("Robert Haas" <robertmhaas@gmail.com>) |
Ответы |
Re: Proposed Patch to Improve Performance of Multi-BatchHash Join for Skewed Data Sets
|
Список | pgsql-hackers |
Here is the next patch version. The naming and style concerns have been addressed. The patch now only touches 5 files. 4 of those files are hashjoin specific and 1 is to add a couple lines to a hashjoin specific struct in another file. The code can now find the the MCVs in more cases. Even if the probe side is an operator other than a seq scan (such as another hashjoin) the code can now find the stats tuple for the underlying relation. The new idea of limiting the number of MCVs to a percentage of memory has not been added yet. - Bryce Cutt On Mon, Dec 29, 2008 at 8:55 PM, Robert Haas <robertmhaas@gmail.com> wrote: >> I think that setting aside a minimum percentage of work_mem may be a >> reasonable approach. For instance, setting aside 1% at even 1 MB >> work_mem would be 10 KB which is enough to store about 40 MCV tuples of >> the TPC-H database. Such a small percentage would be very unlikely (but >> still possible) to change the number of batches used. Then, given the >> memory allocation and the known tuple size + overhead, only that number >> of MCVs are selected for the MCV table regardless how many there are. >> The MCV table size would then increase as work_mem is changed up to a >> maximum given by the number of MCVs. > > Sounds fine. Maybe 2-3% would be better. > >> The code when building the MCV hash table keeps track of the order of >> insertion of the best MCVs. It then flushes the MCV partitions in >> decreasing order of frequency of MCVs. Thus, by the end of the build >> partitioning phase the MCV hash table should only store the most >> frequent MCV tuples. Even with many-to-many joins as long as we keep >> all build tuples that have a given MCV in memory, then everything is >> fine. You would get into problems if you only flushed some of the >> tuples of a certain MCV but that will not happen. > > OK, I'll read it again - I must not have understood. > > It would be good to post an updated patch soon, even if not everything > has been addressed. > > ...Robert >
Вложения
В списке pgsql-hackers по дате отправления: