Re: Proposed Patch to Improve Performance of Multi-Batch Hash Join for Skewed Data Sets
From: Robert Haas
Subject: Re: Proposed Patch to Improve Performance of Multi-Batch Hash Join for Skewed Data Sets
Date:
Msg-id: 603c8f070812292055t32b850d9l5a9427eea91c2ed5@mail.gmail.com
In reply to: Re: Proposed Patch to Improve Performance of Multi-Batch Hash Join for Skewed Data Sets ("Lawrence, Ramon" <ramon.lawrence@ubc.ca>)
Responses: Re: Proposed Patch to Improve Performance of Multi-Batch Hash Join for Skewed Data Sets
List: pgsql-hackers
> I think that setting aside a minimum percentage of work_mem may be a
> reasonable approach. For instance, setting aside 1% at even 1 MB
> work_mem would be 10 KB, which is enough to store about 40 MCV tuples
> of the TPC-H database. Such a small percentage would be very unlikely
> (but still possible) to change the number of batches used. Then, given
> the memory allocation and the known tuple size + overhead, only that
> number of MCVs is selected for the MCV table regardless of how many
> there are. The MCV table size would then increase as work_mem is
> changed, up to a maximum given by the number of MCVs.

Sounds fine. Maybe 2-3% would be better.

> The code, when building the MCV hash table, keeps track of the order
> of insertion of the best MCVs. It then flushes the MCV partitions in
> decreasing order of frequency of MCVs. Thus, by the end of the build
> partitioning phase the MCV hash table should only store the most
> frequent MCV tuples. Even with many-to-many joins, as long as we keep
> all build tuples that have a given MCV in memory, everything is fine.
> You would get into problems if you only flushed some of the tuples of
> a certain MCV, but that will not happen.

OK, I'll read it again - I must not have understood.

It would be good to post an updated patch soon, even if not everything
has been addressed.

...Robert
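
For archive readers, a minimal sketch in C of the arithmetic in the
first quoted paragraph: reserve a fixed percentage of work_mem for the
MCV (skew) hash table, then derive how many MCV tuples fit from the
estimated tuple size plus per-tuple overhead. The function name and
constants below are illustrative assumptions, not taken from the patch:

#include <stdio.h>

/*
 * How many MCV tuples fit in a reserved 'percent' slice of work_mem,
 * given an estimated per-tuple footprint (tuple size plus overhead)?
 */
static int
mcv_tuple_budget(long work_mem_bytes, int percent, int bytes_per_mcv)
{
    long    reserved = work_mem_bytes * percent / 100;

    return (int) (reserved / bytes_per_mcv);
}

int
main(void)
{
    /*
     * A 1% reserve of 1 MB work_mem at ~256 bytes per stored MCV tuple
     * reproduces the "about 40 MCV tuples" figure above; at the 2-3%
     * Robert suggests, the budget would be roughly 80-120 tuples.
     */
    printf("%d MCV tuples\n", mcv_tuple_budget(1024L * 1024, 1, 256));
    return 0;
}

Deriving the count from the budget, rather than always admitting every
MCV, is what lets the table degrade gracefully at small work_mem
settings and grow with work_mem up to the number of available MCVs.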
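The flush discipline in the second quoted paragraph can be sketched the
same way, reading it as: partitions are ranked by MCV frequency, and
when the table exceeds its budget, whole partitions are evicted starting
from the least frequent, so the most frequent MCVs survive and no MCV is
ever flushed partially. Again, all types and names here are illustrative
assumptions, not the patch's:

typedef struct McvPartition
{
    int     rank;       /* 0 = most frequent MCV */
    long    bytes;      /* memory used by this partition's build tuples */
    /* ... the build tuples hashing to this MCV would hang off here ... */
} McvPartition;

/*
 * Evict whole partitions, least frequent first, until total memory use
 * fits within 'budget'. 'parts' is ordered by rank, and 'used' tracks
 * the bytes consumed by the 'nparts' partitions still in memory.
 * Returns the new count of in-memory partitions.
 */
static int
shrink_to_budget(McvPartition *parts, int nparts, long *used, long budget)
{
    while (*used > budget && nparts > 0)
    {
        McvPartition *victim = &parts[nparts - 1];  /* least frequent */

        *used -= victim->bytes; /* spill all its tuples to a batch file */
        nparts--;               /* a partition is never spilled partially */
    }
    return nparts;
}

Evicting whole partitions is what makes the many-to-many case safe: a
probe tuple either finds all in-memory build tuples for its MCV or none
of them, so no partial match set is ever produced.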