Re: Proposed Patch to Improve Performance of Multi-Batch Hash Join for Skewed Data Sets
From | Lawrence, Ramon |
---|---|
Subject | Re: Proposed Patch to Improve Performance of Multi-Batch Hash Join for Skewed Data Sets |
Date | |
Msg-id | 6EEA43D22289484890D119821101B1DF2C16D7@exchange20.mercury.ad.ubc.ca |
In response to | Re: Proposed Patch to Improve Performance of Multi-Batch Hash Join for Skewed Data Sets ("Joshua Tolley" <eggyknap@gmail.com>) |
Responses |
Re: Proposed Patch to Improve Performance of Multi-Batch Hash Join for Skewed Data Sets
|
List | pgsql-hackers |
> -----Original Message-----
> > Minor question on this patch. AFAICS there is another patch that seems
> > to be aiming at exactly the same use case. Jonah's Bloom filter patch.
> >
> > Shouldn't we have a dust off to see which one is best? Or at least a
> > discussion to test whether they overlap? Perhaps you already did that
> > and I missed it because I'm not very tuned in on this thread.
> >
> > --
> > Simon Riggs www.2ndQuadrant.com
> > PostgreSQL Training, Services and Support
>
> We haven't had that discussion AFAIK, and definitely should. First
> glance suggests they could coexist peacefully, with proper coaxing. If
> I understand things properly, Jonah's patch filters tuples early in
> the join process, and this patch tries to ensure that hash join
> batches are kept in RAM when they're most likely to be used. So
> they're orthogonal in purpose, and the patches actually apply *almost*
> cleanly together. Jonah, any comments? If I continue to have some time
> to devote, and get through all I think I can do to review this patch,
> I'll gladly look at Jonah's too, FWIW.
>
> - Josh

The skew patch and bloom filter patch are orthogonal and can both be
applied. The bloom filter patch is a great idea, and it is used in many
other database systems. You can use the TPC-H data set to demonstrate
that the bloom filter patch will significantly improve performance of
multi-batch joins (with or without data skew).

Any query that filters a build table before joining on the probe table
will show improvements with a bloom filter. For example,

    select *
    from customer, orders
    where customer.c_nationkey = 10
      and customer.c_custkey = orders.o_custkey

The bloom filter on customer would allow us to avoid probing with orders
tuples that cannot possibly find a match due to the selection criteria.
This is especially beneficial for multi-batch joins where an orders
tuple must be written to disk if its corresponding customer batch is not
the in-memory batch.
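The idea in the example above can be sketched roughly as follows. This is only an illustration in Python, not code from either patch: the `BloomFilter` class, the `filtered_hash_join` function, the filter size, and the hashing scheme are all hypothetical choices made for the sketch, and the build side is kept entirely in memory rather than split into batches.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        # Sizes are arbitrary illustration values, not tuned settings.
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        # Derive num_hashes bit positions from salted SHA-256 digests.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(key)
        )


def filtered_hash_join(customers, orders, nationkey):
    """Join customers (build side) with orders (probe side) on custkey.

    customers: iterable of (c_custkey, c_nationkey)
    orders:    iterable of (o_orderkey, o_custkey)
    Returns the joined (c_custkey, o_orderkey) pairs and the number of
    probe tuples the Bloom filter rejected before any hash table lookup.
    """
    bloom = BloomFilter()
    hash_table = {}
    for c_custkey, c_nationkey in customers:
        if c_nationkey == nationkey:      # selection on the build table
            bloom.add(c_custkey)
            hash_table[c_custkey] = c_nationkey

    results, skipped = [], 0
    for o_orderkey, o_custkey in orders:
        if not bloom.might_contain(o_custkey):
            # In a multi-batch join this tuple would never need to be
            # written to a batch file on disk.
            skipped += 1
            continue
        if o_custkey in hash_table:       # false positives are caught here
            results.append((o_custkey, o_orderkey))
    return results, skipped


if __name__ == "__main__":
    customers = [(i, i % 25) for i in range(1, 101)]
    orders = [(k, k % 100 + 1) for k in range(1000)]
    joined, skipped = filtered_hash_join(customers, orders, nationkey=10)
    print(len(joined), "matches;", skipped, "probe tuples filtered early")
```

Because the filter never produces false negatives, the join result is the same as without it; the only effect is that most non-matching orders tuples are discarded before they are probed or spilled.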
I have no experience reviewing patches, but I would be happy to help
contribute/review the bloom filter patch as best I can.

--
Dr. Ramon Lawrence
Assistant Professor, Department of Computer Science,
University of British Columbia Okanagan
E-mail: ramon.lawrence@ubc.ca