Re: Proposed Patch to Improve Performance of Multi-Batch Hash Join for Skewed Data Sets

From	Lawrence, Ramon
Subject	Re: Proposed Patch to Improve Performance of Multi-Batch Hash Join for Skewed Data Sets
Date	6 November 2008, 20:31:22
Msg-id	6EEA43D22289484890D119821101B1DF2C16D7@exchange20.mercury.ad.ubc.ca
In reply to	Re: Proposed Patch to Improve Performance of Multi-Batch Hash Join for Skewed Data Sets ("Joshua Tolley" <eggyknap@gmail.com>)
Responses	Re: Proposed Patch to Improve Performance of Multi-Batch Hash Join for Skewed Data Sets
List	pgsql-hackers

> -----Original Message-----
> > Minor question on this patch. AFAICS there is another patch that seems
> > to be aiming at exactly the same use case. Jonah's Bloom filter patch.
> >
> > Shouldn't we have a dust off to see which one is best? Or at least a
> > discussion to test whether they overlap? Perhaps you already did that
> > and I missed it because I'm not very tuned in on this thread.
> >
> > --
> >  Simon Riggs           www.2ndQuadrant.com
> >  PostgreSQL Training, Services and Support
>
> We haven't had that discussion AFAIK, and definitely should. First
> glance suggests they could coexist peacefully, with proper coaxing. If
> I understand things properly, Jonah's patch filters tuples early in
> the join process, and this patch tries to ensure that hash join
> batches are kept in RAM when they're most likely to be used. So
> they're orthogonal in purpose, and the patches actually apply *almost*
> cleanly together. Jonah, any comments? If I continue to have some time
> to devote, and get through all I think I can do to review this patch,
> I'll gladly look at Jonah's too, FWIW.
>
> - Josh

The skew patch and the Bloom filter patch are orthogonal and can both be
applied.  The Bloom filter patch is a great idea, and the technique is
used in many other database systems.  You can use the TPC-H data set to
demonstrate that the Bloom filter patch will significantly improve
performance of multi-batch joins (with or without data skew).

Any query that filters the build table before joining it with the probe
table will show improvements with a Bloom filter.  For example:

    select * from customer, orders
    where customer.c_nationkey = 10
      and customer.c_custkey = orders.o_custkey;

The Bloom filter on customer would allow us to avoid probing with orders
tuples that cannot possibly find a match due to the selection criteria.
This is especially beneficial for multi-batch joins, where an orders
tuple must be written to disk if its corresponding customer batch is not
the in-memory batch.
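
To make the effect concrete, here is a minimal sketch in Python, not
taken from either patch.  The BloomFilter class, the probe_with_filter
function, the filter size, and the batch assignment (key modulo the
number of batches, with batch 0 treated as the in-memory batch) are all
assumptions made for illustration; PostgreSQL's executor is written in
C and partitions batches differently.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter over join keys (illustrative only)."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for clarity

    def _positions(self, key):
        # Derive num_hashes bit positions by salting a SHA-256 digest.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(key))


def probe_with_filter(customers, orders, nationkey, num_batches=4):
    """Return (joined rows, probe tuples spilled, probe tuples skipped)."""
    # Build side: apply the selection, then record surviving keys in the filter.
    build = {c["c_custkey"]: c for c in customers
             if c["c_nationkey"] == nationkey}
    bloom = BloomFilter()
    for key in build:
        bloom.add(key)

    joined, spilled, skipped = [], 0, 0
    for o in orders:
        key = o["o_custkey"]
        if not bloom.might_contain(key):
            skipped += 1  # cannot match: never probed, never written to disk
            continue
        if key % num_batches != 0:
            # Assumed batching rule: only batch 0 is in memory, so this
            # tuple would be written to a batch file. The sketch counts the
            # write and joins right away instead of replaying the batch.
            spilled += 1
        if key in build:  # a Bloom false positive is rejected here
            joined.append((build[key], o))
    return joined, spilled, skipped
```

Every orders tuple counted as skipped is one that an unfiltered
multi-batch join might have written to disk and read back only to find
no match.  How many tuples are skipped depends on the selectivity of
the filter on customer and on the false-positive rate of the Bloom
filter, which in turn depends on its size and number of hash functions.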

I have no experience reviewing patches, but I would be happy to help
review or contribute to the Bloom filter patch as best I can.

--
Dr. Ramon Lawrence
Assistant Professor, Department of Computer Science, University of
British Columbia Okanagan
E-mail: ramon.lawrence@ubc.ca
