Re: Support run-time partition pruning for hash join

Поиск
Список
Период
Сортировка
От Richard Guo
Тема Re: Support run-time partition pruning for hash join
Дата
Msg-id CAMbWs49+p6hBxXJHFiSwOtPCSkAHwhJj3hTpCR_pmMiUUVLZ1Q@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Support run-time partition pruning for hash join  (David Rowley <dgrowleyml@gmail.com>)
Ответы Re: Support run-time partition pruning for hash join  (David Rowley <dgrowleyml@gmail.com>)
Список pgsql-hackers

On Tue, Aug 22, 2023 at 2:38 PM David Rowley <dgrowleyml@gmail.com> wrote:
It would be good to see some performance comparisons of the worst case
to see how much overhead the pruning code adds to Hash Join.  It may
well be that we need to consider two Hash Join paths, one with and one
without run-time pruning. It's pretty difficult to meaningfully cost,
as I already mentioned, however.

I performed some performance comparisons of the worst case with two
tables as below:

1. The partitioned table has 1000 children, and 100,000 tuples in total.

2. The other table is designed that
    a) its tuples occupy every partition of the partitioned table so
       that no partitions can be pruned during execution,
    b) tuples belong to the same partition are placed together so that
       we need to scan all its tuples before we could know that no
       pruning would happen and we could stop trying to prune,
    c) the tuples are unique on the hash key so as to minimize the cost
       of hash probe, so that we can highlight the impact of the pruning
       codes.

Here is the execution time (ms) I get with different sizes of the other
table.

tuples  unpatched   patched
10000   45.74       53.46   (+0.17)
20000   54.48       70.18   (+0.29)
30000   62.57       85.18   (+0.36)
40000   69.14       99.19   (+0.43)
50000   76.46       111.09  (+0.45)
60000   82.68       126.37  (+0.53)
70000   92.69       137.89  (+0.49)
80000   94.49       151.46  (+0.60)
90000   101.53      164.93  (+0.62)
100000  107.22      178.44  (+0.66)

So the overhead the pruning code adds to Hash Join is too large to be
accepted :(.  I think we need to solve this problem first before we can
make this new partition pruning mechanism some useful in practice, but
how?  Some thoughts currently in my mind include

1) we try our best to estimate the cost of this partition pruning when
creating hash join paths, and decide based on the cost whether to use it
or not.  But this does not seem to be an easy task.

2) we use some heuristics when executing hash join, such as when we
notice that a $threshold percentage of the partitions must be visited
we just abort the pruning and assume that no partitions can be pruned.

Any thoughts or comments?

Thanks
Richard

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Sergey Shinderuk
Дата:
Сообщение: Re: Fix error handling in be_tls_open_server()
Следующее
От: Peter Smith
Дата:
Сообщение: Re: [PoC] pg_upgrade: allow to upgrade publisher node