Re: A problem about partitionwise join

Поиск
Список
Период
Сортировка
От Ashutosh Bapat
Тема Re: A problem about partitionwise join
Дата
Msg-id CAExHW5tmaAj=SHZcURhsiAbQbztf_XMHgAu5_hz-cp-GebJ6qQ@mail.gmail.com
обсуждение исходный текст
Ответ на Re: A problem about partitionwise join  (Richard Guo <guofenglinux@gmail.com>)
Ответы Re: A problem about partitionwise join  (Richard Guo <guofenglinux@gmail.com>)
Список pgsql-hackers


On Tue, Mar 19, 2024 at 8:18 AM Richard Guo <guofenglinux@gmail.com> wrote:
(Sorry it takes me some time to get back to this thread.)

On Thu, Mar 7, 2024 at 7:13 PM Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote:
I did a deeper review of the patch. Here are some comments

Thank you for the review!
 
Approach
--------
The equijoin condition between partition keys doesn't appear in the join's restrictilist because of 'best_score' strategy as you explained well in [2]. What if we add an extra score for clauses between partition keys and give preference to equijoin between partition keys? Have you given it a thought? I feel that having an equijoin clause involving partition keys has more usages compared to a clause with any random column. E.g. nextloop may be able to prune partitions from inner relation if the clause contains a partition key.

Hmm, I think this approach won't work in cases where one certain pair of
partition keys has formed an EC that contains pseudoconstants.  In such
cases, the EC machinery will generate restriction clauses like 'pk =
const' rather than any join clauses.

That should be ok and more desirable. Clauses like pk = const will leave only one partition around in each of the joining relations thus PWJ won't be required OR it will be automatic - whichever way you see it.
 

Besides, it seems to me that it's not a cheap operation to check whether
a join clause is between partition keys when we generate join clauses
from ECs in generate_join_implied_equalities().

Why? The code would be the same as what we have in have_partkey_equi_join().
 
 
Documentation
-------------
The patch does not modify any documentation. The only documentation I could find about partitionwise join is the one for GUC 'enable_partitionwise_join'. It says
--- quote
"Partitionwise join currently applies only when the join conditions include all the partition keys, which must be of the same data type and have one-to-one matching sets of child partitions.".
--- unquote
This sentence is general and IMO covers the case this patch considers. But in general I feel that partitionwise join and aggregation deserve separate sections next to "partition pruning" in [1]; It should mention advanced partition matching algorithm as well. Would you be willing to write one and then expand it for the case in the patch?

I don't think it should be part of this patch to add a new section in
the docs to explain partitionwise join and aggregation.  Maybe that
deserves a separate patch?

Yes.

3. The testcase added looks artificial. it outputs data that has same value for two columns which is equal to the primary key of the other table - when would somebody do that?. Is there some real life example where this change will be useful?

Hmm, I think the test case is good as long as it reveals the issue that
this patch fixes.  It follows the same format as the existing test case
just above it.  I'm not sure if there are real life examples, but I
think it may not always be necessary to derive test cases from them.

Let's defer this to the committer.
 
 
Code
----
Minor comment for now. It will be better to increment num_equal_pks immediately after setting pk_known_equal[ipk] = true. Otherwise the code gets confusing around line 2269. I will spend more time reviewing the code next week.
 
Hmm, the increment of num_equal_pks on line 2272 is parallel to the one
in the first loop (around line 2200).  Maybe it's better to keep them
consistent as the current patch does?


In the first loop, setting pk_known_equal[ipk1] = true and ++num_equal_pks happens on consecutive lines. That's not true in the second loop, where there are at least some code line where num_equal_pks is inconsistent with the number of "true" entries in pk_known_equal. We should avoid that.

--
Best Wishes,
Ashutosh Bapat

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Bertrand Drouvot
Дата:
Сообщение: Re: Autogenerate some wait events code and documentation
Следующее
От: Peter Eisentraut
Дата:
Сообщение: What is a typical precision of gettimeofday()?