Re: A problem about partitionwise join

Поиск
Список
Период
Сортировка
От Ashutosh Bapat
Тема Re: A problem about partitionwise join
Дата
Msg-id CAExHW5v=5igWC8BzsxmMJeMnr98pmZZBB3zF5XxHN5BEeGNe-Q@mail.gmail.com
обсуждение исходный текст
Ответ на Re: A problem about partitionwise join  (Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>)
Ответы Re: A problem about partitionwise join  (Richard Guo <guofenglinux@gmail.com>)
Список pgsql-hackers


On Thu, Feb 22, 2024 at 2:56 PM Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote:
On Wed, Feb 21, 2024 at 4:55 PM Richard Guo <guofenglinux@gmail.com> wrote:
>
>
> On Tue, Aug 2, 2022 at 4:24 AM Jacob Champion <jchampion@timescale.com> wrote:
>>
>> Once you think you've built up some community support and the patchset
>> is ready for review, you (or any interested party) can resurrect the
>> patch entry by visiting
>>
>>     https://commitfest.postgresql.org/38/2266/
>>
>> and changing the status to "Needs Review", and then changing the
>> status again to "Move to next CF". (Don't forget the second step;
>> hopefully we will have streamlined this in the near future!)
>
>
> This patch was returned due to 'lack of interest'.  However, upon
> verification, it appears that the reported issue still exists, and the
> proposed fix in the thread remains valid.  Hence, resurrect this patch
> after rebasing it on master.  I've also written a detailed commit
> message which hopefully can help people review the changes more
> effectively.


I did a deeper review of the patch. Here are some comments

Approach
--------
The equijoin condition between partition keys doesn't appear in the join's restrictilist because of 'best_score' strategy as you explained well in [2]. What if we add an extra score for clauses between partition keys and give preference to equijoin between partition keys? Have you given it a thought? I feel that having an equijoin clause involving partition keys has more usages compared to a clause with any random column. E.g. nextloop may be able to prune partitions from inner relation if the clause contains a partition key.

Partition pruning requires equality clauses on partition keys as well. create_append_plan() fetches those from best_path->param_info. If we created and saved the clauses involving partition keys somewhere separately, similar to the clauses involving index keys, it might help this case as well as the partition pruning code. Have you considered this idea?

There was a proposal to use ECs for outer joins as well and then use only ECs to decide whether equijoins between partition keys exist. I don't think the proposal has materialized. So we have to continue looking at restrictlist as well. I don't see a point waiting for it, but others might feel differently.

I am just trying to find ways to avoid two loops in have_partkey_equi_join(). If the alternatives are worse, I think the current approach is fine.

Documentation
-------------
The patch does not modify any documentation. The only documentation I could find about partitionwise join is the one for GUC 'enable_partitionwise_join'. It says
--- quote
"Partitionwise join currently applies only when the join conditions include all the partition keys, which must be of the same data type and have one-to-one matching sets of child partitions.".
--- unquote
This sentence is general and IMO covers the case this patch considers. But in general I feel that partitionwise join and aggregation deserve separate sections next to "partition pruning" in [1]; It should mention advanced partition matching algorithm as well. Would you be willing to write one and then expand it for the case in the patch?

Tests
-----
The patch adds a testcase for single column partitioning. I think we need to do better like
1. Test for partitioning on expression, multilevel partitioning, advanced partition matching. Those all might just work. Having tests helps us to notice any future breakage.
2. Some negative test case e.g. equijoin clauses with disjunction, with inequality operator, equality operators with operators from different families etc.
3. The testcase added looks artificial. it outputs data that has same value for two columns which is equal to the primary key of the other table - when would somebody do that?. Is there some real life example where this change will be useful?

Code
----
Minor comment for now. It will be better to increment num_equal_pks immediately after setting pk_known_equal[ipk] = true. Otherwise the code gets confusing around line 2269. I will spend more time reviewing the code next week.

[1] https://www.postgresql.org/docs/current/ddl-partitioning.html
[2] https://www.postgresql.org/message-id/CAN_9JTxucGdVY9tV6Uxq0CdhrW98bZtxPKFbF_75qdPi5wBaow@mail.gmail.com

--
Best Wishes,
Ashutosh Bapat

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tomas Vondra
Дата:
Сообщение: Re: remaining sql/json patches
Следующее
От: David Rowley
Дата:
Сообщение: Re: Properly pathify the union planner