Обсуждение: [HACKERS] PoC: full merge join on comparison clause
Hi hackers, As you know, at this time Postgres cannot perform a full join on a comparison clause. For example, if we have two tables with numeric columns and run a query like 'select * from t1 full join t2 on t1.a > t2.a', we get an error: "FULL JOIN is only supported with merge-joinable or hash-joinable join conditions". Such queries are legitimate SQL and sometimes arise when porting applications from different DBMS, so it would be good to support them in Postgres. They can be rewritten as union of right and left joins, but it requires manual work and runs somewhat slower (see the benchmark at the end of the letter). This proof-of-concept patch explores the possibility of performing such queries as merge joins. Consider the following example where outer and inner relations are in ascending order, and we are asked to return outer tuples that are greater than inner. outer > inner outer tuple - 6 4 - marked tuple 7 5 8 6 - inner tuple 8 7 The main difference from normal merge join is that we do not need to advance the marked tuple. This behavior can be implemented with some simple changes to the function that compares inner and outer tuples. However, for the join clause 'outer < inner' we would have to advance the marked tuple, which would require adding a new state to the merge join executor node. We do not do this. Instead, at the path creation stage, we make sure that the particular combination of sorting order and join clause allows us to perform merge join the simple way. The optimizer requires some other changes to support these joins. Currently, it uses the same clauses to generate equivalence classes and to perform merge joins. This patch has to separate these two uses. Clauses that correspond to a btree equality operator are used to construct equivalence classes; the operator families for these clauses are recorded in the 'equivopfamilies' field of RestrictInfo struct. Clauses that correspond to btree equality or comparison are used to perform merge joins, and have their operator families recorded in the 'mergeopfamilies'. The optimizer also has to check whether the particular join clause list can be used for merge join, and ensure that it is compatible with inner/outer path ordering. These checks are performed by 'can_sort_for_mergejoin()' and 'outer_sort_suitable_for_mergejoin()'. There is an important unsolved problem in this patch. When generating index paths for base relations, the optimizer tries to use only one scan direction to limit the number of paths. This direction might not be suitable for a given join clause, and the input path will have to be sorted. We could generate paths for both directions, but this was specifically removed for optimization (SHA 834ddc62 by Tom Lane, 10/27/2007 09:45 AM). For inner joins one would expect the merge join to be slower than the nested loop, because it has more complex code paths, and indeed this can be seen on simple benchmarks (see the end of the letter). Costs should be revised further to reflect this difference. I would be glad to hear your opinion on this approach. Some benchmarks: ===== Full join vs union of left and right joins ======================================== test1=# explain analyze select * from t4 right join t1 on t4.a < t1.a union all select * from t4 left join t1 on t4.a < t1.a where t1.a is null; QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------------------- Append (cost=809.69..70703.19 rows=3340000 width=8) (actual time=8.336..1195.534 rows=5007546 loops=1) -> Merge Left Join (cost=809.69..34230.49 rows=3333333 width=8) (actual time=8.335..920.442 rows=5007537 loops=1) Merge Cond: (t1.a > t4.a) -> Index Only Scan using idx_t1_a on t1 (cost=0.28..35.27 rows=1000 width=4) (actual time=0.027..0.395 rows=1001 loops=1) Heap Fetches: 97 -> Sort (cost=809.39..834.39 rows=10000 width=4) (actual time=8.300..356.821 rows=5007538 loops=1) Sort Key: t4.a Sort Method: quicksort Memory: 931kB -> Seq Scan on t4 (cost=0.00..145.00 rows=10000 width=4) (actual time=0.019..2.533 rows=10000 loops=1) -> Nested Loop Anti Join (cost=0.28..3072.71 rows=6667 width=8) (actual time=4.685..35.421 rows=9 loops=1) -> Seq Scan on t4 t4_1 (cost=0.00..145.00 rows=10000 width=4) (actual time=0.010..0.656 rows=10000 loops=1) -> Index Only Scan using idx_t1_a on t1 t1_1 (cost=0.28..6.10 rows=333 width=4) (actual time=0.003..0.003 rows=1 loops=10000) Index Cond: (a > t4_1.a) Heap Fetches: 971 Planning time: 1.414 ms Execution time: 1324.985 ms (16 rows) test1=# explain analyze select * from t4 full join t1 on t4.a < t1.a; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------- Merge Full Join (cost=809.66..34230.49 rows=3333333 width=8) (actual time=8.351..914.590 rows=5007546 loops=1) Merge Cond: (t1.a > t4.a) -> Index Only Scan using idx_t1_a on t1 (cost=0.28..35.27 rows=1000 width=4) (actual time=0.035..0.368 rows=1001 loops=1) Heap Fetches: 97 -> Sort (cost=809.39..834.39 rows=10000 width=4) (actual time=8.309..347.851 rows=5007546 loops=1) Sort Key: t4.a Sort Method: quicksort Memory: 931kB -> Seq Scan on t4 (cost=0.00..145.00 rows=10000 width=4) (actual time=0.020..2.563 rows=10000 loops=1) Planning time: 1.083 ms Execution time: 1044.869 ms (10 rows) === Merge vs nested loop =========================================== test1=# explain analyze select * from t5 join t1 on t5.a <= t1.a; QUERY PLAN --------------------------------------------------------------------------------------------------------------------------------- Nested Loop (cost=0.28..944713.00 rows=33333333 width=8) (actual time=0.055..8718.840 rows=50014145 loops=1) -> Seq Scan on t5 (cost=0.00..1443.00 rows=100000 width=4) (actual time=0.019..6.428 rows=100000 loops=1) -> Index Only Scan using idx_t1_a on t1 (cost=0.28..6.10 rows=333 width=4) (actual time=0.003..0.050 rows=500 loops=100000) Index Cond: (a >= t5.a) Heap Fetches: 9147995 Planning time: 2.209 ms Execution time: 9942.176 ms (7 rows) test1=# set enable_mergejoin TO on; SET test1=# explain analyze select * from t5 join t1 on t5.a <= t1.a; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------- Merge Join (cost=9748.54..343618.88 rows=33333333 width=8) (actual time=35.491..9281.482 rows=50014145 loops=1) Merge Cond: (t1.a >= t5.a) -> Index Only Scan using idx_t1_a on t1 (cost=0.28..35.27 rows=1000 width=4) (actual time=0.027..0.769 rows=1001 loops=1) Heap Fetches: 97 -> Sort (cost=9747.82..9997.82 rows=100000 width=4) (actual time=35.458..3906.652 rows=50014145 loops=1) Sort Key: t5.a Sort Method: quicksort Memory: 8541kB -> Seq Scan on t5 (cost=0.00..1443.00 rows=100000 width=4) (actual time=0.013..8.570 rows=100000 loops=1) Planning time: 2.368 ms Execution time: 10530.356 ms (10 rows) -- Alexander Kuzmenkov Postgres Professional:http://www.postgrespro.com The Russian Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Вложения
On Fri, May 12, 2017 at 7:09 AM, Alexander Kuzmenkov <a.kuzmenkov@postgrespro.ru> wrote: > As you know, at this time Postgres cannot perform a full join on a > comparison clause. For example, if we have two tables with numeric columns > and run a query like 'select * from t1 full join t2 on t1.a > t2.a', we get > an error: "FULL JOIN is only supported with merge-joinable or hash-joinable > join conditions". Such queries are legitimate SQL and sometimes arise when > porting applications from different DBMS, so it would be good to support > them in Postgres. They can be rewritten as union of right and left joins, > but it requires manual work and runs somewhat slower (see the benchmark at > the end of the letter). This proof-of-concept patch explores the possibility > of performing such queries as merge joins. Interesting. I suggest adding this to the next CommitFest. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 16.05.2017 18:57, Robert Haas wrote: > Interesting. I suggest adding this to the next CommitFest. Thank you, added: https://commitfest.postgresql.org/14/1141/ -- Alexander Kuzmenkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Here is a new version of the patch, rebased to 749c7c41 and with some cosmetic changes. -- Alexander Kuzmenkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Вложения
Hi Alexander, On Fri, Aug 25, 2017 at 10:11 PM, Alexander Kuzmenkov <a.kuzmenkov@postgrespro.ru> wrote: > Here is a new version of the patch, rebased to 749c7c41 and with some > cosmetic changes. > I looked at this patch briefly. This is a useful feature. This isn't a design level review of the patch. I may get back to that later. But here are some assorted comments The patch applies cleanly, but there are some whitespace errors. src/backend/executor/nodeMergejoin.c:231: trailing whitespace. + /* src/backend/executor/nodeMergejoin.c:232: trailing whitespace. + * Determine whether we accept lesser and/or equal tuples src/backend/optimizer/path/joinpath.c:499: trailing whitespace. + * a merge join. A merge join executor can only choose inner values that are src/backend/optimizer/path/joinpath.c:501: trailing whitespace. + * have at most one non-equality clause. The implementation may change, so fixing the white space errors may not be priority now. The patch compiles cleanly. You have renamed RestrictInfo member mergeopfamilies as equivopfamilies. I don't think that's a good name; it doesn't convey that these are opfamilies containing merge operators. The changes in check_mergejoinable() suggest that an operator may act as equality operator in few operator families and comparison operator in others. That looks odd. Actually an operator family contains operators other than equality operators, so you may want to retain this member and add a new member to specify whether the clause is an equality clause or not. In mergejoinscansel() you have just removed Assert(op_strategy == BTEqualStrategyNumber); Probably this function is written considering on equality operators. But now that we are using this for all other operators, we will need more changes to this function. That may be the reason why INNER join in your earlier example doesn't choose right costing. The comment change in final_cost_mergejoin() needs more work. n1, n2, n3 are number of rows on inner side with values 1, 2, 3 resp. So n1 + n2 + n3 + ... = size of inner relation is correct. In that context I am not able to understand your change + * If the merge clauses contain inequality, (n1 + n2 + ...) ~= + * (size of inner relation)^2. Some stylistic comments + switch (join_op_strategy) + { + case BTEqualStrategyNumber: + parent->mj_UseEqual[iClause] = true; + break; + case BTLessEqualStrategyNumber: + parent->mj_UseEqual[iClause] = true; + /* fall through */ + case BTLessStrategyNumber: + parent->mj_UseLesser[iClause] = true; + break; + case BTGreaterEqualStrategyNumber: + parent->mj_UseEqual[iClause] = true; + /* fall through */ + case BTGreaterStrategyNumber: + parent->mj_UseLesser[iClause] = true; + break; + default: + Assert(false); Add blank lines between different cases and you may want to replace Assert in default case into an elog(). See for example exprType() or get_jointype_name(). + if (sort_result < 0) + { + result = MJCR_NextOuter; + } We usually do not add {} around a single statement block. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
> On 19 Sep 2017, at 15:18, Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> wrote: > > On Fri, Aug 25, 2017 at 10:11 PM, Alexander Kuzmenkov > <a.kuzmenkov@postgrespro.ru> wrote: >> Here is a new version of the patch, rebased to 749c7c41 and with some >> cosmetic changes. >> > > I looked at this patch briefly. This is a useful feature. This isn't a > design level review of the patch. I may get back to that later. But > here are some assorted comments > .. Looking forward to further review on this patch, but based on this feedback I’m moving this to Waiting for author. cheers ./daniel -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Hi Ashutosh,
Thanks for the review.
Jeff, I'm copying you because this is relevant to our discussion about what to do with mergeopfamilies when adding new merge join types.
You have renamed RestrictInfo member mergeopfamilies as equivopfamilies. I don't think that's a good name; it doesn't convey that these are opfamilies containing merge operators. The changes in check_mergejoinable() suggest that an operator may act as equality operator in few operator families and comparison operator in others. That looks odd. Actually an operator family contains operators other than equality operators, so you may want to retain this member and add a new member to specify whether the clause is an equality clause or not.
For mergeopfamilies, I'm not sure what is the best thing to do. I'll try to explain my understanding of the situation, please correct me if I'm wrong.
Before the patch, mergeopfamilies was used for two things: creating equivalence classes and performing merge joins.
For equivalence classes: we look at the restriction clauses, and if they have mergeopfamilies set, it means that these clause are based on an equality operator, and the left and right variables must be equal. To record this fact, we create an equivalence class. The variables might be equal for one equality operator and not equal for another, so we record the particular operator families to which our equality operator belongs.
For merge join: we look at the join clauses, and if they have mergeopfamilies set, it means that these clauses are based on an equality operator, and we can try performing this particular join as merge join. These opfamilies are also used beforehand to create the equivalence classes for left and right variables. The equivalence classes are used to match the join clauses to pathkeys describing the ordering of join inputs.
So, if we want to start doing merge joins for operators other than equality, we still need to record their opfamilies, but only use them for the second case and not the first. I chose to put these opfamilies to different variables, and
name the one used for equivalence classes 'equivopfamilies' and the one used for merge joins 'mergeopfamilies'. The equality operators are used for both cases, so we put their opfamilies into both of these variables.
I agree this might look confusing. Indeed, we could keep a single variable for opfamilies, and add separate flags that show how they can be used, be that for equivalence classes, merge joins, range joins or some combination of them. This is similar to what Jeff did in his range merge join patch [1]. I will think more about this and try to produce an updated patch.
In mergejoinscansel() you have just removed Assert(op_strategy == BTEqualStrategyNumber); Probably this function is written considering on equality operators. But now that we are using this for all other operators, we will need more changes to this function. That may be the reason why INNER join in your earlier example doesn't choose right costing.
I changed mergejoinscansel() slightly to reflect the fact that the inner relation is scanned from the beginning if we have an inequality merge clause.
The comment change in final_cost_mergejoin() needs more work. n1, n2, n3 are number of rows on inner side with values 1, 2, 3 resp. So n1 + n2 + n3 + ... = size of inner relation is correct. In that context I am not able to understand your change + * If the merge clauses contain inequality, (n1 + n2 + ...) ~= + * (size of inner relation)^2.
I extended the comment in final_cost_mergejoin(). Not sure if that approximation makes any sense, but this is the best I could think of.
Style problems are fixed.
Attached please find the new version of the patch that addresses all the review comments except mergeopfamilies.
The current commitfest is ending, but I'd like to continue working on this patch, so I am moving it to the next one.
[1] https://www.postgresql.org/message-id/flat/CAMp0ubfwAFFW3O_NgKqpRPmm56M4weTEXjprb2gP_NrDaEC4Eg%40mail.gmail.com#CAMp0ubfwAFFW3O_NgKqpRPmm56M4weTEXjprb2gP_NrDaEC4Eg@mail.gmail.com
-- Alexander Kuzmenkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Вложения
On Thu, Sep 28, 2017 at 8:57 PM, Alexander Kuzmenkov <a.kuzmenkov@postgrespro.ru> wrote: > Hi Ashutosh, > > Thanks for the review. > > Jeff, I'm copying you because this is relevant to our discussion about what > to do with mergeopfamilies when adding new merge join types. > > You have renamed RestrictInfo member mergeopfamilies as > equivopfamilies. I don't think that's a good name; it doesn't convey > that these are opfamilies containing merge operators. The changes in > check_mergejoinable() suggest that an operator may act as equality > operator in few operator families and comparison operator in others. > That looks odd. Actually an operator family contains operators other > than equality operators, so you may want to retain this member and add > a new member to specify whether the clause is an equality clause or > not. > > > For mergeopfamilies, I'm not sure what is the best thing to do. I'll try to > explain my understanding of the situation, please correct me if I'm wrong. > > Before the patch, mergeopfamilies was used for two things: creating > equivalence classes and performing merge joins. > > For equivalence classes: we look at the restriction clauses, and if they > have mergeopfamilies set, it means that these clause are based on an > equality operator, and the left and right variables must be equal. To record > this fact, we create an equivalence class. The variables might be equal for > one equality operator and not equal for another, so we record the particular > operator families to which our equality operator belongs. > > For merge join: we look at the join clauses, and if they have > mergeopfamilies set, it means that these clauses are based on an equality > operator, and we can try performing this particular join as merge join. > These opfamilies are also used beforehand to create the equivalence classes > for left and right variables. The equivalence classes are used to match the > join clauses to pathkeys describing the ordering of join inputs. > > So, if we want to start doing merge joins for operators other than equality, > we still need to record their opfamilies, but only use them for the second > case and not the first. I chose to put these opfamilies to different > variables, and > name the one used for equivalence classes 'equivopfamilies' and the one used > for merge joins 'mergeopfamilies'. The equality operators are used for both > cases, so we put their opfamilies into both of these variables. > > I agree this might look confusing. Indeed, we could keep a single variable > for opfamilies, and add separate flags that show how they can be used, be > that for equivalence classes, merge joins, range joins or some combination > of them. This is similar to what Jeff did in his range merge join patch [1]. > I will think more about this and try to produce an updated patch. > I think we have (ab?)used mergeopfamilies to indicate equality condition, which needs some changes. May be these two patches are where we can do those changes. > > In mergejoinscansel() you have just removed Assert(op_strategy == > BTEqualStrategyNumber); Probably this function is written considering > on equality operators. But now that we are using this for all other > operators, we will need more changes to this function. That may be the > reason why INNER join in your earlier example doesn't choose right > costing. > > > I changed mergejoinscansel() slightly to reflect the fact that the inner > relation is scanned from the beginning if we have an inequality merge > clause. > > > The comment change in final_cost_mergejoin() needs more work. n1, n2, > n3 are number of rows on inner side with values 1, 2, 3 resp. So n1 + > n2 + n3 + ... = size of inner relation is correct. In that context I > am not able to understand your change > + * If the merge clauses contain inequality, (n1 + n2 + ...) ~= > + * (size of inner relation)^2. > > > I extended the comment in final_cost_mergejoin(). Not sure if that > approximation makes any sense, but this is the best I could think of. > > Style problems are fixed. > > Attached please find the new version of the patch that addresses all the > review comments except mergeopfamilies. > > The current commitfest is ending, but I'd like to continue working on this > patch, so I am moving it to the next one. > > Thanks for working on the comments. I am interested to continue reviewing it in the next commitfest. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
As discussed earlier, I changed the way we work with mergeopfamilies. I use the "is_equality" flag to indicate whether the clause is an equality one, and fill mergeopfamilies for both equality and inequality operators. The updated patch is attached (rebased to 20b6552242). -- Alexander Kuzmenkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Вложения
Hi Alexander, Commit c7a9fa399 has added another test on mergeopfamilies. I think your patch will need to take care of that test. On Wed, Oct 4, 2017 at 6:38 PM, Alexander Kuzmenkov <a.kuzmenkov@postgrespro.ru> wrote: > As discussed earlier, I changed the way we work with mergeopfamilies. I use > the "is_equality" flag to indicate whether the clause is an equality one, > and fill mergeopfamilies for both equality and inequality operators. > The updated patch is attached (rebased to 20b6552242). > > > -- > Alexander Kuzmenkov > Postgres Professional: http://www.postgrespro.com > The Russian Postgres Company > -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Hi, I am attaching the updated patch, rebased to 820c03. On 09.10.2017 13:47, Ashutosh Bapat wrote: > Hi Alexander, > Commit c7a9fa399 has added another test on mergeopfamilies. I think > your patch will need to take care of that test. > > On Wed, Oct 4, 2017 at 6:38 PM, Alexander Kuzmenkov > <a.kuzmenkov@postgrespro.ru> wrote: >> As discussed earlier, I changed the way we work with mergeopfamilies. I use >> the "is_equality" flag to indicate whether the clause is an equality one, >> and fill mergeopfamilies for both equality and inequality operators. >> The updated patch is attached (rebased to 20b6552242). >> >> >> -- >> Alexander Kuzmenkov >> Postgres Professional: http://www.postgrespro.com >> The Russian Postgres Company >> > > -- Alexander Kuzmenkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Вложения
On Mon, Oct 30, 2017 at 9:25 PM, Alexander Kuzmenkov <a.kuzmenkov@postgrespro.ru> wrote: > I am attaching the updated patch, rebased to 820c03. (Please avoid top-posting) This patch has rotten and conflicts with recent changes in joinrels.c. This did not get any reviews, so I am moving it to next CF with "waiting on author" as status. -- Michael
Here is the patch rebased to a852cfe9. -- Alexander Kuzmenkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Вложения
Greetings Alexander, * Alexander Kuzmenkov (a.kuzmenkov@postgrespro.ru) wrote: > Here is the patch rebased to a852cfe9. Thanks for updating it. This would definitely be nice to have. Ashutosh, thanks for your previous review, would you have a chance to look at it again? Would be great to at least get this to ready for committer before the end of this CF. Thanks! Stephen
Вложения
On Tue, Jan 23, 2018 at 10:01 PM, Stephen Frost <sfrost@snowman.net> wrote: > Greetings Alexander, > > * Alexander Kuzmenkov (a.kuzmenkov@postgrespro.ru) wrote: >> Here is the patch rebased to a852cfe9. > > Thanks for updating it. This would definitely be nice to have. > Ashutosh, thanks for your previous review, would you have a chance to > look at it again? Would be great to at least get this to ready for > committer before the end of this CF. The patch contains new code and also refactors some existing code. May be it's better to separate these two into separate patches so that it's easy to review patches. There's lot of executor code, which I don't understand myself. So, I won't be able to complete the review in this CF. Sorry. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
On 29.01.2018 08:40, Ashutosh Bapat wrote: > Maybe it's better to separate these two into separate patches so that > it's easy to review patches. OK, I'll try doing this. For now, moving the patch entry to the next commitfest. -- Alexander Kuzmenkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Here are some updates on this patch. I split it into two parts. The preparatory part contains some mechanical changes to prepare for the main part. Most importantly, a new field is added, `RestrictInfo.is_mj_equality`. It is a marker of mergejoinable equality clauses, and `RestrictInfo.mergeopfamilies` is a more general marker of clauses that are mergejoinable but not necessarily equality. The usages are changed accordingly. The main part consists of executor and planner changes required to support inequality merge joins. The executor changes are as described in the original post. The planner part has changed significantly since the last version. It used to apply some shady hacks to ensure we have the required sort orders of inner and outer paths. Now I think I found a reasonable way to generate the pathkeys we need. When we sort outer relation in `sort_inner_and_outer()`, the pathkeys are generated by `select_outer_pathkeys_for_merge()`. When we use the pathkeys we already have for the outer relation in `match_unsorted_outer()`, mergeclauses are selected by `find_mergeclauses_for_pathkeys()`. I changed these functions to select the right pathkey direction for merge clauses, and also ensure that we only have a single inequality merge clause and it is the last one. Also, to use the index paths, I changed `pathkeys_useful_for_merging()` to keep both pathkey directions for inequality merge clauses. Some basic joins work, but I couldn't properly test all the corner cases with different orderings, because they depend on a bug in vanilla merge joins [1]. To sum up, the preparatory and executor changes are stable, and the planner part is WIP. 1. https://www.postgresql.org/message-id/flat/5dad9160-4632-0e47-e120-8e2082000c01@postgrespro.ru -- Alexander Kuzmenkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Вложения
On 22.02.2018 21:42, Alexander Kuzmenkov wrote: > > Some basic joins work, but I couldn't properly test all the corner > cases with different orderings, because they depend on a bug in > vanilla merge joins [1]. > The bug was fixed, so here is the rebased patch. The planner part of the patch is stable now and can be reviewed, too. -- Alexander Kuzmenkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Вложения
On Fri, Mar 2, 2018 at 8:02 PM, Alexander Kuzmenkov <a.kuzmenkov@postgrespro.ru> wrote: > On 22.02.2018 21:42, Alexander Kuzmenkov wrote: >> >> >> Some basic joins work, but I couldn't properly test all the corner cases >> with different orderings, because they depend on a bug in vanilla merge >> joins [1]. >> > > The bug was fixed, so here is the rebased patch. The planner part of the > patch is stable now and can be reviewed, too. > Both the patches are named 01. Their names tell the order in which they need to be applied, so it's ok for these patches. But creating such patches using git format-patch (with -v as some suggest) really helps in general. All you need to do is prepare commits in your repository, one per patch, including changes in each patch in separate commits and then run git format-patch on that repository. I use git format-patch @{upstream}, but there are other ways also. Then you can use git rebase to rebase your patches periodically. If you are already doing that, sorry for the noise. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
On 05.03.2018 08:30, Ashutosh Bapat wrote: > But creating such patches using git format-patch (with -v as some suggest) really > helps in general. > Thanks for the advice. I heard about this workflow, but never used it myself. Perhaps it's time to try it. -- Alexander Kuzmenkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Hi, I have started reviewing these patches. I haven't grasped the design yet. But here are some comments on the first patch. - clauses = (MergeJoinClause) palloc0(nClauses * sizeof(MergeJoinClauseData)); + parent->mj_Clauses = (MergeJoinClause) palloc0(nClauses * sizeof(MergeJoinClauseData)); crosses 80 characters. - StrategyNumber opstrategy = mergestrategies[iClause]; + StrategyNumber sort_strategy = mergestrategies[iClause]; - int op_strategy; + int join_strategy; I don't see a reason why should we change the name of variable here. These are operator strategies and there's no need to change their names. The name change is introducing unnecessary diffs. + clause->lexpr = ExecInitExpr((Expr *) linitial(qual->args), (PlanState *) parent); + clause->rexpr = ExecInitExpr((Expr *) lsecond(qual->args), (PlanState *) parent); cross 80 characters. /* @@ -378,20 +375,29 @@ MJEvalInnerValues(MergeJoinState *mergestate, TupleTableSlot *innerslot) return result; } +/* Tuple comparison result */ +typedef enum +{ + MJCR_NextInner = 1, + MJCR_NextOuter = -1, + MJCR_Join = 0 +} MJCompareResult; + /* * MJCompare * - * Compare the mergejoinable values of the current two input tuples - * and return 0 if they are equal (ie, the mergejoin equalities all - * succeed), >0 if outer > inner, <0 if outer < inner. + * Compare the mergejoinable values of the current two input tuples. + * If they are equal, i.e., the mergejoin equalities all succeed, + * return MJCR_Join, if outer > inner, MJCR_NextInner, and else + * MJCR_NextOuter. * * MJEvalOuterValues and MJEvalInnerValues must already have been called * for the current outer and inner tuples, respectively. */ -static int +static MJCompareResult MJCompare(MergeJoinState *mergestate) { I am not sure about this change as well. MJCompare()'s job is to compare given keys in the two tuples and return the comparison result. The result was used as it is to decide which side to advance in an equality based merge join. But for inequality based merge join the result needs to be interpreted further. I think we should write a wrapper around MJCompare which interprets the result rather than changing MJCompare itself. OR at least change the name of MJCompare. The first option is better in case we use MJCompare for purposes other than merge join in future. I am not sure what those could be, but say a merge based union or something like that. /* * Sweep through the existing EquivalenceClasses looking for matches to @@ -273,7 +274,7 @@ process_equivalence(PlannerInfo *root, /* * A "match" requires matching sets of btree opfamilies. Use of * equal() for this test has implications discussed in the comments - * for get_mergejoin_opfamilies(). + * for get_equality_opfamilies(). I think we should leave mergejoin word in there or at least indicate that these are btree opfamilies so that we don't confuse it with hash equality operator families. It will be good if you can write something about why these changes are required in the file. If you are using git format-patch, you could write a commit message that gets added to the patch. That way, it leaves there for anybody to review. I am having a difficult time reading the next patch. There are various changes in the second patch, which I don't understand the reason behind. I think some comments will help, in as commit message or in the code. I will continue reviewing the patches. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
On Fri, Jul 6, 2018 at 6:31 PM, Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> wrote: > > I will continue reviewing the patches. > Here are some more review comments - * sort ordering for each merge key. The mergejoinable operator is an - * equality operator in the opfamily, and the two inputs are guaranteed to be + * sort ordering for each merge key. The mergejoinable operator is a + * comparison operator in the opfamily, and the two inputs are guaranteed to be I think this prologue has to change substantially. At the beginning of the prologue it explicitly mentions clauses like leftexpr = rightexpr. That needs to be changed. * ordered in either increasing or decreasing (respectively) order according It looks like the order of inputs is constrained by the in-equality operator. That too needs to be specified here. * This allows us to obtain the needed comparison function from the opfamily. @@ -200,6 +200,9 @@ MJExamineQuals(List *mergeclauses, Oid op_righttype; Oid sortfunc; + if (parent->mj_Ineq_Present) + elog(ERROR, "inequality mergejoin clause must be the last one"); + IIUC, this never happens. If it really happens, we have created a path which can not be used practically. That should never happen. It will help to add a comment here clarifying that situation. + bool have_inequality = have_inequality_mergeclause(mergeclauses); There will be many paths created with different ordering of pathkeys. So, instead of calling have_inequality_mergeclause() for each of those paths, it's better to save this status in the path itself when creating the path. /* Make a pathkey list with this guy first */ if (l != list_head(all_pathkeys)) + { + if (have_inequality && l == list_tail(all_pathkeys)) + /* Inequality merge clause must be the last, we can't move it */ + break; + I am kind of baffled by this change. IIUC the way we create different orderings of pathkeys here, we are just rotating the pathkeys in circular order. This means there is exactly one ordering of pathkeys where the pathkey corresponding to the inequality clause is the last one. It's only that ordering which will be retained and all other ordering will be discarded. Instead of that, I think we should keep the pathkey corresponding to the inequality clause at the end (or track in separately) and create different orderings of pathkeys by rotating other pathkeys. This will allow us to cost the orderings as intended by this fucntion. /* Might have no mergeclauses */ if (nClauses == 0) return NIL; + { + List *ineq_clauses = find_inequality_clauses(mergeclauses); + + if (list_length(ineq_clauses) > 1) + return NIL; Without this patch, when there is an inequality clause with FULL JOIN, we will not create a merge join path because select_mergejoin_clauses() will set mergejoin_allowed to false. This means that we won't call sort_inner_and_outer(). I think this patch also has to do the same i.e. when there are more than one inequality clauses, select_mergejoin_clauses() should set mergejoin_allowed to false in case of a FULL JOIN since merge join machinary won't be able to handle that case. If we do that, we could arrange extra.mergeclause_list such that the inequality clause is always at the end thus finding inequality clause would be easy. Again, this is not full review, but I am diving deeper into the patch-set and understanding it better. Sorry. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
On 07/09/2018 04:12 PM, Ashutosh Bapat wrote: > On Fri, Jul 6, 2018 at 6:31 PM, Ashutosh Bapat > <ashutosh.bapat@enterprisedb.com> wrote: >> I will continue reviewing the patches. >> > Here are some more review comments Ashutosh, Many thanks for the review, I'm glad that we are continuing with this patch. I'm working on your comments now, will post the updated version this week. -- Alexander Kuzmenkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
On Tue, Jul 10, 2018 at 12:05 AM, Alexander Kuzmenkov <a.kuzmenkov@postgrespro.ru> wrote: > On 07/09/2018 04:12 PM, Ashutosh Bapat wrote: >> >> On Fri, Jul 6, 2018 at 6:31 PM, Ashutosh Bapat >> <ashutosh.bapat@enterprisedb.com> wrote: >>> >>> I will continue reviewing the patches. >>> >> Here are some more review comments > > > > Ashutosh, > > Many thanks for the review, I'm glad that we are continuing with this patch. > I'm working on your comments now, will post the updated version this week. While updating the patches, please consider adding some comments as to why only single inequality clause supported. I didn't see comments in the patch explaining that. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
I tried to fix the things you mentioned and improve the comments. Among other changes, there is now a description of how merge join works with inequalities at the top of nodeMergejoin.c. It also explains why we only support one inequality clause. Some particular points: On 07/06/2018 04:01 PM, Ashutosh Bapat wrote: > - StrategyNumber opstrategy = mergestrategies[iClause]; > + StrategyNumber sort_strategy = mergestrategies[iClause]; > - int op_strategy; > + int join_strategy; > I don't see a reason why should we change the name of variable here. These are > operator strategies and there's no need to change their names. The name change > is introducing unnecessary diffs. These variables have different meaning but their names differ only with an underscore. When I had to change this function, I made mistakes because of this. I'd keep the descriptive names to avoid further confusion. Should this be a separate patch? > I think we should write a wrapper around MJCompare which interprets the result rather > than changing MJCompare itself. OR at least change the name of MJCompare. Renamed the function to MJTestTuples to reflect that it decides whether we join tuples or advance either side. > - * for get_mergejoin_opfamilies(). > + * for get_equality_opfamilies(). > > I think we should leave mergejoin word in there or at least indicate that these > are btree opfamilies so that we don't confuse it with hash equality operator > families. Renamed these to get_btree_equality_opfamilies() and get_btree_comparison_opfamilies(). > + if (parent->mj_Ineq_Present) > + elog(ERROR, "inequality mergejoin clause must be the last one"); > + > > IIUC, this never happens. If it really happens, we have created a path which > can not be used practically. That should never happen. It will help to add a > comment here clarifying that situation. This is just a cross-check for the planner. Added a comment. We should probably use a separate error code for internal errors as opposed to user errors, but I'm not sure if we have one, I see just elog(ERROR) being used everywhere. > > + bool have_inequality = have_inequality_mergeclause(mergeclauses); > > There will be many paths created with different ordering of pathkeys. So, > instead of calling have_inequality_mergeclause() for each of those paths, it's > better to save this status in the path itself when creating the path. I removed this function altogether, because we can just check the last merge clause. When we cost the path, we already have a proper mergejoinable list of clauses, so if there is an inequality clause, it's the last one. > /* Make a pathkey list with this guy first */ > if (l != list_head(all_pathkeys)) > + { > + if (have_inequality && l == list_tail(all_pathkeys)) > + /* Inequality merge clause must be the last, we can't > move it */ > + break; > + > > I am kind of baffled by this change. IIUC the way we create different orderings > of pathkeys here, we are just rotating the pathkeys in circular order. This > means there is exactly one ordering of pathkeys where the pathkey corresponding > to the inequality clause is the last one. This code does not rotate the pathkeys circularly, but puts each of them in the first position, and keeps the rest in the original order. Say, if we have three equality pathkeys, and one inequality pathkey at the end (let's denote them as E1, E2, E3, IE), the permutations it tries will be like this: E1 E2 E3 IE E2 E1 E3 IE E3 E1 E2 IE Does this sound right? > /* Might have no mergeclauses */ > if (nClauses == 0) > return NIL; > > + { > + List *ineq_clauses = find_inequality_clauses(mergeclauses); > + > + if (list_length(ineq_clauses) > 1) > + return NIL; > > Without this patch, when there is an inequality clause with FULL JOIN, we will > not create a merge join path because select_mergejoin_clauses() will set > mergejoin_allowed to false. This means that we won't call > sort_inner_and_outer(). I think this patch also has to do the same i.e. when > there are more than one inequality clauses, select_mergejoin_clauses() should > set mergejoin_allowed to false in case of a FULL JOIN since merge join > machinary won't be able to handle that case. > > If we do that, we could arrange extra.mergeclause_list such that the inequality > clause is always at the end thus finding inequality clause would be easy. I changed select_mergejoin_clauses() to filter multiple inequality clauses and disable join if needed. Now we can use extra inequalities as join filter, if it's not full join. I didn't reorder extra.mergeclause_list there, because this order is ignored later. select_outer_pathkeys_for_merge() chooses the order of pathkeys using some heuristics, and then find_mergeclauses_for_outer_pathkeys() reorders the clauses accordingly. -- Alexander Kuzmenkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Вложения
On Tue, Jul 10, 2018 at 10:06 PM, Alexander Kuzmenkov <a.kuzmenkov@postgrespro.ru> wrote: > I tried to fix the things you mentioned and improve the comments. Among > other changes, there is now a description of how merge join works with > inequalities at the top of nodeMergejoin.c. It also explains why we only > support one inequality clause. Thanks for the commit messages. I would use word "in-equality" instead of "comparison" since equality is also a comparison. > > Some particular points: > > On 07/06/2018 04:01 PM, Ashutosh Bapat wrote: >> >> - StrategyNumber opstrategy = mergestrategies[iClause]; >> + StrategyNumber sort_strategy = mergestrategies[iClause]; >> - int op_strategy; >> + int join_strategy; >> I don't see a reason why should we change the name of variable here. These >> are >> operator strategies and there's no need to change their names. The name >> change >> is introducing unnecessary diffs. > > > These variables have different meaning but their names differ only with an > underscore. When I had to change this function, I made mistakes because of > this. I'd keep the descriptive names to avoid further confusion. Should this > be a separate patch? No, 0001 suffice. But I am still not sure that the variable name change is worth the trouble. Anyway, will leave this for a committer to judge. > > This is just a cross-check for the planner. Added a comment. We should > probably use a separate error code for internal errors as opposed to user > errors, but I'm not sure if we have one, I see just elog(ERROR) being used > everywhere. elog(ERROR) is fine. Thanks for the comments. - if (op_mergejoinable(opno, exprType(leftarg)) && - !contain_volatile_functions((Node *) clause)) - restrictinfo->mergeopfamilies = get_mergejoin_opfamilies(opno); + if (!contain_volatile_functions((Node *) clause)) + { + if (op_mergejoinable_equality(opno, exprType(leftarg))) Why is this condition split. Also why is the change in the order of conditions? + { + restrictinfo->mergeopfamilies = get_btree_equality_opfamilies(opno); + restrictinfo->is_mj_equality = true; Comparing this with the original code, I think, is_mj_equality should be true if restrictinfo->mergeopfamilies is not NIL. There is no way that a clause can act as an equality clause when there are no families in which the operator is an equality operator. If restrictinfo->mergeopfamilies can not be NIL here, probably we should add an Assert and a bit of explanation as to why is_mj_equality is true. With this work the meaning of oprcanmerge (See pg_operator catalog and also CREATE OPERATOR syntax) changes. Every btree operator can now be used to perform a merge join. oprcanmerge however only indicates whether an operator is an equality or not. Have you thought about that? Do we require to re-define oprcanmerge? + * + * If the inequality clause is not the last one, or if there are several + * of them, this algorithm doesn't work, because it is not possible to + * sort the inputs in such a way that given an outer tuple, the matching + * inner tuples form a contiguous interval. I think, it should be possible to use this technique with more than one inequality clauses as long as all the operators require the input to be ordered in the same direction and the clauses are ANDed. In that case the for a given outer tuple the matching inner tuples form a contiguous interval. I think it's better to straighten out these things before diving further into the 2nd patch. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
El 18/07/18 a las 16:58, Ashutosh Bapat escribió: > > Thanks for the commit messages. I would use word "in-equality" instead > of "comparison" since equality is also a comparison. Fixed. > Comparing this with the original code, I think, is_mj_equality should be true > if restrictinfo->mergeopfamilies is not NIL. My mistake, fixed. > With this work the meaning of oprcanmerge (See pg_operator catalog and also > CREATE OPERATOR syntax) changes. Every btree operator can now be used to > perform a merge join. oprcanmerge however only indicates whether an operator is > an equality or not. Have you thought about that? Do we require to re-define > oprcanmerge? For now we can test with old oprcanmerge meaning, not to bump the catalog version. Merge join needs only BTORDER_PROC function, which is required for btree opfamilies. This means that it should be always possible to merge join on operators that correspond to standard btree strategies. We could set oprcanmerge to true for all built-in btree comparison operators, and leave the possibility to disable it for custom operators. > I think, it should be possible to use this technique with more than one > inequality clauses as long as all the operators require the input to be ordered > in the same direction and the clauses are ANDed. In that case the for a given > outer tuple the matching inner tuples form a contiguous interval. Consider a table "t(a int, b int)", the value of each column can be 1, 2, 3, 4 and the table contains all possible combinations. If merge condition is "a < 2 and b < 2", for each of the four possible sorting directions, the result set won't be contiguous. Generally speaking, this happens when we have several groups with the same value of first column, and the first column matches the join condition. But inside each group, for some rows the second column doesn't match. -- Alexander Kuzmenkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Вложения
Alexander Kuzmenkov <a.kuzmenkov@postgrespro.ru> writes: > [ Inequality-merge-join-v10.patch ] Just thinking about this patch a bit ... I wonder why you were so quick to reject the UNION approach at the outset. This patch is pretty messy, and it complicates a lot of stuff that is quite fundamental to the planner, and you still end up that the only functionality gain is now we can handle full joins whose conditions include a single btree inequality clause. Nor are we doing that remarkably efficiently ... it's pretty much impossible to do it efficiently, in fact, since if the inputs have M and N rows respectively then the output will have something like (M*N)/2 rows. So it seems to me that if we're going to put sweat into this area at all, our ambition ought to be "we'll successfully perform a FULL JOIN with any join clause whatsoever, though it might take O(M*N) time". Now as far as I can tell, the UNION substitution you proposed is completely valid, although it'd be better to phrase the second step as an antijoin. That is, I believe select * from t1 full join t2 on (anything) is exactly equal to select t1.*, t2.* from t1 left join t2 on (anything) union all select t1.*, t2.* from t2 anti join t1 on (anything) There is one fly in the ointment, which is that we will have to run the join clause twice, so it can't contain volatile functions --- but the merge join approach wouldn't handle that case either. Having to read the inputs twice is not good, but we could put them into CTEs, which fixes any problems with volatility below the join and at least alleviates the performance problem. Since we can't currently do any meaningful qual pushdown through full joins, the optimization-fence aspect of a CTE doesn't seem like an issue either. In short, proceeding like the above when we can't find another plan type for a full join seems like it fixes a far wider variety of cases. The possibility that maybe we could do some of those cases a bit faster isn't sufficiently attractive to me to justify also putting in a mechanism like this patch proposes. We only rarely see complaints at all about can't-do-a-full-join problems, and I do not think this patch would fix enough of those complaints to be worthwhile. regards, tom lane
On 11/19/18 04:46, Tom Lane wrote: > In short, proceeding like the above when we can't find another plan > type for a full join seems like it fixes a far wider variety of cases. > The possibility that maybe we could do some of those cases a bit faster > isn't sufficiently attractive to me to justify also putting in a > mechanism like this patch proposes. We only rarely see complaints at > all about can't-do-a-full-join problems, and I do not think this patch > would fix enough of those complaints to be worthwhile. I agree, the automated UNION substitutions seems to be a better approach. I'll mark this patch as rejected then. -- Alexander Kuzmenkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company