Обсуждение: PATCH: generate fractional cheapest paths in generate_orderedappend_path
Hi, This patch is a WIP fix for the issue described in [1], where the planner picks a more expensive plan with partition-wise joins enabled, and disabling this option produces a cheaper plan. That's a bit strange because with the option disabled we consider *fewer* plans, so we should not be able to generate a cheaper plan. The problem lies in generate_orderedappend_paths(), which builds two types of append paths - with minimal startup cost, and with minimal total cost. That however does not work for queries with LIMIT, which also need to consider at fractional cost, but the path interesting from this perspective may be eliminated by other paths. Consider three paths (this comes from the reproducer shared in [1]): A: nestloop_path startup 0.585000 total 35708.292500 B: nestloop_path startup 0.292500 total 150004297.292500 C: mergejoin_path startup 9748.112737 total 14102.092737 With some reasonable LIMIT value (e.g. 5% of the data), we really want to pick path A. But that path is dominated both in startup cost (by B) and total cost (path C). Hence generate_orderedappend_paths() will ignore it, and we'll generate a more expensive LIMIT plan. In [2] Tom proposed to modify generate_orderedappend_paths() to also consider the fractional cheapest_path, just like we do for startup and total costs. The attached patch does exactly that, and it does seem to do the trick. There are some loose ends, though: 1) If get_cheapest_fractional_path_for_pathkeys returns NULL, it's not clear whether to default to cheapest_startup or cheapest_total. We might also consider an incremental sort, in which case the startup cost matters (for Sort it does not). So we may need an enhanced version of get_cheapest_fractional_path_for_pathkeys that generates such paths. 2) Same for the cheapest_total - maybe there's a partially sorted path, and using it with incremental sort on top would be better than using cheapest_total_path + sort. 3) Not sure if get_cheapest_fractional_path_for_pathkeys should worry about require_parallel_safe, just like the other functions nearby. I'd argue that this patch does not need to add the Incremental Sort capabilities in (1) and (2) - it's just another place where we decided not to consider this sort variant yet. I'm not sure how much this depends on partition-wise join, and why disabling it generates the right plan. The reproducer uses that, but AFAICS generate_orderedappend_paths() works like this since 2010 (commit 11cad29c915). I'd bet the issue exists since then and it's possible to get similar cases even without partition-wise join. I can reproduce it on PostgreSQL 12, though (which however supports partition-wise join). Not sure whether this should be backpatched. We didn't get any reports until now, so it doesn't seem like a pressing issue. OTOH most users won't actually notice this, they'll just get worse plans without realizing there's a better option. regards [1] https://www.postgresql.org/message-id/011937a3-7427-b99f-13f1-c07a127cf94c%40enterprisedb.com [2] https://www.postgresql.org/message-id/4006636.1618577893%40sss.pgh.pa.us -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Вложения
Hi,
thanks for looking into it!
For some reason the patch doesn't apply at my end, could you repost one based at the master?
> clear whether to default to cheapest_startup or cheapest_total. We might
> also consider an incremental sort, in which case the startup cost
> matters (for Sort it does not). So we may need an enhanced version of
> get_cheapest_fractional_path_for_pathkeys that generates such paths.
>
> 2) Same for the cheapest_total - maybe there's a partially sorted path,
> and using it with incremental sort on top would be better than using
> cheapest_total_path + sort.
> capabilities in (1) and (2) - it's just another place where we decided
> not to consider this sort variant yet.
>3) Not sure if get_cheapest_fractional_path_for_pathkeys should worry
Regards
Arne
Sent: Saturday, April 17, 2021 1:52:19 AM
To: pgsql-hackers
Subject: PATCH: generate fractional cheapest paths in generate_orderedappend_path
This patch is a WIP fix for the issue described in [1], where the
planner picks a more expensive plan with partition-wise joins enabled,
and disabling this option produces a cheaper plan. That's a bit strange
because with the option disabled we consider *fewer* plans, so we should
not be able to generate a cheaper plan.
The problem lies in generate_orderedappend_paths(), which builds two
types of append paths - with minimal startup cost, and with minimal
total cost. That however does not work for queries with LIMIT, which
also need to consider at fractional cost, but the path interesting from
this perspective may be eliminated by other paths.
Consider three paths (this comes from the reproducer shared in [1]):
A: nestloop_path startup 0.585000 total 35708.292500
B: nestloop_path startup 0.292500 total 150004297.292500
C: mergejoin_path startup 9748.112737 total 14102.092737
With some reasonable LIMIT value (e.g. 5% of the data), we really want
to pick path A. But that path is dominated both in startup cost (by B)
and total cost (path C). Hence generate_orderedappend_paths() will
ignore it, and we'll generate a more expensive LIMIT plan.
In [2] Tom proposed to modify generate_orderedappend_paths() to also
consider the fractional cheapest_path, just like we do for startup and
total costs. The attached patch does exactly that, and it does seem to
do the trick.
There are some loose ends, though:
1) If get_cheapest_fractional_path_for_pathkeys returns NULL, it's not
clear whether to default to cheapest_startup or cheapest_total. We might
also consider an incremental sort, in which case the startup cost
matters (for Sort it does not). So we may need an enhanced version of
get_cheapest_fractional_path_for_pathkeys that generates such paths.
2) Same for the cheapest_total - maybe there's a partially sorted path,
and using it with incremental sort on top would be better than using
cheapest_total_path + sort.
3) Not sure if get_cheapest_fractional_path_for_pathkeys should worry
about require_parallel_safe, just like the other functions nearby.
I'd argue that this patch does not need to add the Incremental Sort
capabilities in (1) and (2) - it's just another place where we decided
not to consider this sort variant yet.
I'm not sure how much this depends on partition-wise join, and why
disabling it generates the right plan. The reproducer uses that, but
AFAICS generate_orderedappend_paths() works like this since 2010 (commit
11cad29c915). I'd bet the issue exists since then and it's possible to
get similar cases even without partition-wise join.
I can reproduce it on PostgreSQL 12, though (which however supports
partition-wise join).
Not sure whether this should be backpatched. We didn't get any reports
until now, so it doesn't seem like a pressing issue. OTOH most users
won't actually notice this, they'll just get worse plans without
realizing there's a better option.
regards
[1]
https://www.postgresql.org/message-id/011937a3-7427-b99f-13f1-c07a127cf94c%40enterprisedb.com
[2] https://www.postgresql.org/message-id/4006636.1618577893%40sss.pgh.pa.us
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Hi,
I haven't tested the parallel case, but I think we should sort out (3) get_cheapest_fractional_path_for_pathkeys as mentioned above.
I am lost about the comment regarding startup_new_fractional. Could you elaborate what you mean by that?
Apart from that, I'd argue for a small test case. I attached a slimmed down case of what we were trying to fix. It might be worth to integrate that with an existing test, since more than a third of the time seems to be consumed by the creation and attachment of partitions.
Regards
Arne
Вложения
Hi, On 6/3/21 7:17 PM, Arne Roland wrote: > Hi, > > > I haven't tested the parallel case, but I think we should sort out (3) > get_cheapest_fractional_path_for_pathkeys as mentioned above. > Not sure what you refer to by "above" - it's probably better to reply in-line to existing message, which makes it much cleared. > > I am lost about the comment regarding startup_new_fractional. Could you > elaborate what you mean by that? > Not sure what this refers to either - there's no startup_new_fractional in my message and 'git grep startup_new_fractional' returns nothing. > > Apart from that, I'd argue for a small test case. I attached a slimmed > down case of what we were trying to fix. It might be worth to integrate > that with an existing test, since more than a third of the time seems to > be consumed by the creation and attachment of partitions. > Maybe, if there's a suitable table to reuse, we can do that. But I don't think it matters it takes ~1/3 of the time to attach the partitions. What's more important is whether it measurably slows down the test suite, and I don't think that's an issue. In any case, this seems a bit premature - we need something to test the patch etc. We can worry about how expensive the test is much later. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Hi,
On 6/3/21 7:17 PM, Arne Roland wrote:
> Hi,
>
>
> I haven't tested the parallel case, but I think we should sort out (3)
> get_cheapest_fractional_path_for_pathkeys as mentioned above.
>
Not sure what you refer to by "above" - it's probably better to reply
in-line to existing message, which makes it much cleared.
>
> I am lost about the comment regarding startup_new_fractional. Could you
> elaborate what you mean by that?
>
Not sure what this refers to either - there's no startup_new_fractional
in my message and 'git grep startup_new_fractional' returns nothing.
>
> Apart from that, I'd argue for a small test case. I attached a slimmed
> down case of what we were trying to fix. It might be worth to integrate
> that with an existing test, since more than a third of the time seems to
> be consumed by the creation and attachment of partitions.
>
Maybe, if there's a suitable table to reuse, we can do that. But I don't
think it matters it takes ~1/3 of the time to attach the partitions.
What's more important is whether it measurably slows down the test
suite, and I don't think that's an issue.
In any case, this seems a bit premature - we need something to test the
patch etc. We can worry about how expensive the test is much later.
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
src/backend/optimizer/path/pathkeys.c:get_cheapest_fractional_path_for_pathkeys(List *paths,
src/backend/optimizer/plan/planagg.c: get_cheapest_fractional_path_for_pathkeys(final_rel->pathlist,
On Thu, Jun 3, 2021 at 11:12 AM Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:Hi,
On 6/3/21 7:17 PM, Arne Roland wrote:
> Hi,
>
>
> I haven't tested the parallel case, but I think we should sort out (3)
> get_cheapest_fractional_path_for_pathkeys as mentioned above.
>
Not sure what you refer to by "above" - it's probably better to reply
in-line to existing message, which makes it much cleared.
>
> I am lost about the comment regarding startup_new_fractional. Could you
> elaborate what you mean by that?
>
Not sure what this refers to either - there's no startup_new_fractional
in my message and 'git grep startup_new_fractional' returns nothing.
>
> Apart from that, I'd argue for a small test case. I attached a slimmed
> down case of what we were trying to fix. It might be worth to integrate
> that with an existing test, since more than a third of the time seems to
> be consumed by the creation and attachment of partitions.
>
Maybe, if there's a suitable table to reuse, we can do that. But I don't
think it matters it takes ~1/3 of the time to attach the partitions.
What's more important is whether it measurably slows down the test
suite, and I don't think that's an issue.
In any case, this seems a bit premature - we need something to test the
patch etc. We can worry about how expensive the test is much later.
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL CompanyHi,In REL_11_STABLE branch, a search revealed the following:src/backend/optimizer/path/pathkeys.c: * get_cheapest_fractional_path_for_pathkeys
src/backend/optimizer/path/pathkeys.c:get_cheapest_fractional_path_for_pathkeys(List *paths,
src/backend/optimizer/plan/planagg.c: get_cheapest_fractional_path_for_pathkeys(final_rel->pathlist,src/include/optimizer/paths.h:extern Path *get_cheapest_fractional_path_for_pathkeys(List *paths,It seems this function has been refactored out in subsequent releases.FYI
On 6/3/21 10:52 PM, Zhihong Yu wrote: > ... > > Sent a bit too soon. > > The above function still exists. > But startup_new_fractional was nowhere to be found. Actually, there are two comments /* XXX maybe we should have startup_new_fractional? */ in the patch I posted - I completely forgot about that. But I think that's a typo, I think - it should be /* XXX maybe we should have startup_neq_fractional? */ and the new flag would work similarly to startup_neq_total, i.e. it's pointless to add paths where startup == fractional cost. At least I think that was the idea when I wrote the patch, it way too long ago. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Hi,
thanks for the quick reply!
From: Tomas Vondra <tomas.vondra@enterprisedb.com>
Sent: Thursday, June 3, 2021 20:11
To: Arne Roland; pgsql-hackers
Subject: Re: PATCH: generate fractional cheapest paths in generate_orderedappend_path
> I haven't tested the parallel case, but I think we should sort out (3)
> get_cheapest_fractional_path_for_pathkeys as mentioned above.
>
Not sure what you refer to by "above" - it's probably better to reply
in-line to existing message, which makes it much cleared.
I was referring to one message above. I thought the thread was still short enough. Apparently to much time has passed. Sorry, I hope this mail is better. I was referring to my post from April:
From: Arne Roland
Sent: Monday, April 26, 2021 13:00
To: Tomas Vondra; pgsql-hackers
Subject: Re: PATCH: generate fractional cheapest paths in generate_orderedappend_path
>3) Not sure if get_cheapest_fractional_path_for_pathkeys should worry
Sent: Thursday, June 3, 2021 22:50
To: Tomas Vondra
Cc: Arne Roland; pgsql-hackers
Subject: Re: PATCH: generate fractional cheapest paths in generate_orderedappend_path
src/backend/optimizer/path/pathkeys.c:get_cheapest_fractional_path_for_pathkeys(List *paths,
src/backend/optimizer/plan/planagg.c: get_cheapest_fractional_path_for_pathkeys(final_rel->pathlist,
Sent: Thursday, June 3, 2021 22:57
To: Zhihong Yu
Cc: Arne Roland; pgsql-hackers
Subject: Re: PATCH: generate fractional cheapest paths in generate_orderedappend_path
Actually, there are two comments
/* XXX maybe we should have startup_new_fractional? */
in the patch I posted - I completely forgot about that. But I think
that's a typo, I think - it should be
/* XXX maybe we should have startup_neq_fractional? */
and the new flag would work similarly to startup_neq_total, i.e. it's
pointless to add paths where startup == fractional cost.
At least I think that was the idea when I wrote the patch, it way too
long ago.
Sorry, I almost forgot about this myself. I only got reminded upon seeing that again with different queries/tables.
Hi Tomas,
Sent: Monday, April 26, 2021 13:00
To: Tomas Vondra; pgsql-hackers
Subject: Re: PATCH: generate fractional cheapest paths in generate_orderedappend_path
* If any live child is not parallel-safe, treat the whole appendrel
* as not parallel-safe. In future we might be able to generate plans
* in which some children are farmed out to workers while others are
* not; but we don't have that today, so it's a waste to consider
* partial paths anywhere in the appendrel unless it's all safe.
* (Child rels visited before this one will be unmarked in
* set_append_rel_pathlist().)
*/
Sent: Thursday, June 3, 2021 22:57
To: Zhihong Yu
Cc: Arne Roland; pgsql-hackers
Subject: Re: PATCH: generate fractional cheapest paths in generate_orderedappend_path
> Actually, there are two comments
>
> /* XXX maybe we should have startup_new_fractional? */
>
> in the patch I posted - I completely forgot about that. But I think
> that's a typo, I think - it should be
>
> /* XXX maybe we should have startup_neq_fractional? */
>
> and the new flag would work similarly to startup_neq_total, i.e. it's
> pointless to add paths where startup == fractional cost.
>
> At least I think that was the idea when I wrote the patch, it way too
> long ago.
> Sorry, I almost forgot about this myself. I only got reminded upon seeing that again with different queries/tables.
Afaiac we should add a simple testcase here, like I suggested in 477344d5f17c4a8e95d3a5bb6642718a. Apart from that I am not sure there is work to be done here.
Am I wrong?
Regards
Arne
Sent: Saturday, June 26, 2021 5:50:49 PM
To: Tomas Vondra
Cc: pgsql-hackers
Subject: Re: PATCH: generate fractional cheapest paths in generate_orderedappend_path
Hi Tomas,
Sent: Monday, April 26, 2021 13:00
To: Tomas Vondra; pgsql-hackers
Subject: Re: PATCH: generate fractional cheapest paths in generate_orderedappend_path
* If any live child is not parallel-safe, treat the whole appendrel
* as not parallel-safe. In future we might be able to generate plans
* in which some children are farmed out to workers while others are
* not; but we don't have that today, so it's a waste to consider
* partial paths anywhere in the appendrel unless it's all safe.
* (Child rels visited before this one will be unmarked in
* set_append_rel_pathlist().)
*/
Sent: Thursday, June 3, 2021 22:57
To: Zhihong Yu
Cc: Arne Roland; pgsql-hackers
Subject: Re: PATCH: generate fractional cheapest paths in generate_orderedappend_path
> Actually, there are two comments
>
> /* XXX maybe we should have startup_new_fractional? */
>
> in the patch I posted - I completely forgot about that. But I think
> that's a typo, I think - it should be
>
> /* XXX maybe we should have startup_neq_fractional? */
>
> and the new flag would work similarly to startup_neq_total, i.e. it's
> pointless to add paths where startup == fractional cost.
>
> At least I think that was the idea when I wrote the patch, it way too
> long ago.
> Sorry, I almost forgot about this myself. I only got reminded upon seeing that again with different queries/tables.
Hi, On 12/2/21 15:58, Arne Roland wrote: > Afaiac we should add a simple testcase here, like I suggested in > 477344d5f17c4a8e95d3a5bb6642718a > <https://www.postgresql.org/message-id/477344d5f17c4a8e95d3a5bb6642718a%40index.de>. > Apart from that I am not sure there is work to be done here. > Well, I mentioned three open questions in my first message, and I don't think we've really addressed them yet. We've agreed we don't need to add the incremental sort here, but that leaves 1) If get_cheapest_fractional_path_for_pathkeys returns NULL, should we default to cheapest_startup or cheapest_total? 2) Should get_cheapest_fractional_path_for_pathkeys worry about require_parallel_safe? I think yes, but we need to update the patch. I'll take a closer look next week, once I get home from NYC, and I'll submit an improved version for the January CF. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
thanks for the reply!
From: Tomas Vondra <tomas.vondra@enterprisedb.com>
Sent: Thursday, December 2, 2021 20:58
Subject: Re: PATCH: generate fractional cheapest paths in generate_orderedappend_path
> [...]
> Well, I mentioned three open questions in my first message, and I don't
> think we've really addressed them yet. We've agreed we don't need to add
> the incremental sort here, but that leaves
>
>
> 1) If get_cheapest_fractional_path_for_pathkeys returns NULL, should we
> default to cheapest_startup or cheapest_total?
I think it's reasonable to use cheapest_total like we are doing now. I hardly see any reason to change it.
The incremental sort case you mentioned, seems like the only case that plan might be interesting. If we really want that to happen, we probably should check for that separately, i.e. having startup_fractional. Even though this is a fairly special case as it's mostly relevant for partitionwise joins, I'm still not convinced it's worth the cpu cycles. The point is that in most cases factional and startup_fractional will be the same anyways.
And I suspect, even if they aren't, solving that from an application developers point of view, is in most cases not that difficult. One can usually match the pathkey. I personally had a lot of real world issues with missing fractional paths using primary keys. I think it's worth noting that everything will likely match the partition keys anyways, because otherwise there is no chance of doing a merge append.
If I am not mistaken, in case we end up doing a full sort, the cheapest_total path should be completely sufficient.
> 2) Should get_cheapest_fractional_path_for_pathkeys worry about
> require_parallel_safe? I think yes, but we need to update the patch.
I admit, that such a version of get_cheapest_fractional_path_for_pathkeys could be consistent with other functions. And I was confused about that before. But I am not sure what to use require_parallel_safe for. build_minmax_path doesn't care, they are never parallel safe. And afaict generate_orderedappend_paths cares neither, it considers all plans. For instance when it calls get_cheapest_path_for_pathkeys, it sets require_parallel_safe just to false as well.
> I'll take a closer look next week, once I get home from NYC, and I'll
> submit an improved version for the January CF.
Thank you for your work! The current patch, apart from the comments/regression tests, seems pretty reasonable to me.
Regards
Arne
On 12/10/21 00:51, Arne Roland wrote: > Hi, > > thanks for the reply! > > From: Tomas Vondra <tomas.vondra@enterprisedb.com> > Sent: Thursday, December 2, 2021 20:58 > Subject: Re: PATCH: generate fractional cheapest paths in > generate_orderedappend_path > > [...] > > Well, I mentioned three open questions in my first message, and I don't > > think we've really addressed them yet. We've agreed we don't need to add > > the incremental sort here, but that leaves > > > > > > 1) If get_cheapest_fractional_path_for_pathkeys returns NULL, should we > > default to cheapest_startup or cheapest_total? > > I think it's reasonable to use cheapest_total like we are doing now. I > hardly see any reason to change it. True, it's a reasonable first step. Either we generate the same plan as today (with cheapest_total), or maybe a better one (if we find a cheaper fractional path with matching pathkeys). It's true we could do better, but that's life - it's not like we consider every possible path everywhere. > The incremental sort case you mentioned, seems like the only case that > plan might be interesting. If we really want that to happen, we probably > should check for that separately, i.e. having startup_fractional. Even > though this is a fairly special case as it's mostly relevant for > partitionwise joins, I'm still not convinced it's worth the cpu cycles. > The point is that in most cases factional and startup_fractional will be > the same anyways. I don't think we can simply check for startup_fractional (although, I'm not sure I fully understand what would that be). I think what we should really do in this case is walking all the paths, ensuring it's properly sorted (with either full or incremental sort), and then pick the cheapest fractional path from these sorted paths. But I agree that seems pretty expensive. > And I suspect, even if they aren't, solving that from an application > developers point of view, is in most cases not that difficult. One can > usually match the pathkey. I personally had a lot of real world issues > with missing fractional paths using primary keys. I think it's worth > noting that everything will likely match the partition keys anyways, > because otherwise there is no chance of doing a merge append. Yeah, I think you're right. > If I am not mistaken, in case we end up doing a full sort, the > cheapest_total path should be completely sufficient. > Definitely true. > > 2) Should get_cheapest_fractional_path_for_pathkeys worry about > > require_parallel_safe? I think yes, but we need to update the patch. > > I admit, that such a version of > get_cheapest_fractional_path_for_pathkeys could be consistent with other > functions. And I was confused about that before. But I am not sure what > to use require_parallel_safe for. build_minmax_path doesn't care, they > are never parallel safe. And afaict generate_orderedappend_paths cares > neither, it considers all plans. For instance when it calls > get_cheapest_path_for_pathkeys, it sets require_parallel_safe just to > false as well. > True as well. It's looks a bit strange, but you're right neither place cares about parallel safety. > > I'll take a closer look next week, once I get home from NYC, and I'll > > submit an improved version for the January CF. > > Thank you for your work! The current patch, apart from the > comments/regression tests, seems pretty reasonable to me. > Attached is a cleaned-up patch, with a simple regression test. I'll mark this as RFC and get it committed in a couple days. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Вложения
Pushed, after clarifying the comments a bit. I also looked into what would it take to consider incremental paths, and in principle it doesn't seem all that complicated. The attached PoC patch extends get_cheapest_fractional_path_for_pathkeys() to optionally build incremental sort on paths if needed. There are two GUCs that make experimenting simpler: * enable_fractional_paths - disables fractional paths entirely, i.e. we get behavior without the part I already pushed * enable_fractional_incremental_paths - disables the incremental sort part added by the attached patch With this, I get this plan (see the test in partitionwise_join.sql) test=# EXPLAIN (COSTS OFF) test-# SELECT * FROM fract_t x LEFT JOIN fract_t y USING (id1, id2) ORDER BY id1 ASC, id2 ASC LIMIT 10; QUERY PLAN ------------------------------------------------------------------------------ Limit -> Merge Left Join Merge Cond: ((x.id1 = y.id1) AND (x.id2 = y.id2)) -> Append -> Index Only Scan using fract_t0_id1_id2_idx on fract_t0 x_1 -> Incremental Sort Sort Key: x_2.id1, x_2.id2 Presorted Key: x_2.id1 -> Index Scan using fract_t1_pkey on fract_t1 x_2 -> Materialize -> Append -> Incremental Sort Sort Key: y_1.id1, y_1.id2 Presorted Key: y_1.id1 -> Index Scan using fract_t0_pkey on fract_t0 y_1 Index Cond: (id1 = id1) Filter: (id2 = id2) -> Incremental Sort Sort Key: y_2.id1, y_2.id2 Presorted Key: y_2.id1 -> Index Scan using fract_t1_pkey on fract_t1 y_2 Index Cond: (id1 = id1) Filter: (id2 = id2) (23 rows) instead of QUERY PLAN ------------------------------------------------------------------------------ Limit -> Incremental Sort Sort Key: x.id1, x.id2 Presorted Key: x.id1 -> Merge Left Join Merge Cond: (x.id1 = y.id1) Join Filter: (x.id2 = y.id2) -> Append -> Index Scan using fract_t0_pkey on fract_t0 x_1 -> Index Scan using fract_t1_pkey on fract_t1 x_2 -> Materialize -> Append -> Index Scan using fract_t0_pkey on fract_t0 y_1 -> Index Scan using fract_t1_pkey on fract_t1 y_2 (14 rows) i.e. the incremental sorts moved below the merge join (and the cost is lower, but that's not shown here). So that seems reasonable, but there's a couple issues too: 1) Planning works (hence we can run explain), but execution fails because of segfault in CheckVarSlotCompatibility - it gets NULL slot for some reason. I haven't looked into the details, but I'd guess we need to pass a different rel to create_incrementalsort_path, not childrel. 2) enable_partitionwisejoin=on seems to cause some confusion, because it results in picking a different plan with higher cost. But that's clearly not because of this patch. 3) I'm still a bit skeptical about the cost of this implementation - it builds the incrementalsort path, just to throw most of the paths away. It'd be much better to just calculate the cost, which should be much cheaper, and only do the heavy work for the one "best" path. 4) One thing I did not realize before is what pathkeys we even consider here. Imagine you have two tables: CREATE TABLE t1 (a int, b int, primary key (a)); CREATE TABLE t2 (a int, b int, primary key (a)); and query SELECT * FROM t1 JOIN t2 USING (a,b); It seems reasonable to also consider pathkeys (a,b) because that's make e.g. mergejoin much cheaper, right? But no, we'll not do that - we only consider pathkeys that at least one child relation has, so in this case it's just (a). Which means we'll never consider incremental sort for this particular example. It's a bit strange, because it's enough to create index on (a,b) for just one of the relations, and it'll suddenly consider building incremental sort on both sides. I don't plan to pursue this further at this point, so I pushed the first part because that's a fairly simple improvement over what we have now. But it seems like a fairly promising area for improvements. Also, the non-intuitive effects of enabling partition-wise joins (i.e. picking plans with higher estimated costs) is something worth exploring, I guess. But that's a separate issue. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Вложения
FWIW this is now marked as committed. I've created a separate entry in the next CF for the incremental sort part. https://commitfest.postgresql.org/37/3513/ regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
> From: Tomas Vondra <tomas.vondra@enterprisedb.com>
> Subject: Re: PATCH: generate fractional cheapest paths in generate_orderedappend_path
>
> test-# SELECT * FROM fract_t x LEFT JOIN fract_t y USING (id1, id2)
> ORDER BY id1 ASC, id2 ASC LIMIT 10;
> QUERY PLAN
>
> ------------------------------------------------------------------------------
> Limit
> -> Merge Left Join
> Merge Cond: ((x.id1 = y.id1) AND (x.id2 = y.id2))
> -> Append
> -> Index Only Scan using fract_t0_id1_id2_idx on
> fract_t0 x_1
> -> Incremental Sort
> Sort Key: x_2.id1, x_2.id2
> Presorted Key: x_2.id1
> -> Index Scan using fract_t1_pkey on fract_t1 x_2
> -> Materialize
> -> Append
> -> Incremental Sort
> Sort Key: y_1.id1, y_1.id2
> Presorted Key: y_1.id1
> -> Index Scan using fract_t0_pkey on
> fract_t0 y_1
> Index Cond: (id1 = id1)
> Filter: (id2 = id2)
> -> Incremental Sort
> Sort Key: y_2.id1, y_2.id2
> Presorted Key: y_2.id1
> -> Index Scan using fract_t1_pkey on
> fract_t1 y_2
> Index Cond: (id1 = id1)
> Filter: (id2 = id2)
> (23 rows)
> [...]
> So that seems reasonable
Maybe I'm just slow, but that doesn't seem reasonable to me. It doesn't look like a valid plan to me. Sure all the nodes are arranged like I'd like them to be. But what are the id1/id2 bound we have in the index and filter conditions?
> [...]but there's a couple issues too:
>
> 1) Planning works (hence we can run explain), but execution fails
> because of segfault in CheckVarSlotCompatibility - it gets NULL slot for
> some reason. I haven't looked into the details, but I'd guess we need to
> pass a different rel to create_incrementalsort_path, not childrel.
In case my above concern is valid, maybe the executor is just as confused as I am. Such conditions should generate VarSlots, no? If so, where should they come from?
Sadly I don't have time to debug that in depth today.
> 2) enable_partitionwisejoin=on seems to cause some confusion, because it
> results in picking a different plan with higher cost. But that's clearly
> not because of this patch.
Long version: What do you mean by confusion. The plan I get with the patch doesn't seem to confusing to me. Generally something like that is to be expected. enable_partitionwisejoin changes the way this planing works by rewriting the entire query effectively rule based. So we end up with a completely different query. I'd in particular expect slightly different startup costs.
So if we activate this we consider completely different plans, I struggle to come up with a meaningful example where there is any overlap at all. Thus it doesn't surprise me conceptually.
From practical experience I'd say: If they are about the same plan, the costs estimates work somewhat okish.
If we change the way we compose the nodes together, we sometimes end up with radical different costs for doing the same. While I suspect there are a lot of corner cases causing this, I've seen quite a few where this was due to the fact, that our planer often has insignificant statistics to know something and takes a guess. This has gotten way better of recent years, but it's in particular for non-trivial joins still a problem in practice.
> 3) I'm still a bit skeptical about the cost of this implementation - it
> builds the incrementalsort path, just to throw most of the paths away.
> It'd be much better to just calculate the cost, which should be much
> cheaper, and only do the heavy work for the one "best" path.
Maybe we should profile this to get a rough estimate, how much time we spend building these incremental paths. From a code perspective it's non trivial to me where the time is lost.
> 4) One thing I did not realize before is what pathkeys we even consider
> here. Imagine you have two tables:
>
> CREATE TABLE t1 (a int, b int, primary key (a));
> CREATE TABLE t2 (a int, b int, primary key (a));
>
> and query
>
> SELECT * FROM t1 JOIN t2 USING (a,b);
>
> It seems reasonable to also consider pathkeys (a,b) because that's make
> e.g. mergejoin much cheaper, right? But no, we'll not do that - we only
> consider pathkeys that at least one child relation has, so in this case
> it's just (a). Which means we'll never consider incremental sort for
> this particular example.
>
> It's a bit strange, because it's enough to create index on (a,b) for
> just one of the relations, and it'll suddenly consider building
> incremental sort on both sides.
I don't find that surprising, because the single index *might* make the incremental sort cheaper for the join *without* considering any external sort order.
So we would be switching up the incremental sort and the mergejoin, in case we need to sort anyways. That would mean considering also the sort order, that might be relevant on the outside. Sounds like an interesting idea for a later patch.
> I don't plan to pursue this further at this point, so I pushed the first
> part because that's a fairly simple improvement over what we have now.
> But it seems like a fairly promising area for improvements.
I think 1) is pretty important, so we should sort that out sooner than later. Apart form that: :+1:
Thank you!
Regards
Arne
On 1/13/22 21:12, Arne Roland wrote: > Hi! > >> From: Tomas Vondra <tomas.vondra@enterprisedb.com> >> Subject: Re: PATCH: generate fractional cheapest paths in > generate_orderedappend_path >> >> test-# SELECT * FROM fract_t x LEFT JOIN fract_t y USING (id1, id2) >> ORDER BY id1 ASC, id2 ASC LIMIT 10; >> QUERY PLAN >> >> > ------------------------------------------------------------------------------ >> Limit >> -> Merge Left Join >> Merge Cond: ((x.id1 = y.id1) AND (x.id2 = y.id2)) >> -> Append >> -> Index Only Scan using fract_t0_id1_id2_idx on >> fract_t0 x_1 >> -> Incremental Sort >> Sort Key: x_2.id1, x_2.id2 >> Presorted Key: x_2.id1 >> -> Index Scan using fract_t1_pkey on fract_t1 x_2 >> -> Materialize >> -> Append >> -> Incremental Sort >> Sort Key: y_1.id1, y_1.id2 >> Presorted Key: y_1.id1 >> -> Index Scan using fract_t0_pkey on >> fract_t0 y_1 >> Index Cond: (id1 = id1) >> Filter: (id2 = id2) >> -> Incremental Sort >> Sort Key: y_2.id1, y_2.id2 >> Presorted Key: y_2.id1 >> -> Index Scan using fract_t1_pkey on >> fract_t1 y_2 >> Index Cond: (id1 = id1) >> Filter: (id2 = id2) >> (23 rows) >> [...] >> So that seems reasonable > > Maybe I'm just slow, but that doesn't seem reasonable to me. It doesn't > look like a valid plan to me. Sure all the nodes are arranged like I'd > like them to be. But what are the id1/id2 bound we have in the index and > filter conditions? > I'm not claiming the plan is 100% correct, and you may have a point about the index condition / filter in the materialize branch. But the overall plan shape (with incremental sort nodes on both sides) seems reasonable to me. The materialize node is expected (incremental sort does not support rescans cheaply). Maybe that's not any cheaper than just doing merge join on the first column, and filter on the second. But we should be able to decide that based on cost, I think. >> [...]but there's a couple issues too: >> >> 1) Planning works (hence we can run explain), but execution fails >> because of segfault in CheckVarSlotCompatibility - it gets NULL slot for >> some reason. I haven't looked into the details, but I'd guess we need to >> pass a different rel to create_incrementalsort_path, not childrel. > > In case my above concern is valid, maybe the executor is just as > confused as I am. Such conditions should generate VarSlots, no? If so, > where should they come from? > Yeah, something like that. > Sadly I don't have time to debug that in depth today. > >> 2) enable_partitionwisejoin=on seems to cause some confusion, because it >> results in picking a different plan with higher cost. But that's clearly >> not because of this patch. > > Short version: I'd neither conceptually expect costs to be lower in any > case, nor would that be desirable, because our estimates aren't perfect. > > Long version: What do you mean by confusion. The plan I get with the > patch doesn't seem to confusing to me. Generally something like that is > to be expected. enable_partitionwisejoin changes the way this planing > works by rewriting the entire query effectively rule based. So we end up > with a completely different query. I'd in particular expect slightly > different startup costs. > So if we activate this we consider completely different plans, I > struggle to come up with a meaningful example where there is any overlap > at all. Thus it doesn't surprise me conceptually. > From practical experience I'd say: If they are about the same plan, the > costs estimates work somewhat okish. > If we change the way we compose the nodes together, we sometimes end up > with radical different costs for doing the same. While I suspect there > are a lot of corner cases causing this, I've seen quite a few where this > was due to the fact, that our planer often has insignificant statistics > to know something and takes a guess. This has gotten way better of > recent years, but it's in particular for non-trivial joins still a > problem in practice. > By confusion I meant that if you plan the query with partitionwise join enabled, you get a plan with cost X, and if you disable it you get a different plan with cost Y, where (Y < X). Which is somewhat unexpected, because that seems to simply reduce the set of plans we explore. But as you say, enable_partitionwise_join kinda "switches" between two planning modes. Not sure why we don't try building both paths and decide based on costs. >> 3) I'm still a bit skeptical about the cost of this implementation - it >> builds the incrementalsort path, just to throw most of the paths away. >> It'd be much better to just calculate the cost, which should be much >> cheaper, and only do the heavy work for the one "best" path. > > Maybe we should profile this to get a rough estimate, how much time we > spend building these incremental paths. From a code perspective it's non > trivial to me where the time is lost. > TBH I haven't really done any profiling, but I wouldn't be surprised if it got somewhat expensive for large number of child relations, especially if there are a couple indexes on each. We do something similar for nestloop (see initial_cost_nestloop). >> 4) One thing I did not realize before is what pathkeys we even consider >> here. Imagine you have two tables: >> >> CREATE TABLE t1 (a int, b int, primary key (a)); >> CREATE TABLE t2 (a int, b int, primary key (a)); >> >> and query >> >> SELECT * FROM t1 JOIN t2 USING (a,b); >> >> It seems reasonable to also consider pathkeys (a,b) because that's make >> e.g. mergejoin much cheaper, right? But no, we'll not do that - we only >> consider pathkeys that at least one child relation has, so in this case >> it's just (a). Which means we'll never consider incremental sort for >> this particular example. >> >> It's a bit strange, because it's enough to create index on (a,b) for >> just one of the relations, and it'll suddenly consider building >> incremental sort on both sides. > > I don't find that surprising, because the single index *might* make the > incremental sort cheaper for the join *without* considering any external > sort order. > So we would be switching up the incremental sort and the mergejoin, in > case we need to sort anyways. That would mean considering also the sort > order, that might be relevant on the outside. Sounds like an interesting > idea for a later patch. > I'm not sure it depends on the incremental sort. I suspect in some cases it might be faster to fully sort the merge join inputs, even if none of the input paths has suitable pathkeys. For example, if you do ... FROM t1 JOIN t2 USING (a,b) ... but there are only indexes on (a), maybe sorting on (a,b) would win e.g. if there's a lot of duplicate values in (a)? I was thinking about this variation on example from the committed patch: CREATE TABLE fract_t (id1 BIGINT, id2 BIGINT) PARTITION BY RANGE (id1); CREATE TABLE fract_t0 PARTITION OF fract_t FOR VALUES FROM ('0') TO ('10'); CREATE TABLE fract_t1 PARTITION OF fract_t FOR VALUES FROM ('10') TO ('20'); CREATE INDEX ON fract_t(id1); INSERT INTO fract_t (id1, id2) SELECT i/100000, i FROM generate_series(0, 1999999) s(i); ANALYZE fract_t; -- not interested in nestloop/hashjoin paths for now set enable_hashjoin = off; set enable_nestloop = off; set max_parallel_workers_per_gather = 0; EXPLAIN (COSTS OFF) SELECT * FROM fract_t x JOIN fract_t y USING (id1, id2) order by id1; which is now planned like this: QUERY PLAN ----------------------------------------------------- Merge Join Merge Cond: ((x.id1 = y.id1) AND (x.id2 = y.id2)) -> Sort Sort Key: x.id1, x.id2 -> Append -> Seq Scan on fract_t0 x_1 -> Seq Scan on fract_t1 x_2 -> Materialize -> Sort Sort Key: y.id1, y.id2 -> Append -> Seq Scan on fract_t0 y_1 -> Seq Scan on fract_t1 y_2 (13 rows) But maybe a plan like this might be better: QUERY PLAN ----------------------------------------------------- Merge Join Merge Cond: ((x.id1 = y.id1) AND (x.id2 = y.id2)) -> Incremental Sort Sort Key: x.id1, x.id2 Presorted Key: x.id1 -> Append -> Index Scan on fract_t0 x_1 -> Index Scan on fract_t1 x_2 -> Materialize -> Incremental Sort Sort Key: y.id1, y.id2 Presorted Key: y.id1 -> Append -> Index Scan on fract_t0 y_1 -> Index Scan on fract_t1 y_2 or maybe even Incremental Sort + (Merge) Append on top. Which is what I was trying to achieve with the experimental patch. FWIW I did briefly look at the Merge Join + Incremental Sort plan too, and it seems we don't consider incremental sorts there either. AFAICS add_paths_to_joinrel justs call sort_inner_and_outer, which determines interesting pathkeys in select_outer_pathkeys_for_merge, and I guess that only considers pathkeys usable for full sort. In any case, we don't actually add sort paths - we construct sort plans by calling make_sort in create_mergejoin_plan. So there's not much chance for incremental sort at all. That's kinda unfortunate, I guess. It's one of the non-pathified places that we ignored while adding incremental sort. >> I don't plan to pursue this further at this point, so I pushed the first >> part because that's a fairly simple improvement over what we have now. >> But it seems like a fairly promising area for improvements. > > I think 1) is pretty important, so we should sort that out sooner than > later. Apart form that: :+1: > Thank you! > I agree it's worth investigating and experimenting with. We may end up realizing those plans are not worth it, but we won't know until we try. It may require replacing some of the hard-coded logic in createplan.c with constructing regular alternative paths. IIRC we even did some of this work in the incremental sort patch at some point, but then ripped that out to keep the patch smaller / simpler ... need to look at it again. Are you interested / willing to do some of this work? regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Hi, On 2022-01-14 01:39:30 +0100, Tomas Vondra wrote: > Are you interested / willing to do some of this work? This patch hasn't moved in the last two months. I think it may be time to mark it as returned with feedback for now? It's also failing tests, and has done so for months: https://cirrus-ci.com/task/5308087077699584 https://api.cirrus-ci.com/v1/artifact/task/5308087077699584/log/src/test/regress/regression.diffs Greetings, Andres Freund
On 3/22/22 01:18, Andres Freund wrote: > Hi, > > On 2022-01-14 01:39:30 +0100, Tomas Vondra wrote: >> Are you interested / willing to do some of this work? > > This patch hasn't moved in the last two months. I think it may be time to > mark it as returned with feedback for now? > > It's also failing tests, and has done so for months: > > https://cirrus-ci.com/task/5308087077699584 > https://api.cirrus-ci.com/v1/artifact/task/5308087077699584/log/src/test/regress/regression.diffs > > Greetings, > Yeah. I think it's a useful improvement, but it needs much more work than is doable by the end of this CF. RwF seems about right. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company