Обсуждение: Even when the data is already ordered, MergeAppend still adds a Sort node
Hi,
Recently, I found while using `union all order by limit` that `MergeAppend`
adds a `Sort` node even when child node data is already ordered, leading to
low execution efficiency.
The issue can be reproduced with the following case (on the latest master
branch). In the execution plan below, although `t2` is ordered by `a, b` via
`t_a_b_idx`, a `Sort` node is still added, causing the outer `limit` to be
ineffective. Most of the time is spent on the `Sort` node.
```sql
create table t(a int, b int);
insert into t select 1, i from generate_series(1, 10000)i;
insert into t select i, i from generate_series(10000, 20000)i;
create index on t(a, b);
analyze t;
set enable_bitmapscan to off;
set enable_seqscan to off;
explain select a, b from (
(select a, b from t t1 where a > 19000 order by a, b)
union all
(select a, b from t t2 where a = 1 and b > 1 order by a, b)
) t order by a, b limit 1;
QUERY PLAN
--------------------------------------------------------------------------------------------------
Limit (cost=366.56..366.57 rows=1 width=8)
-> Merge Append (cost=366.56..531.05 rows=10999 width=8)
Sort Key: t1.a, t1.b
-> Index Only Scan using t_a_b_idx on t t1 (cost=0.29..29.79 rows=1000 width=8)
Index Cond: (a > 19000)
-> Sort (cost=366.26..391.26 rows=9999 width=8)
Sort Key: t2.a, t2.b
-> Index Only Scan using t_a_b_idx on t t2 (cost=0.29..316.27 rows=9999 width=8)
Index Cond: ((a = 1) AND (b > 1))
(9 rows)
```
I briefly analyzed the cause of this issue: in the `standard_qp_callback`
function, while setting `sort_pathkeys`, the `a` column is not added to
`sort_pathkeys` because its `EquivalenceClass` contains `Const`. Hence, `a`
is absent in `PlannerInfo->query_pathkeys`. The purpose of this might be to
reduce the number of Sort columns.
Subsequently, when `build_index_paths` generates `IndexPath`, the
`truncate_useless_pathkeys` function removes the `a` column from
`Path->pathkeys`.
Thus, this issue can be minimally reproduced with the following SQL:
```sql
explain select * from (select * from t where a > 19000 order by a, b) order by a, b limit 1;
QUERY PLAN
----------------------------------------------------------------------------------
Limit (cost=0.29..0.32 rows=1 width=8)
-> Index Only Scan using t_a_b_idx on t (cost=0.29..29.79 rows=1000 width=8)
Index Cond: (a > 19000)
(3 rows)
explain select * from (select * from t where a = 1 and b > 1 order by a, b) order by a, b limit 1;
QUERY PLAN
-----------------------------------------------------------------------------------------
Limit (cost=366.26..366.27 rows=1 width=8)
-> Sort (cost=366.26..391.26 rows=9999 width=8)
Sort Key: t.a, t.b
-> Index Only Scan using t_a_b_idx on t (cost=0.29..316.27 rows=9999 width=8)
Index Cond: ((a = 1) AND (b > 1))
(5 rows)
```
Should we retain the complete `pathkeys` in `Path->pathkeys` for use by the
upper layers of the subquery, rather than just keeping the portion trimmed by
`PlannerInfo->query_pathkeys`? I'm not sure if my understanding is correct.
Furthermore, is there a more efficient way to write this, to avoid the
`Sort` node mentioned above?
Best Regards,
Fei Changhong
Fei Changhong
On Jul 18, 2025, at 22:37, feichanghong <feichanghong@qq.com> wrote:Should we retain the complete `pathkeys` in `Path->pathkeys` for use by theupper layers of the subquery, rather than just keeping the portion trimmed by`PlannerInfo->query_pathkeys`? I'm not sure if my understanding is correct.
Currently, a rough solution I can think of is: if the current query is not the
outermost Query, then pathkey_is_redundant should ignore the "EC contains a
constant" check. The specific fix is as follows:
```
diff --git a/src/backend/optimizer/path/pathkeys.c b/src/backend/optimizer/path/pathkeys.c
index 8b04d40d36d..9fd1f3f8ff7 100644
--- a/src/backend/optimizer/path/pathkeys.c
+++ b/src/backend/optimizer/path/pathkeys.c
@@ -31,7 +31,7 @@
/* Consider reordering of GROUP BY keys? */
bool enable_group_by_reordering = true;
-static bool pathkey_is_redundant(PathKey *new_pathkey, List *pathkeys);
+static bool pathkey_is_redundant(PathKey *new_pathkey, List *pathkeys, bool check_ec);
static bool matches_boolean_partition_clause(RestrictInfo *rinfo,
RelOptInfo *partrel,
int partkeycol);
@@ -104,7 +104,7 @@ make_canonical_pathkey(PlannerInfo *root,
* returns the updated 'target' list.
*/
List *
-append_pathkeys(List *target, List *source)
+append_pathkeys(List *target, List *source, bool check_ec)
{
ListCell *lc;
@@ -114,7 +114,7 @@ append_pathkeys(List *target, List *source)
{
PathKey *pk = lfirst_node(PathKey, lc);
- if (!pathkey_is_redundant(pk, target))
+ if (!pathkey_is_redundant(pk, target, check_ec))
target = lappend(target, pk);
}
return target;
@@ -156,13 +156,13 @@ append_pathkeys(List *target, List *source)
* pointer comparison is enough to decide whether canonical ECs are the same.
*/
static bool
-pathkey_is_redundant(PathKey *new_pathkey, List *pathkeys)
+pathkey_is_redundant(PathKey *new_pathkey, List *pathkeys, bool check_ec)
{
EquivalenceClass *new_ec = new_pathkey->pk_eclass;
ListCell *lc;
/* Check for EC containing a constant --- unconditionally redundant */
- if (EC_MUST_BE_REDUNDANT(new_ec))
+ if (check_ec && EC_MUST_BE_REDUNDANT(new_ec))
return true;
/* If same EC already used in list, then redundant */
@@ -798,7 +798,7 @@ build_index_pathkeys(PlannerInfo *root,
* We found the sort key in an EquivalenceClass, so it's relevant
* for this query. Add it to list, unless it's redundant.
*/
- if (!pathkey_is_redundant(cpathkey, retval))
+ if (!pathkey_is_redundant(cpathkey, retval, root->query_level == 1))
retval = lappend(retval, cpathkey);
}
else
@@ -958,7 +958,7 @@ build_partition_pathkeys(PlannerInfo *root, RelOptInfo *partrel,
* We found the sort key in an EquivalenceClass, so it's relevant
* for this query. Add it to list, unless it's redundant.
*/
- if (!pathkey_is_redundant(cpathkey, retval))
+ if (!pathkey_is_redundant(cpathkey, retval, root->query_level == 1))
retval = lappend(retval, cpathkey);
}
else
@@ -1229,7 +1229,7 @@ convert_subquery_pathkeys(PlannerInfo *root, RelOptInfo *rel,
* Eliminate redundant ordering info; could happen if outer query
* equivalences subquery keys...
*/
- if (!pathkey_is_redundant(best_pathkey, retval))
+ if (!pathkey_is_redundant(best_pathkey, retval, root->query_level == 1))
{
retval = lappend(retval, best_pathkey);
retvallen++;
@@ -1428,7 +1428,7 @@ make_pathkeys_for_sortclauses_extended(PlannerInfo *root,
}
/* Canonical form eliminates redundant ordering keys */
- if (!pathkey_is_redundant(pathkey, pathkeys))
+ if (!pathkey_is_redundant(pathkey, pathkeys, root->query_level == 1))
pathkeys = lappend(pathkeys, pathkey);
else if (remove_redundant)
*sortclauses = foreach_delete_current(*sortclauses, l);
@@ -1823,7 +1823,7 @@ select_outer_pathkeys_for_merge(PlannerInfo *root,
COMPARE_LT,
false);
/* can't be redundant because no duplicate ECs */
- Assert(!pathkey_is_redundant(pathkey, pathkeys));
+ Assert(!pathkey_is_redundant(pathkey, pathkeys, root->query_level == 1));
pathkeys = lappend(pathkeys, pathkey);
}
@@ -1925,7 +1925,7 @@ make_inner_pathkeys_for_merge(PlannerInfo *root,
* reason, it certainly wouldn't match any available sort order for
* the input relation.
*/
- if (!pathkey_is_redundant(pathkey, pathkeys))
+ if (!pathkey_is_redundant(pathkey, pathkeys, root->query_level == 1))
pathkeys = lappend(pathkeys, pathkey);
}
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 549aedcfa99..1dcc3ae29b9 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -3312,7 +3312,8 @@ adjust_group_pathkeys_for_groupagg(PlannerInfo *root)
/* include the GROUP BY pathkeys, if they exist */
if (grouppathkeys != NIL)
currpathkeys = append_pathkeys(list_copy(grouppathkeys),
- currpathkeys);
+ currpathkeys,
+ root->query_level == 1);
/* record that we found pathkeys for this aggregate */
aggindexes = bms_add_member(aggindexes, i);
@@ -3324,7 +3325,8 @@ adjust_group_pathkeys_for_groupagg(PlannerInfo *root)
/* include the GROUP BY pathkeys, if they exist */
if (grouppathkeys != NIL)
pathkeys = append_pathkeys(list_copy(grouppathkeys),
- pathkeys);
+ pathkeys,
+ root->query_level == 1);
/* are 'pathkeys' compatible or better than 'currpathkeys'? */
switch (compare_pathkeys(currpathkeys, pathkeys))
@@ -6275,7 +6277,7 @@ make_pathkeys_for_window(PlannerInfo *root, WindowClause *wc,
/* Okay, make the combined pathkeys */
if (window_pathkeys != NIL)
- window_pathkeys = append_pathkeys(window_pathkeys, orderby_pathkeys);
+ window_pathkeys = append_pathkeys(window_pathkeys, orderby_pathkeys, root->query_level == 1);
else
window_pathkeys = orderby_pathkeys;
}
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 8410531f2d6..7766ae07cab 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -271,7 +271,7 @@ extern List *truncate_useless_pathkeys(PlannerInfo *root,
RelOptInfo *rel,
List *pathkeys);
extern bool has_useful_pathkeys(PlannerInfo *root, RelOptInfo *rel);
-extern List *append_pathkeys(List *target, List *source);
+extern List *append_pathkeys(List *target, List *source, bool check_ec);
extern PathKey *make_canonical_pathkey(PlannerInfo *root,
EquivalenceClass *eclass, Oid opfamily,
CompareType cmptype, bool nulls_first);
```
Looking forward to hearing everyone's suggestions.
Best Regards,
Fei Changhong
Fei Changhong
Hi,
On Jul 18, 2025 at 22:52 +0800, feichanghong <feichanghong@qq.com>, wrote:
On Jul 18, 2025 at 22:52 +0800, feichanghong <feichanghong@qq.com>, wrote:
explain select * from (select * from t where a = 1 and b > 1 order by a, b) order by a, b limit 1;
QUERY PLAN
-----------------------------------------------------------------------------------------
Limit (cost=366.26..366.27 rows=1 width=8)
-> Sort (cost=366.26..391.26 rows=9999 width=8)
Sort Key: t.a, t.b
-> Index Only Scan using t_a_b_idx on t (cost=0.29..316.27 rows=9999 width=8)
Index Cond: ((a = 1) AND (b > 1))
(5 rows)
```
Should we retain the complete `pathkeys` in `Path->pathkeys` for use by the
upper layers of the subquery, rather than just keeping the portion trimmed by
`PlannerInfo->query_pathkeys`? I'm not sure if my understanding is correct.
The subquery has a qualifier
a = 1
that forms an Equivalence Class (EC) whose ec_member
contains a constant. As a result,
subroot->sort_pathkeys
doesn't need to include column a
.However, in the outer query, there are no such qualifiers to form a similar EC, and the
ORDER BY a, b
clause means root->sort_pathkeys
requires both columns a
and b
. When
convert_subquery_pathkeys
is called, the subpath lacks the pathkeys for column a
.Furthermore, is there a more efficient way to write this, to avoid the
`Sort` node mentioned above?
A simple solution is to add an EC using a qual:
EXPLAIN SELECT * FROM (SELECT * FROM t WHERE a = 1 AND b > 1 ORDER BY a, b) WHERE a = 1 ORDER BY a, b LIMIT 1;
QUERY PLAN
-----------------------------------------------------------------------------------
Limit (cost=0.29..0.32 rows=1 width=8)
-> Index Only Scan using t_a_b_idx on t (cost=0.29..316.27 rows=9999 width=8)
Index Cond: ((a = 1) AND (b > 1))
(3 rows)
--
Zhang Mingli
HashData
On Jul 19, 2025, at 13:15, Zhang Mingli <zmlpostgres@gmail.com> wrote:
Hi,
On Jul 18, 2025 at 22:52 +0800, feichanghong <feichanghong@qq.com>, wrote:
On Jul 18, 2025 at 22:52 +0800, feichanghong <feichanghong@qq.com>, wrote:
explain select * from (select * from t where a = 1 and b > 1 order by a, b) order by a, b limit 1;
QUERY PLAN
-----------------------------------------------------------------------------------------
Limit (cost=366.26..366.27 rows=1 width=8)
-> Sort (cost=366.26..391.26 rows=9999 width=8)
Sort Key: t.a, t.b
-> Index Only Scan using t_a_b_idx on t (cost=0.29..316.27 rows=9999 width=8)
Index Cond: ((a = 1) AND (b > 1))
(5 rows)
```
Should we retain the complete `pathkeys` in `Path->pathkeys` for use by the
upper layers of the subquery, rather than just keeping the portion trimmed by
`PlannerInfo->query_pathkeys`? I'm not sure if my understanding is correct.
The subquery has a qualifier
a = 1
that forms an Equivalence Class (EC) whose ec_member
contains a constant. As a result,
subroot->sort_pathkeys
doesn't need to include column a
.However, in the outer query, there are no such qualifiers to form a similar EC, and the
ORDER BY a, b
clause means root->sort_pathkeys
requires both columns a
and b
. When
convert_subquery_pathkeys
is called, the subpath lacks the pathkeys for column a
.Furthermore, is there a more efficient way to write this, to avoid the
`Sort` node mentioned above?
Yes, your understanding is basically consistent with mine.
A simple solution is to add an EC using a qual:
EXPLAIN SELECT * FROM (SELECT * FROM t WHERE a = 1 AND b > 1 ORDER BY a, b) WHERE a = 1 ORDER BY a, b LIMIT 1;
QUERY PLAN
-----------------------------------------------------------------------------------
Limit (cost=0.29..0.32 rows=1 width=8)
-> Index Only Scan using t_a_b_idx on t (cost=0.29..316.27 rows=9999 width=8)
Index Cond: ((a = 1) AND (b > 1))
(3 rows)
Thank you for your suggestion, this method can address simple subquery
scenarios. However, my situation involves a union all, so it's not possible to
add the corresponding equality qualifier at the top level. The SQL is as
follows:
```sql
explain select a, b from (
(select a, b from t t1 where a > 19000 order by a, b)
union all
(select a, b from t t2 where a = 1 and b > 1 order by a, b)
) t order by a, b limit 1;
```
Currently, I have not found a better way to rewrite this, except by optimizing
this scenario from the pg kernel side.
Best Regards,
Fei Changhong
Fei Changhong
feichanghong <feichanghong@qq.com> writes: > Currently, I have not found a better way to rewrite this, except by optimizing > this scenario from the pg kernel side. If you're willing to modify your query, you could fake it out by spelling the subquery's "a = 1" condition in a way that won't produce an EquivalenceClass. For example, regression=# explain analyze select a, b from ( (select a, b from t t1 where a > 19000 order by a, b) union all (select a, b from t t2 where a >= 1 and a <= 1 and b > 1 order by a, b) ) t order by a, b limit 1; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------ Limit (cost=0.58..0.63 rows=1 width=8) (actual time=0.070..0.070 rows=1.00 loops=1) Buffers: shared hit=8 read=1 -> Merge Append (cost=0.58..481.10 rows=11000 width=8) (actual time=0.069..0.069 rows=1.00 loops=1) Sort Key: t1.a, t1.b Buffers: shared hit=8 read=1 -> Index Only Scan using t_a_b_idx on t t1 (cost=0.29..29.79 rows=1000 width=8) (actual time=0.027..0.027 rows=1.00loops=1) Index Cond: (a > 19000) Heap Fetches: 0 Index Searches: 1 Buffers: shared hit=2 read=1 -> Index Only Scan using t_a_b_idx on t t2 (cost=0.29..341.30 rows=10000 width=8) (actual time=0.041..0.041 rows=1.00loops=1) Index Cond: ((a >= 1) AND (a <= 1) AND (b > 1)) Heap Fetches: 0 Index Searches: 1 Buffers: shared hit=6 Planning: Buffers: shared hit=6 Planning Time: 0.174 ms Execution Time: 0.089 ms (19 rows) I'd be the first to agree that that's a hack not a nice solution. But I think getting to something that's not a hack is going to involve a lot more work than this edge case seems worth. We're not likely to accept a patch that pessimizes planning within subqueries on the small chance that that will result in a path whose apparent sort order matches the needs of the outer query better. Maybe something could be done inside convert_subquery_pathkeys, but I suspect we don't really have enough information at that point to decide what to do. regards, tom lane
On Jul 21, 2025, at 00:03, Tom Lane <tgl@sss.pgh.pa.us> wrote:feichanghong <feichanghong@qq.com> writes:Currently, I have not found a better way to rewrite this, except by optimizing
this scenario from the pg kernel side.
If you're willing to modify your query, you could fake it out by
spelling the subquery's "a = 1" condition in a way that won't produce an
EquivalenceClass. For example,
regression=# explain analyze select a, b from (
(select a, b from t t1 where a > 19000 order by a, b)
union all
(select a, b from t t2 where a >= 1 and a <= 1 and b > 1 order by a, b)
) t order by a, b limit 1;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.58..0.63 rows=1 width=8) (actual time=0.070..0.070 rows=1.00 loops=1)
Buffers: shared hit=8 read=1
-> Merge Append (cost=0.58..481.10 rows=11000 width=8) (actual time=0.069..0.069 rows=1.00 loops=1)
Sort Key: t1.a, t1.b
Buffers: shared hit=8 read=1
-> Index Only Scan using t_a_b_idx on t t1 (cost=0.29..29.79 rows=1000 width=8) (actual time=0.027..0.027 rows=1.00 loops=1)
Index Cond: (a > 19000)
Heap Fetches: 0
Index Searches: 1
Buffers: shared hit=2 read=1
-> Index Only Scan using t_a_b_idx on t t2 (cost=0.29..341.30 rows=10000 width=8) (actual time=0.041..0.041 rows=1.00 loops=1)
Index Cond: ((a >= 1) AND (a <= 1) AND (b > 1))
Heap Fetches: 0
Index Searches: 1
Buffers: shared hit=6
Planning:
Buffers: shared hit=6
Planning Time: 0.174 ms
Execution Time: 0.089 ms
(19 rows)
Thank you very much for your suggestion, this method is effective for my
scenario. I will modify my SQL accordingly.
I'd be the first to agree that that's a hack not a nice solution.
But I think getting to something that's not a hack is going to
involve a lot more work than this edge case seems worth. We're
not likely to accept a patch that pessimizes planning within
subqueries on the small chance that that will result in a path
whose apparent sort order matches the needs of the outer query
better. Maybe something could be done inside
convert_subquery_pathkeys, but I suspect we don't really have
enough information at that point to decide what to do.
Yes, if we retain the EC_MUST_BE_REDUNDANT pathkey in the subquery, it may lead
to a less optimal plan. For example, consider the following case: the subquery
is sorted by "a,b", and the top-level query is sorted by "b". In this
situation, the subquery's pathkey "b" matches the top-level query. However, if
we retain the subquery's pathkey as "a,b", it does not match the top-level
query, resulting in the need for an additional sort operation.
```sql
-- subquery pathkey is "b"
postgres=# explain select * from (select * from t where a = 1 and b > 1 order by a, b) t order by b limit 1;
QUERY PLAN
-----------------------------------------------------------------------------------
Limit (cost=0.29..0.33 rows=1 width=8)
-> Index Only Scan using t_a_b_idx on t (cost=0.29..316.27 rows=9999 width=8)
Index Cond: ((a = 1) AND (b > 1))
(3 rows)
-- subquery pathkey is "a,b"
postgres=# explain select * from (select * from t where a = 1 and b > 1 order by a, b) t order by b limit 1;
QUERY PLAN
-----------------------------------------------------------------------------------------------
Limit (cost=1155.56..1155.56 rows=1 width=8)
-> Sort (cost=1155.56..1180.56 rows=9999 width=8)
Sort Key: t.b
-> Sort (cost=980.58..1005.58 rows=9999 width=8)
Sort Key: t.a, t.b
-> Index Only Scan using t_a_b_idx on t (cost=0.29..316.27 rows=9999 width=8)
Index Cond: ((a = 1) AND (b > 1))
(7 rows)
```
A possible solution might be: we retain the EC_MUST_BE_REDUNDANT pathkey within
the subquery, but when executing create_sort_plan, pathkeys_contained_in, and
convert_subquery_pathkeys, we need to take into account pathkeys with
ec_has_const. Naturally, the actual situation could be much more complex.
Best Regards,
Fei Changhong
Fei Changhong