Обсуждение: Even when the data is already ordered, MergeAppend still adds a Sort node

Поиск

Список

Период

Сортировка

Even when the data is already ordered, MergeAppend still adds a Sort node

От

feichanghong

Дата:

18 июля 2025 г., 17:37:56

Hi,

Recently, I found while using `union all order by limit` that `MergeAppend`

adds a `Sort` node even when child node data is already ordered, leading to

low execution efficiency.

The issue can be reproduced with the following case (on the latest master

branch). In the execution plan below, although `t2` is ordered by `a, b` via

`t_a_b_idx`, a `Sort` node is still added, causing the outer `limit` to be

ineffective. Most of the time is spent on the `Sort` node.

```sql

create table t(a int, b int);

insert into t select 1, i from generate_series(1, 10000)i;

insert into t select i, i from generate_series(10000, 20000)i;

create index on t(a, b);

analyze t;

set enable_bitmapscan to off;

set enable_seqscan to off;

explain select a, b from (

(select a, b from t t1 where a > 19000 order by a, b)

union all

(select a, b from t t2 where a = 1 and b > 1 order by a, b)

) t order by a, b limit 1;

QUERY PLAN

--------------------------------------------------------------------------------------------------

Limit (cost=366.56..366.57 rows=1 width=8)

-> Merge Append (cost=366.56..531.05 rows=10999 width=8)

Sort Key: t1.a, t1.b

-> Index Only Scan using t_a_b_idx on t t1 (cost=0.29..29.79 rows=1000 width=8)

Index Cond: (a > 19000)

-> Sort (cost=366.26..391.26 rows=9999 width=8)

Sort Key: t2.a, t2.b

-> Index Only Scan using t_a_b_idx on t t2 (cost=0.29..316.27 rows=9999 width=8)

Index Cond: ((a = 1) AND (b > 1))

(9 rows)

```

I briefly analyzed the cause of this issue: in the `standard_qp_callback`

function, while setting `sort_pathkeys`, the `a` column is not added to

`sort_pathkeys` because its `EquivalenceClass` contains `Const`. Hence, `a`

is absent in `PlannerInfo->query_pathkeys`. The purpose of this might be to

reduce the number of Sort columns.

Subsequently, when `build_index_paths` generates `IndexPath`, the

`truncate_useless_pathkeys` function removes the `a` column from

`Path->pathkeys`.

Thus, this issue can be minimally reproduced with the following SQL:

```sql

explain select * from (select * from t where a > 19000 order by a, b) order by a, b limit 1;

QUERY PLAN

----------------------------------------------------------------------------------

Limit (cost=0.29..0.32 rows=1 width=8)

-> Index Only Scan using t_a_b_idx on t (cost=0.29..29.79 rows=1000 width=8)

Index Cond: (a > 19000)

(3 rows)

explain select * from (select * from t where a = 1 and b > 1 order by a, b) order by a, b limit 1;

QUERY PLAN

-----------------------------------------------------------------------------------------

Limit (cost=366.26..366.27 rows=1 width=8)

-> Sort (cost=366.26..391.26 rows=9999 width=8)

Sort Key: t.a, t.b

-> Index Only Scan using t_a_b_idx on t (cost=0.29..316.27 rows=9999 width=8)

Index Cond: ((a = 1) AND (b > 1))

(5 rows)

```

Should we retain the complete `pathkeys` in `Path->pathkeys` for use by the

upper layers of the subquery, rather than just keeping the portion trimmed by

`PlannerInfo->query_pathkeys`? I'm not sure if my understanding is correct.

Furthermore, is there a more efficient way to write this, to avoid the

`Sort` node mentioned above?

Best Regards,
Fei Changhong

Re: Even when the data is already ordered, MergeAppend still adds a Sort node

От

feichanghong

Дата:

19 июля 2025 г., 06:36:14

On Jul 18, 2025, at 22:37, feichanghong <feichanghong@qq.com> wrote:

Should we retain the complete `pathkeys` in `Path->pathkeys` for use by the
upper layers of the subquery, rather than just keeping the portion trimmed by
`PlannerInfo->query_pathkeys`? I'm not sure if my understanding is correct.

Currently, a rough solution I can think of is: if the current query is not the

outermost Query, then pathkey_is_redundant should ignore the "EC contains a

constant" check. The specific fix is as follows:

```

diff --git a/src/backend/optimizer/path/pathkeys.c b/src/backend/optimizer/path/pathkeys.c

index 8b04d40d36d..9fd1f3f8ff7 100644

--- a/src/backend/optimizer/path/pathkeys.c

+++ b/src/backend/optimizer/path/pathkeys.c

@@ -31,7 +31,7 @@

/* Consider reordering of GROUP BY keys? */

bool enable_group_by_reordering = true;

-static bool pathkey_is_redundant(PathKey *new_pathkey, List *pathkeys);

+static bool pathkey_is_redundant(PathKey *new_pathkey, List *pathkeys, bool check_ec);

static bool matches_boolean_partition_clause(RestrictInfo *rinfo,

RelOptInfo *partrel,

int partkeycol);

@@ -104,7 +104,7 @@ make_canonical_pathkey(PlannerInfo *root,

* returns the updated 'target' list.

List *

-append_pathkeys(List *target, List *source)

+append_pathkeys(List *target, List *source, bool check_ec)

{

ListCell *lc;

@@ -114,7 +114,7 @@ append_pathkeys(List *target, List *source)

{

PathKey *pk = lfirst_node(PathKey, lc);

- if (!pathkey_is_redundant(pk, target))

+ if (!pathkey_is_redundant(pk, target, check_ec))

target = lappend(target, pk);

}

return target;

@@ -156,13 +156,13 @@ append_pathkeys(List *target, List *source)

* pointer comparison is enough to decide whether canonical ECs are the same.

static bool

-pathkey_is_redundant(PathKey *new_pathkey, List *pathkeys)

+pathkey_is_redundant(PathKey *new_pathkey, List *pathkeys, bool check_ec)

{

EquivalenceClass *new_ec = new_pathkey->pk_eclass;

ListCell *lc;

/* Check for EC containing a constant --- unconditionally redundant */

- if (EC_MUST_BE_REDUNDANT(new_ec))

+ if (check_ec && EC_MUST_BE_REDUNDANT(new_ec))

return true;

/* If same EC already used in list, then redundant */

@@ -798,7 +798,7 @@ build_index_pathkeys(PlannerInfo *root,

* We found the sort key in an EquivalenceClass, so it's relevant

* for this query. Add it to list, unless it's redundant.

- if (!pathkey_is_redundant(cpathkey, retval))

+ if (!pathkey_is_redundant(cpathkey, retval, root->query_level == 1))

retval = lappend(retval, cpathkey);

}

else

@@ -958,7 +958,7 @@ build_partition_pathkeys(PlannerInfo *root, RelOptInfo *partrel,

* We found the sort key in an EquivalenceClass, so it's relevant

* for this query. Add it to list, unless it's redundant.

- if (!pathkey_is_redundant(cpathkey, retval))

+ if (!pathkey_is_redundant(cpathkey, retval, root->query_level == 1))

retval = lappend(retval, cpathkey);

}

else

@@ -1229,7 +1229,7 @@ convert_subquery_pathkeys(PlannerInfo *root, RelOptInfo *rel,

* Eliminate redundant ordering info; could happen if outer query

* equivalences subquery keys...

- if (!pathkey_is_redundant(best_pathkey, retval))

+ if (!pathkey_is_redundant(best_pathkey, retval, root->query_level == 1))

{

retval = lappend(retval, best_pathkey);

retvallen++;

@@ -1428,7 +1428,7 @@ make_pathkeys_for_sortclauses_extended(PlannerInfo *root,

}

/* Canonical form eliminates redundant ordering keys */

- if (!pathkey_is_redundant(pathkey, pathkeys))

+ if (!pathkey_is_redundant(pathkey, pathkeys, root->query_level == 1))

pathkeys = lappend(pathkeys, pathkey);

else if (remove_redundant)

*sortclauses = foreach_delete_current(*sortclauses, l);

@@ -1823,7 +1823,7 @@ select_outer_pathkeys_for_merge(PlannerInfo *root,

COMPARE_LT,

false);

/* can't be redundant because no duplicate ECs */

- Assert(!pathkey_is_redundant(pathkey, pathkeys));

+ Assert(!pathkey_is_redundant(pathkey, pathkeys, root->query_level == 1));

pathkeys = lappend(pathkeys, pathkey);

}

@@ -1925,7 +1925,7 @@ make_inner_pathkeys_for_merge(PlannerInfo *root,

* reason, it certainly wouldn't match any available sort order for

* the input relation.

- if (!pathkey_is_redundant(pathkey, pathkeys))

+ if (!pathkey_is_redundant(pathkey, pathkeys, root->query_level == 1))

pathkeys = lappend(pathkeys, pathkey);

}

diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c

index 549aedcfa99..1dcc3ae29b9 100644

--- a/src/backend/optimizer/plan/planner.c

+++ b/src/backend/optimizer/plan/planner.c

@@ -3312,7 +3312,8 @@ adjust_group_pathkeys_for_groupagg(PlannerInfo *root)

/* include the GROUP BY pathkeys, if they exist */

if (grouppathkeys != NIL)

currpathkeys = append_pathkeys(list_copy(grouppathkeys),

- currpathkeys);

+ currpathkeys,

+ root->query_level == 1);

/* record that we found pathkeys for this aggregate */

aggindexes = bms_add_member(aggindexes, i);

@@ -3324,7 +3325,8 @@ adjust_group_pathkeys_for_groupagg(PlannerInfo *root)

/* include the GROUP BY pathkeys, if they exist */

if (grouppathkeys != NIL)

pathkeys = append_pathkeys(list_copy(grouppathkeys),

- pathkeys);

+ pathkeys,

+ root->query_level == 1);

/* are 'pathkeys' compatible or better than 'currpathkeys'? */

switch (compare_pathkeys(currpathkeys, pathkeys))

@@ -6275,7 +6277,7 @@ make_pathkeys_for_window(PlannerInfo *root, WindowClause *wc,

/* Okay, make the combined pathkeys */

if (window_pathkeys != NIL)

- window_pathkeys = append_pathkeys(window_pathkeys, orderby_pathkeys);

+ window_pathkeys = append_pathkeys(window_pathkeys, orderby_pathkeys, root->query_level == 1);

else

window_pathkeys = orderby_pathkeys;

}

diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h

index 8410531f2d6..7766ae07cab 100644

--- a/src/include/optimizer/paths.h

+++ b/src/include/optimizer/paths.h

@@ -271,7 +271,7 @@ extern List *truncate_useless_pathkeys(PlannerInfo *root,

RelOptInfo *rel,

List *pathkeys);

extern bool has_useful_pathkeys(PlannerInfo *root, RelOptInfo *rel);

-extern List *append_pathkeys(List *target, List *source);

+extern List *append_pathkeys(List *target, List *source, bool check_ec);

extern PathKey *make_canonical_pathkey(PlannerInfo *root,

EquivalenceClass *eclass, Oid opfamily,

CompareType cmptype, bool nulls_first);

```

Looking forward to hearing everyone's suggestions.

Best Regards,
Fei Changhong

Re: Even when the data is already ordered, MergeAppend still adds a Sort node

От

Zhang Mingli

Дата:

19 июля 2025 г., 08:15:04

Hi,

On Jul 18, 2025 at 22:52 +0800, feichanghong <feichanghong@qq.com>, wrote:

explain select * from (select * from t where a = 1 and b > 1 order by a, b) order by a, b limit 1;
QUERY PLAN
-----------------------------------------------------------------------------------------
Limit  (cost=366.26..366.27 rows=1 width=8)
->  Sort  (cost=366.26..391.26 rows=9999 width=8)
Sort Key: t.a, t.b
->  Index Only Scan using t_a_b_idx on t  (cost=0.29..316.27 rows=9999 width=8)
Index Cond: ((a = 1) AND (b > 1))
(5 rows)
```

Should we retain the complete `pathkeys` in `Path->pathkeys` for use by the
upper layers of the subquery, rather than just keeping the portion trimmed by
`PlannerInfo->query_pathkeys`? I'm not sure if my understanding is correct.

The subquery has a qualifier a = 1 that forms an Equivalence Class (EC) whose ec_member contains a constant.
As a result, subroot->sort_pathkeys doesn't need to include column a.
However, in the outer query, there are no such qualifiers to form a similar EC, and the ORDER BY a, b clause means root->sort_pathkeys requires both columns a and b.
When convert_subquery_pathkeys is called, the subpath lacks the pathkeys for column a.

Furthermore, is there a more efficient way to write this, to avoid the
`Sort` node mentioned above?

A simple solution is to add an EC using a qual:

EXPLAIN SELECT * FROM (SELECT * FROM t WHERE a = 1 AND b > 1 ORDER BY a, b) WHERE a = 1 ORDER BY a, b LIMIT 1;
QUERY PLAN
-----------------------------------------------------------------------------------
Limit (cost=0.29..0.32 rows=1 width=8)
-> Index Only Scan using t_a_b_idx on t (cost=0.29..316.27 rows=9999 width=8)
Index Cond: ((a = 1) AND (b > 1))
(3 rows)

--
Zhang Mingli
HashData

Re: Even when the data is already ordered, MergeAppend still adds a Sort node

От

feichanghong

Дата:

20 июля 2025 г., 11:49:16

On Jul 19, 2025, at 13:15, Zhang Mingli <zmlpostgres@gmail.com> wrote:

Hi,

On Jul 18, 2025 at 22:52 +0800, feichanghong <feichanghong@qq.com>, wrote:

explain select * from (select * from t where a = 1 and b > 1 order by a, b) order by a, b limit 1;
QUERY PLAN
-----------------------------------------------------------------------------------------
Limit  (cost=366.26..366.27 rows=1 width=8)
->  Sort  (cost=366.26..391.26 rows=9999 width=8)
Sort Key: t.a, t.b
->  Index Only Scan using t_a_b_idx on t  (cost=0.29..316.27 rows=9999 width=8)
Index Cond: ((a = 1) AND (b > 1))
(5 rows)
```

Should we retain the complete `pathkeys` in `Path->pathkeys` for use by the
upper layers of the subquery, rather than just keeping the portion trimmed by
`PlannerInfo->query_pathkeys`? I'm not sure if my understanding is correct.

Furthermore, is there a more efficient way to write this, to avoid the
`Sort` node mentioned above?

Yes, your understanding is basically consistent with mine.

A simple solution is to add an EC using a qual:

EXPLAIN SELECT * FROM (SELECT * FROM t WHERE a = 1 AND b > 1 ORDER BY a, b) WHERE a = 1 ORDER BY a, b LIMIT 1;
QUERY PLAN
-----------------------------------------------------------------------------------
Limit (cost=0.29..0.32 rows=1 width=8)
-> Index Only Scan using t_a_b_idx on t (cost=0.29..316.27 rows=9999 width=8)
Index Cond: ((a = 1) AND (b > 1))
(3 rows)

Thank you for your suggestion, this method can address simple subquery

scenarios. However, my situation involves a union all, so it's not possible to

add the corresponding equality qualifier at the top level. The SQL is as

follows:

```sql

explain select a, b from (

(select a, b from t t1 where a > 19000 order by a, b)

union all

(select a, b from t t2 where a = 1 and b > 1 order by a, b)

) t order by a, b limit 1;

```

Currently, I have not found a better way to rewrite this, except by optimizing

this scenario from the pg kernel side.

Best Regards,
Fei Changhong

Re: Even when the data is already ordered, MergeAppend still adds a Sort node

От

Tom Lane

Дата:

20 июля 2025 г., 19:03:31

feichanghong <feichanghong@qq.com> writes:
> Currently, I have not found a better way to rewrite this, except by optimizing
> this scenario from the pg kernel side.

If you're willing to modify your query, you could fake it out by
spelling the subquery's "a = 1" condition in a way that won't produce an
EquivalenceClass.  For example,

regression=# explain analyze select a, b from (
    (select a, b from t t1 where a > 19000 order by a, b)
    union all
    (select a, b from t t2 where a >= 1 and a <= 1 and b > 1 order by a, b)
) t order by a, b limit 1;
                                                                QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.58..0.63 rows=1 width=8) (actual time=0.070..0.070 rows=1.00 loops=1)
   Buffers: shared hit=8 read=1
   ->  Merge Append  (cost=0.58..481.10 rows=11000 width=8) (actual time=0.069..0.069 rows=1.00 loops=1)
         Sort Key: t1.a, t1.b
         Buffers: shared hit=8 read=1
         ->  Index Only Scan using t_a_b_idx on t t1  (cost=0.29..29.79 rows=1000 width=8) (actual time=0.027..0.027
rows=1.00loops=1) 
               Index Cond: (a > 19000)
               Heap Fetches: 0
               Index Searches: 1
               Buffers: shared hit=2 read=1
         ->  Index Only Scan using t_a_b_idx on t t2  (cost=0.29..341.30 rows=10000 width=8) (actual time=0.041..0.041
rows=1.00loops=1) 
               Index Cond: ((a >= 1) AND (a <= 1) AND (b > 1))
               Heap Fetches: 0
               Index Searches: 1
               Buffers: shared hit=6
 Planning:
   Buffers: shared hit=6
 Planning Time: 0.174 ms
 Execution Time: 0.089 ms
(19 rows)

I'd be the first to agree that that's a hack not a nice solution.
But I think getting to something that's not a hack is going to
involve a lot more work than this edge case seems worth.  We're
not likely to accept a patch that pessimizes planning within
subqueries on the small chance that that will result in a path
whose apparent sort order matches the needs of the outer query
better.  Maybe something could be done inside
convert_subquery_pathkeys, but I suspect we don't really have
enough information at that point to decide what to do.

            regards, tom lane

Re: Even when the data is already ordered, MergeAppend still adds a Sort node

От

feichanghong

Дата:

21 июля 2025 г., 04:58:09

On Jul 21, 2025, at 00:03, Tom Lane <tgl@sss.pgh.pa.us> wrote:

feichanghong <feichanghong@qq.com> writes:
Currently, I have not found a better way to rewrite this, except by optimizing
this scenario from the pg kernel side.

If you're willing to modify your query, you could fake it out by
spelling the subquery's "a = 1" condition in a way that won't produce an
EquivalenceClass. For example,

regression=# explain analyze select a, b from (
   (select a, b from t t1 where a > 19000 order by a, b)
   union all
   (select a, b from t t2 where a >= 1 and a <= 1 and b > 1 order by a, b)
) t order by a, b limit 1;
                                                               QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.58..0.63 rows=1 width=8) (actual time=0.070..0.070 rows=1.00 loops=1)
  Buffers: shared hit=8 read=1
  -> Merge Append (cost=0.58..481.10 rows=11000 width=8) (actual time=0.069..0.069 rows=1.00 loops=1)
        Sort Key: t1.a, t1.b
        Buffers: shared hit=8 read=1
        -> Index Only Scan using t_a_b_idx on t t1 (cost=0.29..29.79 rows=1000 width=8) (actual time=0.027..0.027 rows=1.00 loops=1)
              Index Cond: (a > 19000)
              Heap Fetches: 0
              Index Searches: 1
              Buffers: shared hit=2 read=1
        -> Index Only Scan using t_a_b_idx on t t2 (cost=0.29..341.30 rows=10000 width=8) (actual time=0.041..0.041 rows=1.00 loops=1)
              Index Cond: ((a >= 1) AND (a <= 1) AND (b > 1))
              Heap Fetches: 0
              Index Searches: 1
              Buffers: shared hit=6
Planning:
  Buffers: shared hit=6
Planning Time: 0.174 ms
Execution Time: 0.089 ms
(19 rows)

Thank you very much for your suggestion, this method is effective for my

scenario. I will modify my SQL accordingly.

I'd be the first to agree that that's a hack not a nice solution.
But I think getting to something that's not a hack is going to
involve a lot more work than this edge case seems worth. We're
not likely to accept a patch that pessimizes planning within
subqueries on the small chance that that will result in a path
whose apparent sort order matches the needs of the outer query
better. Maybe something could be done inside
convert_subquery_pathkeys, but I suspect we don't really have
enough information at that point to decide what to do.

Yes, if we retain the EC_MUST_BE_REDUNDANT pathkey in the subquery, it may lead

to a less optimal plan. For example, consider the following case: the subquery

is sorted by "a,b", and the top-level query is sorted by "b". In this

situation, the subquery's pathkey "b" matches the top-level query. However, if

we retain the subquery's pathkey as "a,b", it does not match the top-level

query, resulting in the need for an additional sort operation.

```sql

-- subquery pathkey is "b"

postgres=# explain select * from (select * from t where a = 1 and b > 1 order by a, b) t order by b limit 1;

QUERY PLAN

-----------------------------------------------------------------------------------

Limit (cost=0.29..0.33 rows=1 width=8)

-> Index Only Scan using t_a_b_idx on t (cost=0.29..316.27 rows=9999 width=8)

Index Cond: ((a = 1) AND (b > 1))

(3 rows)

-- subquery pathkey is "a,b"

postgres=# explain select * from (select * from t where a = 1 and b > 1 order by a, b) t order by b limit 1;

QUERY PLAN

-----------------------------------------------------------------------------------------------

Limit (cost=1155.56..1155.56 rows=1 width=8)

-> Sort (cost=1155.56..1180.56 rows=9999 width=8)

Sort Key: t.b

-> Sort (cost=980.58..1005.58 rows=9999 width=8)

Sort Key: t.a, t.b

-> Index Only Scan using t_a_b_idx on t (cost=0.29..316.27 rows=9999 width=8)

Index Cond: ((a = 1) AND (b > 1))

(7 rows)

```

A possible solution might be: we retain the EC_MUST_BE_REDUNDANT pathkey within

the subquery, but when executing create_sort_plan, pathkeys_contained_in, and

convert_subquery_pathkeys, we need to take into account pathkeys with

ec_has_const. Naturally, the actual situation could be much more complex.

Best Regards,
Fei Changhong

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: Even when the data is already ordered, MergeAppend still adds a Sort node

Even when the data is already ordered, MergeAppend still adds a Sort node

Re: Even when the data is already ordered, MergeAppend still adds a Sort node

Re: Even when the data is already ordered, MergeAppend still adds a Sort node

Re: Even when the data is already ordered, MergeAppend still adds a Sort node

Re: Even when the data is already ordered, MergeAppend still adds a Sort node

Re: Even when the data is already ordered, MergeAppend still adds a Sort node