Обсуждение: Re: Consider explicit incremental sort for Append and MergeAppend

Поиск
Список
Период
Сортировка

Re: Consider explicit incremental sort for Append and MergeAppend

От
Andrei Lepikhov
Дата:
On 12/5/2025 11:29, Richard Guo wrote:
> For ordered Append or MergeAppend, it seems that incremental sort is
> currently not considered when injecting an explicit sort into subpaths
> that are not sufficiently ordered.  For instance:
Thanks for doing this job.
I have reviewed your patch and want to put here some thoughts:
0. The patch looks simple enough to be safe. I passed through the code 
and found no issues except comments (see thought No.1). I will be okay 
if you commit it.
1. I'm not very happy with the fact that it strengthens the cost_append 
connection with create_append_plan. At least, there should be 
cross-reference comments to let developers know if they change something 
inside one of these functions.
2. IncrementalSort is not always more effective - it depends on the 
column's number of groups. In my experience, a non-cost-based decision 
one day meets the problematic case, and the people who stick with it are 
much more confused than in the case when planner decision connected to 
the costings - they trust the cost model or the cost model tuned by GUCs.
3. The functions label_incrementalsort_with_costsize and 
label_sort_with_costsize are not ideal architectural decisions. 
Attempting to improve sort / incremental sort cost functions, I am 
always stuck in the absence of some necessary data from the sorting path 
and RelOptInfo at this stage.

As an alternative, you may check the approach of [1], where we decide 
how to adjust a subpath to MergeAppend needs inside 
generate_orderedappend_paths using a cost-based approach.

Also, would you have a chance to look into the [1,2]? It seems like a 
further improvement, bringing a bit closer optimality of appended path 
choice to single-table scan choice.

[1] 
https://www.postgresql.org/message-id/flat/25d6a2cd161673d51373b7e07e6d9dd6%40postgrespro.ru
[2] 
https://www.postgresql.org/message-id/f0206ef2-6b5a-4d07-8770-cfa7cd30f685@gmail.com

-- 
regards, Andrei Lepikhov



Re: Consider explicit incremental sort for Append and MergeAppend

От
Robert Haas
Дата:
On Thu, May 15, 2025 at 9:03 AM Andrei Lepikhov <lepihov@gmail.com> wrote:
> 2. IncrementalSort is not always more effective - it depends on the
> column's number of groups. In my experience, a non-cost-based decision
> one day meets the problematic case, and the people who stick with it are
> much more confused than in the case when planner decision connected to
> the costings - they trust the cost model or the cost model tuned by GUCs.

I wonder if this could be fixed in nodeIncrementalSort.c. I think it's
a problem to rely on planner estimates because planner estimates of
the # of groups are quite unreliable. But at runtime, we can notice
whether the groups are small or large and adjust the algorithm
accordingly. In fact, it looks like we already have some logic for
that:

/*
 * Sorting many small groups with tuplesort is inefficient. In order to
 * cope with this problem we don't start a new group until the current one
 * contains at least DEFAULT_MIN_GROUP_SIZE tuples (unfortunately this also
 * means we can't assume small groups of tuples all have the same prefix keys.)
 * When we have a bound that's less than DEFAULT_MIN_GROUP_SIZE we start looking
 * for the new group as soon as we've met our bound to avoid fetching more
 * tuples than we absolutely have to fetch.
 */
#define DEFAULT_MIN_GROUP_SIZE 32

But maybe this logic needs to be further refined somehow. I can't
shake the feeling that it's going to be really hard to have every
place that uses incremental sort decide whether to use an incremental
sort or a regular sort -- we should try to get to a place where it's
safe to use an incremental sort when it's possible without worrying
that it might actually be slower.

Or maybe that's not possible for some reason, but it is not clear to
me what that reason might be.

--
Robert Haas
EDB: http://www.enterprisedb.com