Re: MergeAppend could consider sorting cheapest child path

Поиск
Список
Период
Сортировка
От Alexander Korotkov
Тема Re: MergeAppend could consider sorting cheapest child path
Дата
Msg-id CAPpHfdsHqMaJ=n_nj4D+yfcbdvd3gAaTwujL=Dj70git8P1u5w@mail.gmail.com
обсуждение исходный текст
Ответ на Re: MergeAppend could consider sorting cheapest child path  (Alena Rybakina <a.rybakina@postgrespro.ru>)
Список pgsql-hackers
Hi, Alena!

On Fri, Oct 3, 2025 at 1:42 AM Alena Rybakina <a.rybakina@postgrespro.ru> wrote:
> On 07.09.2025 14:26, Alexander Korotkov wrote:
> > On Fri, Sep 5, 2025 at 11:45 AM Andrei Lepikhov <lepihov@gmail.com> wrote:
> >> On 1/9/2025 22:26, Alexander Korotkov wrote:
> >>> On Thu, Jul 31, 2025 at 5:20 PM Andrei Lepikhov <lepihov@gmail.com> wrote:
> >>>> See this minor correction in the attachment. postgres_fdw tests are
> >>>> stable now.
> >>> I have another idea.  What if we allow MergeAppend paths only when at
> >>> least one subpath is preordered.  This trick also allow us to exclude
> >>> MergeAppend(Sort) dominating Sort(Append).  I see the regression tests
> >>> changes now have much less volume and looks more reasonable.  What do
> >>> you think?
> >> I believe a slight mistake has been made with the total_has_ordered /
> >> startup_has_ordered parameters, which has caused unnecessary test
> >> changes in inherit.out (See updated patch in the attachment). Although
> >> not the best test in general (it depends on the autovacuum), it
> >> highlighted the case where a startup-optimal strategy is necessary, even
> >> when a fractional-optimal path is available, which may lead to continue
> >> of the discussion [1].>
> >>> Also, do you think get_cheapest_fractional_path_for_pathkeys_ext() and
> >>> get_cheapest_path_for_pathkeys_ext() should consider incremental sort?
> >>>    The revised patch teaches them to do so.
> >> Following 55a780e9476 [2] it should be considered, of course.
> > Great, thank you for catching this.  The diff of costs is attached.  I
> > see the costs now are better or within the fuzz factor.
> >
> Hi! I looked at regression test changes but one of them confused me.
>
> Example where the plan shape changed:
>
> EXPLAIN (COSTS OFF)
> SELECT t1.a, t2.b
> FROM (SELECT * FROM prt1 WHERE a < 450) t1
> LEFT JOIN (SELECT * FROM prt2 WHERE b > 250) t2
>    ON t1.a = t2.b
> WHERE t1.b = 0
> ORDER BY t1.a, t2.b;
>
> -- before
>                          QUERY PLAN
> -----------------------------------------------------------
>   Incremental Sort
>     Sort Key: prt1.a, prt2.b
>     Presorted Key: prt1.a
>     ->  Merge Left Join
>           Merge Cond: (prt1.a = prt2.b)
>           ->  Sort
>                 Sort Key: prt1.a
>                 ->  Append
>                       ->  Seq Scan on prt1_p1 prt1_1
>                             Filter: ((a < 450) AND (b = 0))
>                       ->  Seq Scan on prt1_p2 prt1_2
>                             Filter: ((a < 450) AND (b = 0))
>           ->  Sort
>                 Sort Key: prt2.b
>                 ->  Append
>                       ->  Seq Scan on prt2_p2 prt2_1
>                             Filter: (b > 250)
>                       ->  Seq Scan on prt2_p3 prt2_2
>                             Filter: (b > 250)
> (19 rows)
>
> -- now
>                             QUERY PLAN
> -----------------------------------------------------------------
>   Sort
>     Sort Key: prt1.a, prt2.b
>     ->  Merge Right Join
>           Merge Cond: (prt2.b = prt1.a)
>           ->  Append
>                 ->  Sort
>                       Sort Key: prt2_1.b
>                       ->  Seq Scan on prt2_p2 prt2_1
>                             Filter: (b > 250)
>                 ->  Sort
>                       Sort Key: prt2_2.b
>                       ->  Seq Scan on prt2_p3 prt2_2
>                             Filter: (b > 250)
>           ->  Materialize
>                 ->  Append
>                       ->  Sort
>                             Sort Key: prt1_1.a
>                             ->  Seq Scan on prt1_p1 prt1_1
>                                   Filter: ((a < 450) AND (b = 0))
>                       ->  Sort
>                             Sort Key: prt1_2.a
>                             ->  Seq Scan on prt1_p2 prt1_2
>                                   Filter: ((a < 450) AND (b = 0))
> (23 rows)
>
> Previously we had incremental sort on (t1.a, t2.b) with prt1.a already
> presorted; now we sort both t1.a and t2.b after a merge right join. It
> looks inefficiently or I missed something?
>
> Other tests looked fine for me.

Thank you for taking a look at this.

According to our cost model incremental sort has additional overhead
but gives huge wins on large row sets.  The row sets here are very
small.  You can get from [1] that the total cost became smaller.

Links.
1. https://www.postgresql.org/message-id/CAPpHfdsn_mPy1v6Gf8rmdkBDsDLU%2B%3DJ4M4sBzgaFv21cWruZFA%40mail.gmail.com

------
Regards,
Alexander Korotkov
Supabase



В списке pgsql-hackers по дате отправления: