Обсуждение: Incorrect calculation of path fraction value in MergeAppend

Поиск
Список
Период
Сортировка

Incorrect calculation of path fraction value in MergeAppend

От
Andrei Lepikhov
Дата:
Hi,

Commit 6b94e7a has added a "fractional" strategy to the merge join path 
search scope. But as I see it incorrectly calculates tuple_fraction 
value. Let's see one of the regression tests' queries:

EXPLAIN (COSTS OFF)
SELECT x.id, y.id FROM fract_t x LEFT JOIN fract_t y
   USING (id) ORDER BY x.id DESC LIMIT 10;

It uses NestedLoop with parameterised inner scan as a subplan of 
MergeAppend. Overall cost is:
Limit  (cost=1.11..4.86 rows=10 width=16)

Let's reduce the LIMIT a little:

EXPLAIN (COSTS OFF)
SELECT x.id, y.id FROM fract_t x LEFT JOIN fract_t y
   USING (id) ORDER BY x.id DESC LIMIT 2;

There, the query plan uses plain NestLoop node and overall cost 
significantly increases:
Limit  (cost=0.56..30.70 rows=2 width=16)

The origin problem lies in the following code line:
    path_fraction = (1.0 / root->tuple_fraction);

Building startup, total and fractional MergeAppends, the optimiser puts 
off the 'fractional' way because it has built for a 50% limit (uses 
MergeJoin with high startup cost) and chooses a startup-optimal path 
(built upon plain NestLoops) that is better than the total one.

Instead, the path_fraction value must be corrected when 
root->tuple_fraction contains the absolute value of tuples. See the 
patch in the attachment.

I report this issue into the bugs mailing list because it might cause 
some 'enigmatic' performance slowdowns I have heard about (but have no 
proofs) and may need back-patching.

-- 
regards, Andrei Lepikhov

Вложения

Re: Incorrect calculation of path fraction value in MergeAppend

От
Junwang Zhao
Дата:
Hi Andrei,

On Tue, Apr 29, 2025 at 7:14 PM Andrei Lepikhov <lepihov@gmail.com> wrote:
>
> Hi,
>
> Commit 6b94e7a has added a "fractional" strategy to the merge join path
> search scope. But as I see it incorrectly calculates tuple_fraction
> value. Let's see one of the regression tests' queries:
>
> EXPLAIN (COSTS OFF)
> SELECT x.id, y.id FROM fract_t x LEFT JOIN fract_t y
>    USING (id) ORDER BY x.id DESC LIMIT 10;
>
> It uses NestedLoop with parameterised inner scan as a subplan of
> MergeAppend. Overall cost is:
> Limit  (cost=1.11..4.86 rows=10 width=16)
>
> Let's reduce the LIMIT a little:
>
> EXPLAIN (COSTS OFF)
> SELECT x.id, y.id FROM fract_t x LEFT JOIN fract_t y
>    USING (id) ORDER BY x.id DESC LIMIT 2;
>
> There, the query plan uses plain NestLoop node and overall cost
> significantly increases:
> Limit  (cost=0.56..30.70 rows=2 width=16)
>
> The origin problem lies in the following code line:
>         path_fraction = (1.0 / root->tuple_fraction);
>
> Building startup, total and fractional MergeAppends, the optimiser puts
> off the 'fractional' way because it has built for a 50% limit (uses
> MergeJoin with high startup cost) and chooses a startup-optimal path
> (built upon plain NestLoops) that is better than the total one.
>
> Instead, the path_fraction value must be corrected when
> root->tuple_fraction contains the absolute value of tuples. See the
> patch in the attachment.
>
> I report this issue into the bugs mailing list because it might cause
> some 'enigmatic' performance slowdowns I have heard about (but have no
> proofs) and may need back-patching.
>
> --
> regards, Andrei Lepikhov

I've apply the patch and the tests passed, one small nitpick:

+ double path_fraction = root->tuple_fraction;
+
+ /* Convert absolute limit to a path fraction */
+ if (path_fraction >= 1.)
+ path_fraction /= childrel->rows;

maybe add `&& childrel->rows > 0.` to the if condition to
avoid potential crash. I'm not sure if `childrel->rows` will
be 0 but I see get_cheapest_fractional_path did this.

/* Convert absolute # of tuples to a fraction; no need to clamp to 0..1 */
if (tuple_fraction >= 1.0 && best_path->rows > 0)
tuple_fraction /= best_path->rows;

--
Regards
Junwang Zhao



Re: Incorrect calculation of path fraction value in MergeAppend

От
Junwang Zhao
Дата:
On Thu, May 8, 2025 at 2:42 AM Andrei Lepikhov <lepihov@gmail.com> wrote:
>
> On 7/5/2025 12:45, Álvaro Herrera wrote:
>  > On 2025-May-07, Andrei Lepikhov wrote:
>  >> The magic is how it finds a link to the thread. This time it doesn't
>  > Weird, that doesn't seem to work very well if you try to search for
>  > stuff.  But if you just specify the Message-Id without clicking
>  > "search" then it works.  So here you go:
>  > https://commitfest.postgresql.org/patch/5742/
> Thanks! I think this memo should be pinned at the top of the page.
>
> On 7/5/2025 17:35, Junwang Zhao wrote:
> > + /* Convert absolute limit to a path fraction */
> > + if (path_fraction >= 1.)
> > + path_fraction /= childrel->rows;
> >
> > maybe add `&& childrel->rows > 0.` to the if condition to
> > avoid potential crash. I'm not sure if `childrel->rows` will
> > be 0 but I see get_cheapest_fractional_path did this.
> You may look at the 76281aa for the reason. Also, looking into the code,
> I see that it is impossible now to find more places with rows == 0
> except for the Dummy result node. Moreover, it just doesn't make sense
> in the case of append nodes: why should the optimiser spend resources
> and consider paths if it is impossible to pull any tuples from the subplan?
>
> So, I think checking assertion instead would be the better option. See
> new version in the attachment.

Agree, assertion is better, the new version LGTM.

>
> --
> regards, Andrei Lepikhov



--
Regards
Junwang Zhao



Re: Incorrect calculation of path fraction value in MergeAppend

От
Alvaro Herrera
Дата:
On 2025-May-18, Alexander Korotkov wrote:

> I've slightly revised the patch: commit message and one comment.  I
> think this is clearly a bugfix, and it should be backpatched.  I'm
> going to push and backpatch it if no objections.

I think we mostly avoid backpatching changes that can cause significant
planning changes, because the consequences for existing systems could be
catastrophic -- even if the known plan changes are beneficial.  IMO you
need to spend a lot more effort in showing that this can't possibly harm
anything, before backpatching.

Personally I can see getting this in 18beta2 without too much additional
arguing for it.  But for 15-17 you'd need a lot more support than is
evident in this thread.

-- 
Álvaro Herrera        Breisgau, Deutschland  —  https://www.EnterpriseDB.com/



Re: Incorrect calculation of path fraction value in MergeAppend

От
Tom Lane
Дата:
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> Personally I can see getting this in 18beta2 without too much additional
> arguing for it.  But for 15-17 you'd need a lot more support than is
> evident in this thread.

+1.  The bar for causing plan changes in back branches is very high.
Unless you can show that the buggy behavior is causing wrong answers,
it's usually best to leave it alone.

            regards, tom lane



Re: Incorrect calculation of path fraction value in MergeAppend

От
Alexander Korotkov
Дата:
On Sun, May 18, 2025 at 5:29 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> > Personally I can see getting this in 18beta2 without too much additional
> > arguing for it.  But for 15-17 you'd need a lot more support than is
> > evident in this thread.
>
> +1.  The bar for causing plan changes in back branches is very high.
> Unless you can show that the buggy behavior is causing wrong answers,
> it's usually best to leave it alone.

Makes sense.  Thank you for the clarification.  I'll push this to the
master only.

------
Regards,
Alexander Korotkov
Supabase