Обсуждение: Incorrect cost for MergeAppend

Поиск
Список
Период
Сортировка

Incorrect cost for MergeAppend

От
Alexander Kuzmenkov
Дата:
Hello hackers,

While investigating some query plans, I noticed some code that seems
to be wrong: when create_merge_append_path() estimates the cost of
sorting an input, it calls cost_sort() passing subpath->parent->tuples
as the number of tuples. Shouldn't it use subpath->parent->rows or
even subpath->rows instead? The `tuples` variable doesn't account for
the filters on the relation, so this leads to incorrect cost estimates
when there are highly selective filters, and Sort + Append is chosen
instead of MergeAppend.

I don't have a good repro yet because the plans I've been studying
involve a third-party extension, but the code looks incorrect so I
thought I should ask.

Here is a link to this code on GitHub:

https://github.com/postgres/postgres/blob/6a1ea02c491d16474a6214603dce40b5b122d4d1/src/backend/optimizer/util/pathnode.c#L1469-L1477

---
Alexander Kuzmenkov
Timescale



Re: Incorrect cost for MergeAppend

От
Ashutosh Bapat
Дата:
On Mon, Jan 29, 2024 at 6:11 PM Alexander Kuzmenkov
<akuzmenkov@timescale.com> wrote:
>
> Hello hackers,
>
> While investigating some query plans, I noticed some code that seems
> to be wrong: when create_merge_append_path() estimates the cost of
> sorting an input, it calls cost_sort() passing subpath->parent->tuples
> as the number of tuples. Shouldn't it use subpath->parent->rows or
> even subpath->rows instead? The `tuples` variable doesn't account for
> the filters on the relation, so this leads to incorrect cost estimates
> when there are highly selective filters, and Sort + Append is chosen
> instead of MergeAppend.

All other callers of cost_sort() except plan_cluster_use_sort() are
using rows instead of tuples. Even plan_cluster_use_sort() has
rel->rows = rel->tuples, it's actually passing rows. So agree with
your suggestion. However a test will be good since this code is quite
old.

--
Best Wishes,
Ashutosh Bapat



Re: Incorrect cost for MergeAppend

От
Alexander Kuzmenkov
Дата:
Here is a small patch that reproduces the problem on two tables with
inheritance, and fixes it. I'll add it to the Commitfest.

On Tue, Jan 30, 2024 at 8:20 AM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
>
> On Mon, Jan 29, 2024 at 6:11 PM Alexander Kuzmenkov
> <akuzmenkov@timescale.com> wrote:
> >
> > Hello hackers,
> >
> > While investigating some query plans, I noticed some code that seems
> > to be wrong: when create_merge_append_path() estimates the cost of
> > sorting an input, it calls cost_sort() passing subpath->parent->tuples
> > as the number of tuples. Shouldn't it use subpath->parent->rows or
> > even subpath->rows instead? The `tuples` variable doesn't account for
> > the filters on the relation, so this leads to incorrect cost estimates
> > when there are highly selective filters, and Sort + Append is chosen
> > instead of MergeAppend.
>
> All other callers of cost_sort() except plan_cluster_use_sort() are
> using rows instead of tuples. Even plan_cluster_use_sort() has
> rel->rows = rel->tuples, it's actually passing rows. So agree with
> your suggestion. However a test will be good since this code is quite
> old.
>
> --
> Best Wishes,
> Ashutosh Bapat

Вложения

Re: Incorrect cost for MergeAppend

От
Aleksander Alekseev
Дата:
Hi,

> Here is a small patch that reproduces the problem on two tables with
> inheritance, and fixes it. I'll add it to the Commitfest.

Thanks for the patch.

I can confirm that it changes the plan from Sort + Append to MergeAppend.

Before:

```
explain (costs off) select * from ma0 where a < 1000 order by a;
                       QUERY PLAN
---------------------------------------------------------
 Sort
   Sort Key: ma0.a
   ->  Append
         ->  Index Only Scan using ma0_pkey on ma0 ma0_1
               Index Cond: (a < 1000)
         ->  Seq Scan on ma1 ma0_2
               Filter: (a < 1000)
```

After:

```
=# explain (costs off) select * from ma0 where a < 1000 order by a;
                    QUERY PLAN
---------------------------------------------------
 Merge Append
   Sort Key: ma0.a
   ->  Index Only Scan using ma0_pkey on ma0 ma0_1
         Index Cond: (a < 1000)
   ->  Sort
         Sort Key: ma0_2.a
         ->  Seq Scan on ma1 ma0_2
               Filter: (a < 1000)
```

The rest of the tests pass.

-- 
Best regards,
Aleksander Alekseev



Re: Incorrect cost for MergeAppend

От
David Rowley
Дата:
On Wed, 31 Jan 2024 at 00:06, Alexander Kuzmenkov
<akuzmenkov@timescale.com> wrote:
> Here is a small patch that reproduces the problem on two tables with
> inheritance, and fixes it. I'll add it to the Commitfest.

Good catch.

It seems to have been broken since MergeAppends were added in 11cad29c9.

The code fix looks good to me.

The tests look a bit heavy.  I wondered if you could have used a UNION
ALL query with the tenk1 table so you didn't have to create tables,
however, I see that case works ok since the parent->tuples is set in
set_subquery_size_estimates() from "rel->tuples =
sub_final_rel->cheapest_total_path->rows;".

I played around with the following and see your patch changes the plan
to a Merge Append away from an Append -> Sort plan.

create table tenk_parent (like tenk1);
alter table tenk1 inherit tenk_parent;
alter table tenk2 inherit tenk_parent;
explain (costs off) select * from tenk_parent where thousand = 0 order
by tenthous;
alter table tenk1 no inherit tenk_parent;
alter table tenk2 no inherit tenk_parent;

I'm just not sure if we should be messing with tenk1 and tenk2 like that.

You should likely focus on trying to find a test that does not require
making 2 tables with 10k rows.

David



Re: Incorrect cost for MergeAppend

От
Alexander Kuzmenkov
Дата:
On Tue, Jan 30, 2024 at 1:20 PM David Rowley <dgrowleyml@gmail.com> wrote:
> You should likely focus on trying to find a test that does not require
> making 2 tables with 10k rows.

Is 1k smallint OK? It should fit in one page. Still reproduces the
error, and the entire test case runs in under 10 ms.

Вложения

Re: Incorrect cost for MergeAppend

От
David Rowley
Дата:
On Wed, 31 Jan 2024 at 02:23, Alexander Kuzmenkov
<akuzmenkov@timescale.com> wrote:
>
> On Tue, Jan 30, 2024 at 1:20 PM David Rowley <dgrowleyml@gmail.com> wrote:
> > You should likely focus on trying to find a test that does not require
> > making 2 tables with 10k rows.
>
> Is 1k smallint OK? It should fit in one page. Still reproduces the
> error, and the entire test case runs in under 10 ms.

I had a go at making it a bit smaller without going dangerously close
to where the plan might change. The following seems to work.

create table ma0(a int primary key);
create table ma1() inherits (ma0);
insert into ma0 select generate_series(1, 400);
insert into ma1 select generate_series(1, 200);
analyze ma0;
analyze ma1;

explain (costs off) select * from ma0 where a < 100 order by a;

drop table ma0 cascade;

As for backpatching this.  It does risk plans changing in stable
versions of PostgreSQL, which we normally try to avoid.  However, it
seems quite similar to 1e731ed12a, albeit, far more long-standing.
I'm currently thinking we should backpatch this back to 12.

Does anyone disagree?

David



Re: Incorrect cost for MergeAppend

От
Ashutosh Bapat
Дата:
On Wed, Jan 31, 2024 at 4:33 AM David Rowley <dgrowleyml@gmail.com> wrote:
>
> On Wed, 31 Jan 2024 at 02:23, Alexander Kuzmenkov
> <akuzmenkov@timescale.com> wrote:
> >
> > On Tue, Jan 30, 2024 at 1:20 PM David Rowley <dgrowleyml@gmail.com> wrote:
> > > You should likely focus on trying to find a test that does not require
> > > making 2 tables with 10k rows.
> >
> > Is 1k smallint OK? It should fit in one page. Still reproduces the
> > error, and the entire test case runs in under 10 ms.
>
> I had a go at making it a bit smaller without going dangerously close
> to where the plan might change. The following seems to work.
>
> create table ma0(a int primary key);
> create table ma1() inherits (ma0);
> insert into ma0 select generate_series(1, 400);
> insert into ma1 select generate_series(1, 200);
> analyze ma0;
> analyze ma1;
>
> explain (costs off) select * from ma0 where a < 100 order by a;
>
> drop table ma0 cascade;

On my laptop with all default settings

with patch
postgres@231714=#explain select * from ma0 where a < 100 order by a;
                                      QUERY PLAN
--------------------------------------------------------------------------------------
 Merge Append  (cost=6.94..18.90 rows=198 width=4)
   Sort Key: ma0.a
   ->  Index Only Scan using ma0_pkey on ma0 ma0_1  (cost=0.15..9.88
rows=99 width=4)
         Index Cond: (a < 100)
   ->  Sort  (cost=6.78..7.03 rows=99 width=4)
         Sort Key: ma0_2.a
         ->  Seq Scan on ma1 ma0_2  (cost=0.00..3.50 rows=99 width=4)
               Filter: (a < 100)
(8 rows)

without patch
#explain select * from ma0 where a < 100 order by a;
                              QUERY PLAN
----------------------------------------------------------------------
 Sort  (cost=19.04..19.54 rows=198 width=4)
   Sort Key: ma0.a
   ->  Append  (cost=0.00..11.49 rows=198 width=4)
         ->  Seq Scan on ma0 ma0_1  (cost=0.00..7.00 rows=99 width=4)
               Filter: (a < 100)
         ->  Seq Scan on ma1 ma0_2  (cost=0.00..3.50 rows=99 width=4)
               Filter: (a < 100)
(7 rows)

postgres@233864=#select (19.54 - 18.90)/19.54, (19.04 - 18.09)/19.04;
        ?column?        |        ?column?
------------------------+------------------------
 0.03275332650972364381 | 0.04989495798319327731
(1 row)

Those numbers are higher than 1% (#define STD_FUZZ_FACTOR 1.01) but
slight variation in all the GUCs that affect cost, might bring the
difference closer to STD_FUZZ_FACTOR.

Given how close they are, maybe it's not such a good idea to
backpatch. If the plans change for better, users won't complain but
otherwise they will complain and will have no way to go back to their
good plans in production.

--
Best Wishes,
Ashutosh Bapat



Re: Incorrect cost for MergeAppend

От
David Rowley
Дата:
On Wed, 31 Jan 2024 at 18:58, Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
> with patch
>  Merge Append  (cost=6.94..18.90 rows=198 width=4)

> without patch
>  Sort  (cost=19.04..19.54 rows=198 width=4)

> Those numbers are higher than 1% (#define STD_FUZZ_FACTOR 1.01) but
> slight variation in all the GUCs that affect cost, might bring the
> difference closer to STD_FUZZ_FACTOR.
>
> Given how close they are, maybe it's not such a good idea to
> backpatch.

The reason those numbers are close is because I reduced the row count
on the test tables to a point where we only just get the Merge Append
plan with a small margin.  I don't see the test case costs as a
relevant factor for if we backpatch or not.

What is relevant are things like:

For:
* It's a clear bug and what's happening now is clearly wrong.
* inheritance/partitioned table plan changes for the better in minor versions

Against:
* Nobody has complained for 13 years, so maybe it's unlikely anyone is
suffering too much.
* Possibility of inheritance/partitioned table plans changing for the
worse in minor versions

For now, I'm on the fence on this one.

David



Re: Incorrect cost for MergeAppend

От
Ashutosh Bapat
Дата:
On Wed, Jan 31, 2024 at 12:12 PM David Rowley <dgrowleyml@gmail.com> wrote:
>
> What is relevant are things like:
>
> For:
> * It's a clear bug and what's happening now is clearly wrong.
> * inheritance/partitioned table plan changes for the better in minor versions
>
> Against:
> * Nobody has complained for 13 years, so maybe it's unlikely anyone is
> suffering too much.
> * Possibility of inheritance/partitioned table plans changing for the
> worse in minor versions
>

That's what I am thinking as well. And the plans that may change for
the worse are the ones where the costs with and without the patch are
close.

Just to be clear, the change is for good and should be committed to
the master. It's the backpatching I am worried about.

--
Best Wishes,
Ashutosh Bapat



Re: Incorrect cost for MergeAppend

От
Alexander Kuzmenkov
Дата:
I'd be happy to see this backpatched. What kind of regressions are we
worried about? I'd say partition-wise sort + merge should be faster
than append + sort for reasonably sized tables. That's basically what
tuplesort does inside. Moreso, this can enable index scans on
partitions, which is an even better plan.

On Wed, Jan 31, 2024 at 7:46 AM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
>
> On Wed, Jan 31, 2024 at 12:12 PM David Rowley <dgrowleyml@gmail.com> wrote:
> >
> > What is relevant are things like:
> >
> > For:
> > * It's a clear bug and what's happening now is clearly wrong.
> > * inheritance/partitioned table plan changes for the better in minor versions
> >
> > Against:
> > * Nobody has complained for 13 years, so maybe it's unlikely anyone is
> > suffering too much.
> > * Possibility of inheritance/partitioned table plans changing for the
> > worse in minor versions
> >
>
> That's what I am thinking as well. And the plans that may change for
> the worse are the ones where the costs with and without the patch are
> close.
>
> Just to be clear, the change is for good and should be committed to
> the master. It's the backpatching I am worried about.
>
> --
> Best Wishes,
> Ashutosh Bapat



Re: Incorrect cost for MergeAppend

От
Alexander Kuzmenkov
Дата:
On Wed, Jan 31, 2024 at 1:33 PM Alexander Kuzmenkov
<akuzmenkov@timescale.com> wrote:
> I'd be happy to see this backpatched. What kind of regressions are we
> worried about? I'd say partition-wise sort + merge should be faster
> than append + sort for reasonably sized tables. That's basically what
> tuplesort does inside. Moreso, this can enable index scans on
> partitions, which is an even better plan.

To put it another way, this change enables our usual cost model for
MergeAppend to work correctly in the presence of filters. I think we
can consider this model to be reasonably correct, and we don't
currently have major problems with MergeAppend being chosen instead of
Sort + Append in cases where it's suboptimal, right? So applying it
properly in case with filters is not likely to introduce problems.



Re: Incorrect cost for MergeAppend

От
Alvaro Herrera
Дата:
On 2024-Jan-31, Alexander Kuzmenkov wrote:

> To put it another way, this change enables our usual cost model for
> MergeAppend to work correctly in the presence of filters. I think we
> can consider this model to be reasonably correct, and we don't
> currently have major problems with MergeAppend being chosen instead of
> Sort + Append in cases where it's suboptimal, right? So applying it
> properly in case with filters is not likely to introduce problems.

Since we have a minor coming up very soon, I think it's not a good idea
to backpatch right now.  Maybe you can push to master now, and consider
whether to backpatch later.

The problem is -- if somebody has an application that gets good plans
with the current cost model, and you change the cost model and the plans
become worse, what do they do?  If you change this in a major release,
this is not an issue because they must test their queries before
upgrading and if they fail to realize a problem exists then it's their
fault.  If you change it in a minor release, then those people will be
very upset that things were changed suddenly, and they may get wary of
future minor upgrades, which we don't want.

Plus, they would need to do careful testing before doing the minor
upgrade.

Maybe plans can only go from bad to good and never from good to bad.
But are we 100% certain that that is the case?

People who are **desperate** to get this improvement can perhaps run a
patched version in the meantime.

-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/



Re: Incorrect cost for MergeAppend

От
"Daniel Westermann (DWE)"
Дата:
Hi,

>Since we have a minor coming up very soon, I think it's not a good idea
>to backpatch right now.  Maybe you can push to master now, and consider
>whether to backpatch later.

>The problem is -- if somebody has an application that gets good plans
>with the current cost model, and you change the cost model and the plans
>become worse, what do they do?  If you change this in a major release,
>this is not an issue because they must test their queries before
>upgrading and if they fail to realize a problem exists then it's their
>fault.  If you change it in a minor release, then those people will be
>very upset that things were changed suddenly, and they may get wary of
>future minor upgrades, which we don't want.

I agree with this, especially as we tell our customers that such changes do not happen from one minor release to another.

Regards
Daniel

Re: Incorrect cost for MergeAppend

От
Tom Lane
Дата:
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> Since we have a minor coming up very soon, I think it's not a good idea
> to backpatch right now.  Maybe you can push to master now, and consider
> whether to backpatch later.

As a rule, we don't back-patch changes like this ever.  People don't
appreciate plans changing in minor releases.

            regards, tom lane



Re: Incorrect cost for MergeAppend

От
David Rowley
Дата:
On Thu, 1 Feb 2024 at 04:32, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> > Since we have a minor coming up very soon, I think it's not a good idea
> > to backpatch right now.  Maybe you can push to master now, and consider
> > whether to backpatch later.
>
> As a rule, we don't back-patch changes like this ever.  People don't
> appreciate plans changing in minor releases.

Pushed to master.

Thanks for the report and the fix, Alexander.



Re: Incorrect cost for MergeAppend

От
Alexander Kuzmenkov
Дата:
On Wed, Jan 31, 2024 at 9:49 PM David Rowley <dgrowleyml@gmail.com> wrote:
> Pushed to master.
>
> Thanks for the report and the fix, Alexander.

Thank you!