Обсуждение: BUG #17564: Planner bug in combination of generate_series(), unnest() and ORDER BY

Поиск
Список
Период
Сортировка

BUG #17564: Planner bug in combination of generate_series(), unnest() and ORDER BY

От
PG Bug reporting form
Дата:
The following bug has been logged on the website:

Bug reference:      17564
Logged by:          Martijn van Oosterhout
Email address:      kleptog@gmail.com
PostgreSQL version: 14.4
Operating system:   Debian Linux (Bullseye)
Description:

We ran into a strange planner issue on our production system on Friday.
Basically, a query would fail in the planning stage, depending one of the
parameters. This shouldn't happen (AIUI). The query is as follows
(simplified considerably from the original):

db=# explain SELECT generate_subscripts(ARRAY[]::integer[], 1) AS id, 
       unnest(ARRAY[]::jsonb[]) ->> 'timestamp'  AS timestamp 
FROM results 
JOIN groups ON groups.id = results.group_id 
WHERE results.search_id = 3336 
order by timestamp;

(Yes, I know the combination of unnest() and generate_subscripts() in this
way is evil, but it does work.)

The error is:

ERROR:  set-valued function called in context that cannot accept a set
LINE 2:        unnest(ARRAY[]::jsonb[]) ->> 'timestamp'  AS timestam...

However, if you disable nested loops, it works fine:

db=# set enable_nestloop =false;
SET
db=# explain SELECT generate_subscripts(ARRAY[]::integer[], 1) AS id, 
       unnest(ARRAY[]::jsonb[]) ->> 'timestamp'  AS timestamp 
FROM results 
JOIN groups ON groups.id = results.group_id 
WHERE results.search_id = 3336 
order by timestamp;
                                                        QUERY PLAN
                                             

--------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=54523.19..55478.19 rows=382000 width=36)
   Sort Key: (((unnest('{}'::jsonb[])) ->> 'timestamp'::text))
   ->  Result  (cost=19.75..8658.15 rows=382000 width=36)
         ->  ProjectSet  (cost=19.75..1973.15 rows=382000 width=36)
               ->  Hash Join  (cost=19.75..59.33 rows=382 width=0)
                     Hash Cond: (groups.id = results.group_id)
                     ->  Seq Scan on groups  (cost=0.00..36.54 rows=1154
width=4)
                     ->  Hash  (cost=14.97..14.97 rows=382 width=4)
                           ->  Index Only Scan using results_pkey on results
 (cost=0.29..14.97 rows=382 width=4)
                                 Index Cond: (search_id = 3336)
(10 rows)

If you remove the ORDER BY, it works.

If you remove the generate_series(), it works.

If you remove the JOIN, it works.

If you remove the "->> 'timestamp'", it works.

If you wrap the query in a subquery without the ORDER BY, and then the put
the ORDER BY on that, it works. Like so:

explain SELECT * FROM (SELECT generate_subscripts(ARRAY[]::integer[], 1) AS
id, 
       unnest(ARRAY[]::jsonb[]) ->> 'timestamp'  AS timestamp 
FROM results 
JOIN groups ON groups.id = results.group_id 
WHERE results.search_id = 3336) x
order by timestamp;

This gives the same query plan as above after disabling the nested loops.

What appears to be happening is that the planner attempts a transformation
and places the unnest() in the ORDER BY statement replacing the reference to
the "timestamp" field with the actual expression. There unnest() is clearly
not allowed. Perhaps the fact that the unnest() is hidden beneath the
operator(->>) prevents the planner from noticing the transformation is not
permitted.

This is a pain to reproduce. Just dumping are restoring the tables elsewhere
did not work. We noticed that the 'results' table in production was quite
bloated and when we replicated that in the test environment, it finally
triggered. At least at 300% bloat it triggered, that probably triggers the
planner to try some other plans.

This is not a critical bug, since it is easily worked around and the
combination of conditions seems quite unusual.

Noticed in 13.4, reproduced in 13.7 and 14.4.

Have a nice day,
Martijn


Re: BUG #17564: Planner bug in combination of generate_series(), unnest() and ORDER BY

От
Tom Lane
Дата:
PG Bug reporting form <noreply@postgresql.org> writes:
> db=# explain SELECT generate_subscripts(ARRAY[]::integer[], 1) AS id, 
>        unnest(ARRAY[]::jsonb[]) ->> 'timestamp'  AS timestamp 
> FROM results 
> JOIN groups ON groups.id = results.group_id 
> WHERE results.search_id = 3336 
> order by timestamp;

> The error is:

> ERROR:  set-valued function called in context that cannot accept a set
> LINE 2:        unnest(ARRAY[]::jsonb[]) ->> 'timestamp'  AS timestam...

Hmm, that certainly seems like a bug, but I fear it's impossible
to investigate without a reproducible test case.

            regards, tom lane



Re: BUG #17564: Planner bug in combination of generate_series(), unnest() and ORDER BY

От
Martijn van Oosterhout
Дата:
Hoi Tom, pgsql-bugs,

Now it's morning I've thought of a way to reproduce it more easily, see the attached script. The tricky part is getting the tuples in a position that triggers the planner in the right way. So the script includes a list of (ctid, primary key) and creates a table using that with quite a large amount of bloat. It then creates some constraints, vacuums and runs the offending query. On my system it reproduces with 100% reliability (so far anyway).

10:41 $ PGPASSWORD=pass psql -h 127.0.0.1 -U postgres db2 < /tmp/reproduce2.sql
                                                           version                                                          
-----------------------------------------------------------------------------------------------------------------------------
 PostgreSQL 14.4 (Debian 14.4-1.pgdg110+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit
(1 row)

DROP TABLE
DROP TABLE
DROP TABLE
NOTICE:  table "input" does not exist, skipping
DROP TABLE
CREATE TABLE
CREATE TABLE
COPY 1438
INSERT 0 166
ALTER TABLE
SELECT 192000
SELECT 192000
DELETE 1356
ALTER TABLE
DELETE 189206
VACUUM
VACUUM
ERROR:  set-valued function called in context that cannot accept a set
LINE 2:        unnest(ARRAY[]::jsonb[]) ->> 'timestamp'  AS timestam...
               ^

Hope this helps,
Martijn


On Tue, 2 Aug 2022 at 00:17, Tom Lane <tgl@sss.pgh.pa.us> wrote:
PG Bug reporting form <noreply@postgresql.org> writes:
> db=# explain SELECT generate_subscripts(ARRAY[]::integer[], 1) AS id,
>        unnest(ARRAY[]::jsonb[]) ->> 'timestamp'  AS timestamp
> FROM results
> JOIN groups ON groups.id = results.group_id
> WHERE results.search_id = 3336
> order by timestamp;

> The error is:

> ERROR:  set-valued function called in context that cannot accept a set
> LINE 2:        unnest(ARRAY[]::jsonb[]) ->> 'timestamp'  AS timestam...

Hmm, that certainly seems like a bug, but I fear it's impossible
to investigate without a reproducible test case.

                        regards, tom lane


--
Вложения

Re: BUG #17564: Planner bug in combination of generate_series(), unnest() and ORDER BY

От
Richard Guo
Дата:

On Tue, Aug 2, 2022 at 4:50 PM Martijn van Oosterhout <kleptog@gmail.com> wrote:
Now it's morning I've thought of a way to reproduce it more easily, see the attached script. The tricky part is getting the tuples in a position that triggers the planner in the right way. So the script includes a list of (ctid, primary key) and creates a table using that with quite a large amount of bloat. It then creates some constraints, vacuums and runs the offending query. On my system it reproduces with 100% reliability (so far anyway).

Thanks for the report! I can reproduce it on HEAD. The part of the plan
that causes problem looks like:

->  Gather Merge
      Output: results.group_id
      Workers Planned: 1
      ->  Sort
            Output: results.group_id, ((unnest('{}'::jsonb[]) ->> 'timestamp'::text))
            Sort Key: ((unnest('{}'::jsonb[]) ->> 'timestamp'::text))
            ->  Parallel Seq Scan on public.results
                  Output: results.group_id, (unnest('{}'::jsonb[]) ->> 'timestamp'::text)
                  Filter: (results.search_id = 3336)

Please note that the expression below appears in the targetlist of
seqscan:

    unnest('{}'::jsonb[]) ->> 'timestamp'::text

The func for operator '->>' itself is not proretset, but one of its args
(the unnest func) is proretset. And that causes problem when we set up
projection info for the seqscan node.

So why does this expression appear in the targetlist of seqscan node? I
did some debug on that. Since this expression is the desired pathkey of
the query, relation_can_be_sorted_early() checks on it and finds that it
can be computed from the reltarget of rel 'results', which is true since
this expression can be computed all by itself. So it is considered as a
useful ordering for rel 'results' and generate_useful_gather_paths()
creates the Sort and then Gather Merge nodes for 'results' based on this
pathkey.

Still need more time to investigate for the fix.

Thanks
Richard

Re: BUG #17564: Planner bug in combination of generate_series(), unnest() and ORDER BY

От
Richard Guo
Дата:

On Tue, Aug 2, 2022 at 7:28 PM Richard Guo <guofenglinux@gmail.com> wrote:

On Tue, Aug 2, 2022 at 4:50 PM Martijn van Oosterhout <kleptog@gmail.com> wrote:
Now it's morning I've thought of a way to reproduce it more easily, see the attached script. The tricky part is getting the tuples in a position that triggers the planner in the right way. So the script includes a list of (ctid, primary key) and creates a table using that with quite a large amount of bloat. It then creates some constraints, vacuums and runs the offending query. On my system it reproduces with 100% reliability (so far anyway).

Thanks for the report! I can reproduce it on HEAD. The part of the plan
that causes problem looks like:

->  Gather Merge
      Output: results.group_id
      Workers Planned: 1
      ->  Sort
            Output: results.group_id, ((unnest('{}'::jsonb[]) ->> 'timestamp'::text))
            Sort Key: ((unnest('{}'::jsonb[]) ->> 'timestamp'::text))
            ->  Parallel Seq Scan on public.results
                  Output: results.group_id, (unnest('{}'::jsonb[]) ->> 'timestamp'::text)
                  Filter: (results.search_id = 3336)

Please note that the expression below appears in the targetlist of
seqscan:

    unnest('{}'::jsonb[]) ->> 'timestamp'::text

The func for operator '->>' itself is not proretset, but one of its args
(the unnest func) is proretset. And that causes problem when we set up
projection info for the seqscan node.

So why does this expression appear in the targetlist of seqscan node? I
did some debug on that. Since this expression is the desired pathkey of
the query, relation_can_be_sorted_early() checks on it and finds that it
can be computed from the reltarget of rel 'results', which is true since
this expression can be computed all by itself. So it is considered as a
useful ordering for rel 'results' and generate_useful_gather_paths()
creates the Sort and then Gather Merge nodes for 'results' based on this
pathkey.

Still need more time to investigate for the fix.

In relation_can_be_sorted_early(), we try to find an EC member that
matches some reltarget or can be computed from the reltarget. If we find
out such an EC member, we check whether it involves set-returning
functions (and reject it if so) with IS_SRF_CALL, which only tests
funcretset/opretset flag of the expression, without checking further
into its args. I'm wondering if this is enough.

I'm considering a fix as checking the EC member expression recursively
with expression_returns_set(), something like below:

diff --git a/src/backend/optimizer/path/equivclass.c b/src/backend/optimizer/path/equivclass.c
index 60c0e3f108..7991295548 100644
--- a/src/backend/optimizer/path/equivclass.c
+++ b/src/backend/optimizer/path/equivclass.c
@@ -986,7 +986,7 @@ relation_can_be_sorted_early(PlannerInfo *root, RelOptInfo *rel,
                 * one are effectively checking properties of targetexpr, so there's
                 * no point in asking whether some other EC member would be better.)
                 */
-               if (IS_SRF_CALL((Node *) em->em_expr))
+               if (expression_returns_set((Node *) em->em_expr))
                        continue;

                /*
@@ -1014,7 +1014,7 @@ relation_can_be_sorted_early(PlannerInfo *root, RelOptInfo *rel,
         * member in this case; since SRFs can't appear in WHERE, they cannot
         * belong to multi-member ECs.)
         */
-       if (IS_SRF_CALL((Node *) em->em_expr))
+       if (expression_returns_set((Node *) em->em_expr))
                return false;

        return true;

Does this make any sense?

Thanks
Richard 

Re: BUG #17564: Planner bug in combination of generate_series(), unnest() and ORDER BY

От
Tom Lane
Дата:
Richard Guo <guofenglinux@gmail.com> writes:
> -               if (IS_SRF_CALL((Node *) em->em_expr))
> +               if (expression_returns_set((Node *) em->em_expr))

Yeah.  I think we need to go further: placed in optimizer.h as
it is, IS_SRF_CALL is just an invitation to writing broken code.
I think the calls in tlist.c are legit because that code is
precisely concerned with breaking down nests of SRFs; but
everywhere else, if you write IS_SRF_CALL you're doing it wrong.
We should take that macro out of the header altogether.

I kind of wonder whether IS_SRF_CALL is a good idea at all.
It'd be more maintainable to have an expression_returns_set_one_level,
or something like that, placed beside expression_returns_set.

            regards, tom lane



Re: BUG #17564: Planner bug in combination of generate_series(), unnest() and ORDER BY

От
Tom Lane
Дата:
Richard Guo <guofenglinux@gmail.com> writes:
> On Tue, Aug 2, 2022 at 4:50 PM Martijn van Oosterhout <kleptog@gmail.com>
> wrote:
>> Now it's morning I've thought of a way to reproduce it more easily, see
>> the attached script.

> Thanks for the report! I can reproduce it on HEAD.

FWIW, this reproduces the bug for me in v13 and v14, but not v15 or HEAD.
While the method to fix the bug seems clear enough, it doesn't seem
like we have a test case that's stable enough to be worth anything
as a regression test.  Hmmm...

            regards, tom lane



Re: BUG #17564: Planner bug in combination of generate_series(), unnest() and ORDER BY

От
Tom Lane
Дата:
I wrote:
> FWIW, this reproduces the bug for me in v13 and v14, but not v15 or HEAD.

A bit of bisecting later, I find that the behavior changed at

commit db0d67db2401eb6238ccc04c6407a4fd4f985832
Author: Tomas Vondra <tomas.vondra@postgresql.org>
Date:   Thu Mar 31 00:09:11 2022 +0200

    Optimize order of GROUP BY keys

I think that either that commit is buggy, or the commit message
omitted so many relevant facts as to verge on a lie.  The query
we are dealing with here has no GROUP BY whatsoever, but nonetheless
that commit produces a totally different plan (with a significantly
cheaper cost estimate) than its immediate predecessor.  I don't
see anything specifically about SRFs in that patch, but I think
what must have happened is that it changed cost estimates for
this scenario enough to accidentally cause selection of a
non-buggy plan.

The plan produced as of that commit, which is the same as what
you get in current HEAD, is

 Sort  (cost=2946.79..3096.79 rows=60000 width=36)
   Sort Key: (((unnest('{}'::jsonb[])) ->> 'timestamp'::text))
   ->  Result  (cost=90.08..1446.79 rows=60000 width=36)
         ->  ProjectSet  (cost=90.08..396.79 rows=60000 width=36)
               ->  Hash Join  (cost=90.08..96.19 rows=60 width=0)
                     Hash Cond: (groups.group_id = results.group_id)
                     ->  Seq Scan on groups  (cost=0.00..5.66 rows=166 width=4)
                     ->  Hash  (cost=89.33..89.33 rows=60 width=4)
                           ->  Index Only Scan using results_pkey on results  (cost=0.28..89.33 rows=60 width=4)
                                 Index Cond: (search_id = 3336)

I can't easily produce a nice EXPLAIN result for the previous plan,
since it fails in executor startup, but a heavily-trimmed pprint
dump is enough to show that it's totally different:

      {RESULT
      :startup_cost 4071.64
      :total_cost 5445.55
      :plan_rows 60000
      :lefttree
         {PROJECTSET
         :startup_cost 4071.64
         :total_cost 4395.55
         :plan_rows 60000
         :lefttree
            {NESTLOOP
            :startup_cost 4071.64
            :total_cost 4094.95
            :plan_rows 60
            :lefttree
               {GATHERMERGE
               :startup_cost 4071.48
               :total_cost 4078.32
               :plan_rows 60
               :lefttree
                  {SORT
                  :startup_cost 3071.47
                  :total_cost 3071.56
                  :plan_rows 35
                  :lefttree
                     {SEQSCAN
                     :startup_cost 0.00
                     :total_cost 3070.57
                     :plan_rows 35
                     :parallel_aware true
                     :parallel_safe true
                     :targetlist (
                        ...
                        {TARGETENTRY
                        :expr
                           {OPEXPR
                           :opno 3477
                           :opfuncid 3214
                           :opresulttype 25
                           :opretset false
                           :opcollid 100
                           :inputcollid 100
                           :args (
                              {FUNCEXPR
                              :funcid 2331
                              :funcresulttype 3802
                              :funcretset true    -- OOPS
                     ...
                     :scanrelid 1    -- this scan is on "results"
                     ...
               }
            :righttree
               {MEMOIZE
               :startup_cost 0.15
               :total_cost 0.31
               :plan_rows 1
               :lefttree
                  {INDEXONLYSCAN
                  :startup_cost 0.14
                  :total_cost 0.30
                  :plan_rows 1
                  :scanrelid 2       -- this scan is on "groups"
                  }
               :righttree <>
               }

So I'd like to know why an ostensibly unrelated commit changed
cost estimates by 43% for a query that hasn't even got a GROUP BY.

            regards, tom lane



Re: BUG #17564: Planner bug in combination of generate_series(), unnest() and ORDER BY

От
Martijn van Oosterhout
Дата:
Hoi Tom,

On Wed, 3 Aug 2022 at 16:56, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Richard Guo <guofenglinux@gmail.com> writes:
> On Tue, Aug 2, 2022 at 4:50 PM Martijn van Oosterhout <kleptog@gmail.com>
> wrote:
>> Now it's morning I've thought of a way to reproduce it more easily, see
>> the attached script.

> Thanks for the report! I can reproduce it on HEAD.

FWIW, this reproduces the bug for me in v13 and v14, but not v15 or HEAD.
While the method to fix the bug seems clear enough, it doesn't seem
like we have a test case that's stable enough to be worth anything
as a regression test.  Hmmm...


Looking at what Richard has uncovered, it appears the issue is related to the table simply being big enough that it considers a parallel seqscan plan, and then fails. Something I can look into in the morning.

The part I haven't seen explained is why the generate_series() is important. My guess is that if you replace it with an expression it is no longer an SRF and it produces some completely different plan that prevents the problematic path being triggered.

Nice that the problem has been found!
--

Re: BUG #17564: Planner bug in combination of generate_series(), unnest() and ORDER BY

От
Tom Lane
Дата:
I wrote:
>> FWIW, this reproduces the bug for me in v13 and v14, but not v15 or HEAD.

I decided I'd wasted entirely too much time trying to find a suitable
test case, so I pushed the bug fix with no new regression test.

Meanwhile, back at the question of whether db0d67db2 is buggy,
it looks like that reduces to whether it was intentional that that
made a large change in estimated sort costs.  What I'm getting from
v15 and HEAD, as I said earlier, is:

 Sort  (cost=2946.79..3096.79 rows=60000 width=36)
   Sort Key: (((unnest('{}'::jsonb[])) ->> 'timestamp'::text))
   ->  Result  (cost=90.08..1446.79 rows=60000 width=36)
         ->  ProjectSet  (cost=90.08..396.79 rows=60000 width=36)
               ->  Hash Join  (cost=90.08..96.19 rows=60 width=0)
                     Hash Cond: (groups.group_id = results.group_id)
                     ->  Seq Scan on groups  (cost=0.00..5.66 rows=166 width=4)
                     ->  Hash  (cost=89.33..89.33 rows=60 width=4)
                           ->  Index Only Scan using results_pkey on results  (cost=0.28..89.33 rows=60 width=4)
                                 Index Cond: (search_id = 3336)

After applying this patch, what I get from v13 and v14 is

 Sort  (cost=6208.59..6358.59 rows=60000 width=36)
   Sort Key: (((unnest('{}'::jsonb[])) ->> 'timestamp'::text))
   ->  Result  (cost=90.08..1446.79 rows=60000 width=36)
         ->  ProjectSet  (cost=90.08..396.79 rows=60000 width=36)
               ->  Hash Join  (cost=90.08..96.19 rows=60 width=0)
                     Hash Cond: (groups.group_id = results.group_id)
                     ->  Seq Scan on groups  (cost=0.00..5.66 rows=166 width=4)
                     ->  Hash  (cost=89.33..89.33 rows=60 width=4)
                           ->  Index Only Scan using results_pkey on results  (cost=0.28..89.33 rows=60 width=4)
                                 Index Cond: (search_id = 3336)

So this plan is identical except for the sort costs, which seem to
be about half of what they were in the older branches.  If that was
intentional, why didn't the commit message mention it?  It's not
exactly a minor change, and enable_group_by_reordering doesn't
seem to have any effect on it.

            regards, tom lane



Re: BUG #17564: Planner bug in combination of generate_series(), unnest() and ORDER BY

От
Tom Lane
Дата:
Martijn van Oosterhout <kleptog@gmail.com> writes:
> The part I haven't seen explained is why the generate_series() is
> important. My guess is that if you replace it with an expression it is no
> longer an SRF and it produces some completely different plan that prevents
> the problematic path being triggered.

I think the generate_series() forces a ProjectSet to be put atop the
join, where we'd probably not have done that without it, having made
the bogus assumption that we could evaluate the sort-key SRF at scan
level.  Why that changes the costs enough to mask or not mask the bug
is still obscure.

            regards, tom lane



Re: BUG #17564: Planner bug in combination of generate_series(), unnest() and ORDER BY

От
Richard Guo
Дата:

On Wed, Aug 3, 2022 at 10:56 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Richard Guo <guofenglinux@gmail.com> writes:
> On Tue, Aug 2, 2022 at 4:50 PM Martijn van Oosterhout <kleptog@gmail.com>
> wrote:
>> Now it's morning I've thought of a way to reproduce it more easily, see
>> the attached script.

> Thanks for the report! I can reproduce it on HEAD.

FWIW, this reproduces the bug for me in v13 and v14, but not v15 or HEAD.
While the method to fix the bug seems clear enough, it doesn't seem
like we have a test case that's stable enough to be worth anything
as a regression test.  Hmmm...

I'm reproducing this bug in HEAD with the repro given by Martijn, but
with some additional GUC sets:

set enable_hashjoin to off;
set enable_mergejoin to off;
set enable_indexscan to off;
set enable_bitmapscan to off;

When building access paths for base rel 'results', we would generate
Gather Merge on top of its partial path. The key point to reproduce this
bug is how to make this Gather Merge path win in the final plan.

Another way to reproduce this bug in HEAD is to manually hack the cost
of this Gather Merge path to zero with gdb, so that this bug can be
reproduced even with hashjoin.

Thanks
Richard

Re: BUG #17564: Planner bug in combination of generate_series(), unnest() and ORDER BY

От
Tom Lane
Дата:
Richard Guo <guofenglinux@gmail.com> writes:
> I'm reproducing this bug in HEAD with the repro given by Martijn, but
> with some additional GUC sets:

> set enable_hashjoin to off;
> set enable_mergejoin to off;
> set enable_indexscan to off;
> set enable_bitmapscan to off;

Hm, does not work for me ... although I assume you mean bc76f5ac4
or before?  It should definitely not happen after 1aa8dad41,
unless there's an additional bug ...

            regards, tom lane



Re: BUG #17564: Planner bug in combination of generate_series(), unnest() and ORDER BY

От
Richard Guo
Дата:

On Thu, Aug 4, 2022 at 9:55 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Richard Guo <guofenglinux@gmail.com> writes:
> I'm reproducing this bug in HEAD with the repro given by Martijn, but
> with some additional GUC sets:

> set enable_hashjoin to off;
> set enable_mergejoin to off;
> set enable_indexscan to off;
> set enable_bitmapscan to off;

Hm, does not work for me ... although I assume you mean bc76f5ac4
or before?  It should definitely not happen after 1aa8dad41,
unless there's an additional bug ...

Yeah, I tested in bc76f5ac4. Attached is a regression test I composed.
I tested it locally and it can catch this bug before 1aa8dad41 and give
the expected answer after 1aa8dad41.

Thanks
Richard
Вложения

Re: BUG #17564: Planner bug in combination of generate_series(), unnest() and ORDER BY

От
Martijn van Oosterhout
Дата:
Hoi Richard, Tom,

On Thu, 4 Aug 2022 at 05:16, Richard Guo <guofenglinux@gmail.com> wrote:

Yeah, I tested in bc76f5ac4. Attached is a regression test I composed.
I tested it locally and it can catch this bug before 1aa8dad41 and give
the expected answer after 1aa8dad41.


It looks like the issue has been found, nice! So I'm wondering: which release will this fix be included (so we can mark the workaround with an end date).

Thanks in advance.
--

Re: BUG #17564: Planner bug in combination of generate_series(), unnest() and ORDER BY

От
Tom Lane
Дата:
Richard Guo <guofenglinux@gmail.com> writes:
> Yeah, I tested in bc76f5ac4. Attached is a regression test I composed.
> I tested it locally and it can catch this bug before 1aa8dad41 and give
> the expected answer after 1aa8dad41.

Nice!  I found we could simplify it even further: we don't seem to
need the generate_subscripts() to provoke the error in this query.
And it fails in v13-v14, too.  Pushed.

            regards, tom lane



Re: BUG #17564: Planner bug in combination of generate_series(), unnest() and ORDER BY

От
Tom Lane
Дата:
Martijn van Oosterhout <kleptog@gmail.com> writes:
> It looks like the issue has been found, nice! So I'm wondering: which
> release will this fix be included (so we can mark the workaround with an
> end date).

Next week's.

https://www.postgresql.org/developer/roadmap/

            regards, tom lane



Re: BUG #17564: Planner bug in combination of generate_series(), unnest() and ORDER BY

От
Dmitry Dolgov
Дата:
> On Wed, Aug 03, 2022 at 05:44:17PM -0400, Tom Lane wrote:
>
> Meanwhile, back at the question of whether db0d67db2 is buggy,
> it looks like that reduces to whether it was intentional that that
> made a large change in estimated sort costs.
>
> [...]
>
> So this plan is identical except for the sort costs, which seem to
> be about half of what they were in the older branches.  If that was
> intentional, why didn't the commit message mention it?  It's not
> exactly a minor change, and enable_group_by_reordering doesn't
> seem to have any effect on it.

I got curious about this one, as I haven't had a chance to look at the
final versions of the "group by reordering" feature. The commit message
indeed doesn't mention it directly, but there are changes inside
cost_tuplesort that are affecting this plan. The description of those
changes and the math behind are pretty neat (kudos to the author), but
to my surprise on the query from this thread the final result for
startup_costs is missing any ~ LOG2(tuples) term in comparison with the
original implementation. This happens because estimate_num_groups_incremental
returns estimation value 1 for number of groups, which sounds strange to
me. Not sure if there is anything wrong here, or I'm missing something,
but at least falling back to geometric mean as an estimation for nGroups
seems to produce results closer to the original and reduces discrepancy
between costs observed here.