Обсуждение: Re: Our trial to TPC-DS but optimizer made unreasonable plan

Поиск
Список
Период
Сортировка

Re: Our trial to TPC-DS but optimizer made unreasonable plan

От
Kouhei Kaigai
Дата:
> On Mon, Aug 17, 2015 at 6:40 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
> > Here is one other thing I could learn from TPC-DS benchmark.
> >
> > The attached query is Q4 of TPC-DS, and its result was towards SF=100.
> > It took long time to compete (about 30min), please see the attached
> > EXPLAIN ANALYZE output.
> >
> Look at this:
> 
>  ->  CTE Scan on year_total t_s_firstyear  (cost=0.00..13120715.27
> rows=3976 width=52) (actual time=0.020..5425.980 rows=1816438 loops=1)
>                            Filter: ((year_total > '0'::numeric) AND
> (sale_type = 's'::text) AND (dyear = 2001))
>                            Rows Removed by Filter: 19879897
>  ->  CTE Scan on year_total t_s_secyear  (cost=0.00..11927922.98
> rows=11928 width=164) (actual time=0.007..45.249 rows=46636 loops=1)
>                            Filter: ((sale_type = 's'::text) AND (dyear = 2002))
>                            Rows Removed by Filter: 185596
> 
> CTE expansion shall help here as we can push the filer down. I did a
> quick patch to demonstrate the idea, following Tom's proposal
> (38448.1430519406@sss.pgh.pa.us). I see obvious performance boost:
> 
> Turn off NLJ:
>  original: Planning time: 4.391 ms
>  Execution time: 77113.721 ms
>  patched: Planning time: 8.429 ms
>  Execution time: 18572.663 ms
> 
> + work_mem to 1G
>  original: Planning time: 4.487 ms
>  Execution time: 29249.466 ms
>  patched: Planning time: 11.148 ms
>  Execution time: 7309.586 ms
> 
> Attached please find the WIP patch and also the ANALYZE results.
> Notes: the patch may not directly apply to head as some network issue
> here so my Linux box can't talk to git server.
>
Thanks for your patch.
Let me test and report the result in my environment.

BTW, did you register the patch on the upcoming commit-fest?
I think it may be a helpful feature, if we can add alternative
subquery-path towards cte-scan on set_cte_pathlist() and choose
them according to the cost estimation.

Best regards,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>


Re: Our trial to TPC-DS but optimizer made unreasonable plan

От
Qingqing Zhou
Дата:
On Tue, Aug 18, 2015 at 5:59 PM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
> BTW, did you register the patch on the upcoming commit-fest?
>
Not yet, it is in WIP status.

> I think it may be a helpful feature, if we can add alternative
> subquery-path towards cte-scan on set_cte_pathlist() and choose
> them according to the cost estimation.
>
Are you suggesting that we keep both subquery-path (whenever possible)
and cte-path so that optimizer can choose among them?

I could imagine that if we could support "materialize cte once and use
multiple times" execution, then we shall be able to benefit from
keeping cte-path. But seems we still don't support this execution
mode.

Regards,
Qingqing



Re: Our trial to TPC-DS but optimizer made unreasonable plan

От
Qingqing Zhou
Дата:
On Wed, Aug 19, 2015 at 10:32 AM, Qingqing Zhou
<zhouqq.postgres@gmail.com> wrote:
> On Tue, Aug 18, 2015 at 5:59 PM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
>> BTW, did you register the patch on the upcoming commit-fest?
>>
> Not yet, it is in WIP status.
>

While I am working on the patch, I found some issues and resort help
here. Patch attached.

Here is an example:

postgres=# explain WITH q AS (
    WITH p AS (SELECT * from a) SELECT p.* FROM p JOIN p p1 on p.i>=p1.i)
SELECT * FROM q WHERE i <= 5;
                                    QUERY PLAN
----------------------------------------------------------------------------------
 Nested Loop  (cost=0.58..5980.16 rows=133333 width=8)
   ->  Index Scan using ai on a  (cost=0.29..8.36 rows=4 width=8)
         Index Cond: (i <= 5)
   ->  Index Only Scan using ai on a a_1  (cost=0.29..1159.62
rows=33333 width=4)
         Index Cond: (i <= a.i)
(5 rows)

So far so good. But if we add other references of the CTE q (m1->m,
m->q), we still have some extra CTE scans:

postgres=# explain WITH q AS (
    WITH p AS (SELECT * from a) SELECT p.* FROM p JOIN p p1 on
p.i>=p1.i), m as (select * from q), m1 as (select * from m)
SELECT * FROM m1 WHERE i <= 5;
                                       QUERY PLAN
-----------------------------------------------------------------------------------------
 CTE Scan on m  (cost=158365985.66..233365985.65 rows=1111111111 width=8)
   Filter: (i <= 5)
   CTE q
     ->  Nested Loop  (cost=0.29..91699319.00 rows=3333333333 width=8)
           ->  Seq Scan on a  (cost=0.00..1443.00 rows=100000 width=8)
           ->  Index Only Scan using ai on a a_1  (cost=0.29..583.65
rows=33333 width=4)
                 Index Cond: (i <= a.i)
   CTE m
     ->  CTE Scan on q  (cost=0.00..66666666.66 rows=3333333333 width=8)
(9 rows)

Above two queries essentially the same, but the second one is a
non-optimal plan. The reason is that how my patch works: it put a
substitution in front of SS_process_ctes():

   /*
  * If there is a WITH list, process each WITH query and build an initplan
! * SubPlan structure for it. Before we process ctes, try to subsitute with
! * subqueries to benefits from global optimization.
  */
  if (parse->cteList)
+ {
+ substitute_ctes_with_subqueries(root);
  SS_process_ctes(root);
+ }

AFAICS, the substitution only handles cteList within a query block, so
it does not go across the subquery boundary. I can see this is an
issue but can't see a nice way to fix it. Anybody has some recipe?

Regards,
Qingqing

Вложения

Re: Our trial to TPC-DS but optimizer made unreasonable plan

От
Tom Lane
Дата:
Qingqing Zhou <zhouqq.postgres@gmail.com> writes:
> Above two queries essentially the same, but the second one is a
> non-optimal plan. The reason is that how my patch works: it put a
> substitution in front of SS_process_ctes():

>    /*
>   * If there is a WITH list, process each WITH query and build an initplan
> ! * SubPlan structure for it. Before we process ctes, try to subsitute with
> ! * subqueries to benefits from global optimization.
>   */
>   if (parse->cteList)
> + {
> + substitute_ctes_with_subqueries(root);
>   SS_process_ctes(root);
> + }

> AFAICS, the substitution only handles cteList within a query block, so
> it does not go across the subquery boundary. I can see this is an
> issue but can't see a nice way to fix it. Anybody has some recipe?

It seems like you're doing this in fundamentally the wrong place.

What I had in mind in <38448.1430519406@sss.pgh.pa.us> was to convert CTEs
into plain subqueries during the prepjointree phase, either just before
or as part of the pull_up_subqueries pass (since you'd want the converted
subquery to be flattened if possible).  If you do it later than that,
then you'll have to reinvent a whole bunch of wheels to provide behavior
similar to regular subquery optimization.
        regards, tom lane



Re: Our trial to TPC-DS but optimizer made unreasonable plan

От
Tom Lane
Дата:
I wrote:
> What I had in mind in <38448.1430519406@sss.pgh.pa.us> was to convert CTEs
> into plain subqueries during the prepjointree phase, either just before
> or as part of the pull_up_subqueries pass (since you'd want the converted
> subquery to be flattened if possible).

After looking at the code a bit, IMO the most reasonable thing to do is to
include this transformation in inline_set_returning_functions(), perhaps
renaming it to something like inline_srfs_and_ctes().  You could invent
a separate function instead, but that would require an extra pass over
the rangetable, for no particular benefit that I can see; the separate
function would have to be called from *exactly* the same set of places
as inline_set_returning_functions(), anyway, or it would not work right
for multiple levels of query optimization.
        regards, tom lane



Re: Our trial to TPC-DS but optimizer made unreasonable plan

От
Kouhei Kaigai
Дата:
> -----Original Message-----
> From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
> Sent: Thursday, August 27, 2015 9:03 AM
> To: Qingqing Zhou
> Cc: Kaigai Kouhei(海外 浩平); Greg Stark; PostgreSQL-development
> Subject: Re: [HACKERS] Our trial to TPC-DS but optimizer made unreasonable plan
>
> Qingqing Zhou <zhouqq.postgres@gmail.com> writes:
> > Above two queries essentially the same, but the second one is a
> > non-optimal plan. The reason is that how my patch works: it put a
> > substitution in front of SS_process_ctes():
>
> >    /*
> >   * If there is a WITH list, process each WITH query and build an initplan
> > ! * SubPlan structure for it. Before we process ctes, try to subsitute with
> > ! * subqueries to benefits from global optimization.
> >   */
> >   if (parse->cteList)
> > + {
> > + substitute_ctes_with_subqueries(root);
> >   SS_process_ctes(root);
> > + }
>
> > AFAICS, the substitution only handles cteList within a query block, so
> > it does not go across the subquery boundary. I can see this is an
> > issue but can't see a nice way to fix it. Anybody has some recipe?
>
> It seems like you're doing this in fundamentally the wrong place.
>
> What I had in mind in <38448.1430519406@sss.pgh.pa.us> was to convert CTEs
> into plain subqueries during the prepjointree phase, either just before
> or as part of the pull_up_subqueries pass (since you'd want the converted
> subquery to be flattened if possible).  If you do it later than that,
> then you'll have to reinvent a whole bunch of wheels to provide behavior
> similar to regular subquery optimization.
>
Hmm... My suggestion might not be reasonable. Sorry.

--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>




Re: Our trial to TPC-DS but optimizer made unreasonable plan

От
Qingqing Zhou
Дата:
On Wed, Aug 26, 2015 at 5:28 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> After looking at the code a bit, IMO the most reasonable thing to do is to
> include this transformation in inline_set_returning_functions(), perhaps
> renaming it to something like inline_srfs_and_ctes().
>

This is essentially the same as my current implementation (revised
patch attached):
1. There are two call sites of inline_set_returning_functions(), and
one place is guarded with Assert(subquery->cteList == NIL). This means
transformation in subquery_planner() is effective.
2. A problem with revised patch is that we can't get rid of non-used
CTEs show up in EXPLAIN.

IMHO, here the problem is not "multiple levels" but "multiple
references". "levels" is handled well by recursion but references are
not: set returning function seems does not have the this issue because
you don't define a function along the query.

Regards,
Qingqing

---

Two testing queries results with revised patch:
1. Extra CTE q and p prints in EXPLAIN:
postgres=# explain WITH q AS (
postgres(#     WITH p AS (SELECT * from a) SELECT p.* FROM p JOIN p p1
on p.i>=p1.i)
postgres-# SELECT * FROM q WHERE i <= 5;
                                       QUERY PLAN
-----------------------------------------------------------------------------------------
 Nested Loop  (cost=1443.59..7423.16 rows=133333 width=8)
   CTE q
     ->  Nested Loop  (cost=1443.29..91700762.00 rows=3333333333 width=8)
           CTE p
             ->  Seq Scan on a a_2  (cost=0.00..1443.00 rows=100000 width=8)
           ->  Seq Scan on a a_3  (cost=0.00..1443.00 rows=100000 width=8)
           ->  Index Only Scan using ai on a a_4  (cost=0.29..583.65
rows=33333 width=4)
                 Index Cond: (i <= a_3.i)
   CTE p
     ->  Seq Scan on a a_5  (cost=0.00..1443.00 rows=100000 width=8)
   ->  Index Scan using ai on a  (cost=0.29..8.36 rows=4 width=8)
         Index Cond: (i <= 5)
   ->  Index Only Scan using ai on a a_1  (cost=0.29..1159.62
rows=33333 width=4)
         Index Cond: (i <= a.i)
(14 rows)

2. Extra m1 show up and same problem still there:
postgres=# explain WITH q AS (
postgres(#     WITH p AS (SELECT * from a) SELECT p.* FROM p JOIN p p1 on
postgres(# p.i>=p1.i), m as (select * from q), m1 as (select * from m)
postgres-# SELECT * FROM m1 WHERE i <= 5;
                                       QUERY PLAN
-----------------------------------------------------------------------------------------
 CTE Scan on m  (cost=225034095.32..300034095.31 rows=1111111111 width=8)
   Filter: (i <= 5)
   CTE q
     ->  Nested Loop  (cost=1443.29..91700762.00 rows=3333333333 width=8)
           CTE p
             ->  Seq Scan on a  (cost=0.00..1443.00 rows=100000 width=8)
           ->  Seq Scan on a a_1  (cost=0.00..1443.00 rows=100000 width=8)
           ->  Index Only Scan using ai on a a_2  (cost=0.29..583.65
rows=33333 width=4)
                 Index Cond: (i <= a_1.i)
   CTE m
     ->  CTE Scan on q  (cost=0.00..66666666.66 rows=3333333333 width=8)
   CTE m1
     ->  CTE Scan on m m_1  (cost=0.00..66666666.66 rows=3333333333 width=8)
(13 rows)

Вложения

Re: Our trial to TPC-DS but optimizer made unreasonable plan

От
Qingqing Zhou
Дата:
On Thu, Aug 27, 2015 at 1:01 PM, Qingqing Zhou
<zhouqq.postgres@gmail.com> wrote:
> On Wed, Aug 26, 2015 at 5:28 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>
>> After looking at the code a bit, IMO the most reasonable thing to do is to
>> include this transformation in inline_set_returning_functions(), perhaps
>> renaming it to something like inline_srfs_and_ctes().
>>
>
> This is essentially the same as my current implementation (revised
> patch attached):
>

I tried the method as Tom suggested (attached in previous email) but
still got the same issue - anybody see what I did wrong here? :-(

Thanks,
Qingqing