Re: Todo: Teach planner to evaluate multiple windows in the optimal order

Поиск
Список
Период
Сортировка
От Ankit Kumar Pandey
Тема Re: Todo: Teach planner to evaluate multiple windows in the optimal order
Дата
Msg-id 8d0b24ce-f8ba-cd63-82ae-be853b4067e8@gmail.com
обсуждение исходный текст
Ответ на Re: Todo: Teach planner to evaluate multiple windows in the optimal order  (David Rowley <dgrowleyml@gmail.com>)
Ответы Re: Todo: Teach planner to evaluate multiple windows in the optimal order  (David Rowley <dgrowleyml@gmail.com>)
Список pgsql-hackers
> On 08/01/23 16:33, David Rowley wrote:

> On Sun, 8 Jan 2023 at 23:21, Ankit Kumar Pandey <itsankitkp@gmail.com> wrote:
>> Please find attached patch with addressed issues mentioned before.
>
> Here's a quick review:
>
> 1. You can use foreach_current_index(l) to obtain the index of the list element.
> 
> 2. I'd rather see you looping backwards over the list in the first
> loop. I think it'll be more efficient to loop backwards as you can
> just break out the loop when you see the pathkeys are not contained in
> the order by pathkreys.  When the optimisation does not apply that
> means you only need to look at the last item in the list.  You could
> maybe just invent foreach_reverse() for this purpose and put it in
> pg_list.h.  That'll save having to manually code up the loop.
>
> 3. I don't think you should call the variable
> enable_order_by_pushdown. We've a bunch of planner related GUCs that
> start with enable_. That might cause a bit of confusion.  Maybe just
> try_sort_pushdown.
>
> 4. You should widen the scope of orderby_pathkeys and set it within
> the if (enable_order_by_pushdown) scope. You can reuse this again in
> the 2nd loop too.  Just initialise it to NIL
>
> 5. You don't seem to be checking all WindowClauses for a runCondtion.
> If you do #2 above then you can start that process in the initial
> reverse loop so that you've checked them all by the time you get
> around to that WindowClause again in the 2nd loop.
>
> 6. The test with "+-- Do not perform additional sort if column is
> presorted". I don't think "additional" is the correct word here.  I
> think you want to ensure that we don't push down the ORDER BY below
> the WindowAgg for this case. There is no ability to reduce the sorts
> here, only move them around, which we agreed we don't want to do for
> this case.


I have addressed all points 1-6 in the attached patch.

I have one doubt regarding runCondition, do we only need to ensure

that window function which has subset sort clause of main query should

not have runCondition or none of the window functions should not contain

runCondition? I have gone with later case but wanted to clarify.


> Also, do you want to have a go at coding up the sort bound pushdown
> too?  It'll require removing the limitCount restriction and adjusting
> ExecSetTupleBound() to recurse through a WindowAggState. I think it's
> pretty easy. You could try it then play around with it to make sure it
> works and we get the expected performance.

I tried this in the patch but kept getting `retrieved too many tuples in 
a bounded sort`.

Added following code in ExecSetTupleBound which correctly found sortstate

and set bound value.

    else if(IsA(child_node, WindowAggState))

    {

        WindowAggState  *winstate = (WindowAggState *) child_node;

        if (outerPlanState(winstate))

            ExecSetTupleBound(tuples_needed, outerPlanState(winstate));

    }

I think problem is that are not using limit clause inside window 
function (which

may need to scan all tuples) so passing bound value to 
WindowAggState->sortstate

is not working as we might expect. Or maybe I am getting it wrong? I was 
trying to

have top-N sort for limit clause if orderby pushdown happens.

> You'll likely want to add a few more rows than the last performance test you did or run the query
> with pgbench. Running a query once that only takes 1-2ms is likely not
> a reliable way to test the performance of something.

I did some profiling.

CREATE TABLE empsalary1 (
     depname varchar,
     empno bigint,
     salary int,
     enroll_date date
);
INSERT INTO empsalary1(depname, empno, salary, enroll_date)
SELECT string_agg (substr('abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789', ceil (random() *
62)::integer,1000), '')
 
  AS depname, generate_series(1, 10000000) AS empno, ceil (random()*10000)::integer AS salary
, NOW() + (random() * (interval '90 days')) + '30 days' AS enroll_date;

1) Optimization case

EXPLAIN (ANALYZE)
SELECT empno,
        depname,
        min(salary) OVER (PARTITION BY depname ORDER BY empno) depminsalary,
        sum(salary) OVER (PARTITION BY depname) depsalary,
        count(*) OVER (ORDER BY enroll_date DESC) c
FROM empsalary1
ORDER BY depname, empno, enroll_date;

EXPLAIN (ANALYZE)
SELECT empno,
        depname,
        min(salary) OVER (PARTITION BY depname) depminsalary
FROM empsalary1
ORDER BY depname, empno;



In patched version:

                                                                    QUERY PLAN
                        

--------------------------------------------------------------------------------------------------------------------------
-----------------------
  WindowAgg  (cost=93458.04..93458.08 rows=1 width=61) (actual time=149996.006..156756.995 rows=10000000 loops=1)
    ->  WindowAgg  (cost=93458.04..93458.06 rows=1 width=57) (actual time=108559.126..135892.188 rows=10000000
loops=1)
          ->  Sort  (cost=93458.04..93458.04 rows=1 width=53) (actual time=108554.213..112564.168 rows=10000000
loops=1)
                Sort Key: depname, empno, enroll_date
                Sort Method: external merge  Disk: 645856kB
                ->  WindowAgg  (cost=93458.01..93458.03 rows=1 width=53) (actual time=30386.551..62357.669
rows=10000000lo
 
ops=1)
                      ->  Sort  (cost=93458.01..93458.01 rows=1 width=45) (actual time=23260.104..26313.395
rows=10000000l
 
oops=1)
                            Sort Key: enroll_date DESC
                            Sort Method: external merge  Disk: 528448kB
                            ->  Seq Scan on empsalary1  (cost=0.00..93458.00 rows=1 width=45) (actual
time=0.032..4833.603
rows=10000000 loops=1)
  Planning Time: 4.693 ms
  Execution Time: 158161.281 ms

                                                               QUERY PLAN
              

--------------------------------------------------------------------------------------------------------------------------
-------------
  WindowAgg  (cost=1903015.63..2078015.74 rows=10000006 width=39) (actual time=40565.305..46598.984 rows=10000000
loops=1)
    ->  Sort  (cost=1903015.63..1928015.65 rows=10000006 width=39) (actual time=23411.837..27467.962 rows=10000000
loops=1)
          Sort Key: depname, empno
          Sort Method: external merge  Disk: 528448kB
          ->  Seq Scan on empsalary1  (cost=0.00..193458.06 rows=10000006 width=39) (actual time=5.095..5751.675
rows=10000
000 loops=1)
  Planning Time: 0.099 ms
  Execution Time: 47415.926 ms


In master:

                                                                          QUERY PLAN
                                      

--------------------------------------------------------------------------------------------------------------------------
-------------------------------------
  Incremental Sort  (cost=3992645.36..4792645.79 rows=10000006 width=59) (actual time=147130.132..160985.373
rows=10000000
loops=1)
    Sort Key: depname, empno, enroll_date
    Presorted Key: depname, empno
    Full-sort Groups: 312500  Sort Method: quicksort  Average Memory: 28kB  Peak Memory: 28kB
    ->  WindowAgg  (cost=3992645.31..4342645.52 rows=10000006 width=59) (actual time=147129.936..154023.147
rows=10000000l
 
oops=1)
          ->  WindowAgg  (cost=3992645.31..4192645.43 rows=10000006 width=55) (actual time=104665.289..133089.188
rows=1000
0000 loops=1)
                ->  Sort  (cost=3992645.31..4017645.33 rows=10000006 width=51) (actual time=104665.257..108710.282
rows=100
00000 loops=1)
                      Sort Key: depname, empno
                      Sort Method: external merge  Disk: 645856kB
                      ->  WindowAgg  (cost=1971370.63..2146370.74 rows=10000006 width=51) (actual
time=28314.300..59737.949
  rows=10000000 loops=1)
                            ->  Sort  (cost=1971370.63..1996370.65 rows=10000006 width=43) (actual
time=21190.188..24098.59
6 rows=10000000 loops=1)
                                  Sort Key: enroll_date DESC
                                  Sort Method: external merge  Disk: 528448kB
                                  ->  Seq Scan on empsalary1  (cost=0.00..193458.06 rows=10000006 width=43) (actual
time=0.
630..5317.862 rows=10000000 loops=1)
  Planning Time: 0.982 ms
  Execution Time: 163369.242 ms
(16 rows)

                                                                  QUERY PLAN
                    

--------------------------------------------------------------------------------------------------------------------------
-------------------
  Incremental Sort  (cost=3787573.31..3912573.41 rows=10000006 width=39) (actual time=51547.195..53950.034
rows=10000000lo
 
ops=1)
    Sort Key: depname, empno
    Presorted Key: depname
    Full-sort Groups: 1  Sort Method: quicksort  Average Memory: 30kB  Peak Memory: 30kB
    Pre-sorted Groups: 1  Sort Method: external merge  Average Disk: 489328kB  Peak Disk: 489328kB
    ->  WindowAgg  (cost=1903015.63..2078015.74 rows=10000006 width=39) (actual time=33413.954..39771.262 rows=10000000
loo
ps=1)
          ->  Sort  (cost=1903015.63..1928015.65 rows=10000006 width=39) (actual time=18991.129..21992.353
rows=10000000lo
 
ops=1)
                Sort Key: depname
                Sort Method: external merge  Disk: 528456kB
                ->  Seq Scan on empsalary1  (cost=0.00..193458.06 rows=10000006 width=39) (actual time=1.300..5269.729
rows
=10000000 loops=1)
  Planning Time: 4.506 ms
  Execution Time: 54768.697 ms


2) No optimization case:

EXPLAIN (ANALYZE)
SELECT empno,
        depname,
        min(salary) OVER (PARTITION BY depname ORDER BY empno) depminsalary
FROM empsalary1
ORDER BY enroll_date;

Patch:

QUERY PLAN


--------------------------------------------------------------------------------------------------------------------------
--------------------------------
  Sort  (cost=3968191.79..3993191.80 rows=10000006 width=43) (actual 
time=57863.173..60976.324 rows=10000000 loops=1)
    Sort Key: enroll_date
    Sort Method: external merge  Disk: 528448kB
    ->  WindowAgg  (cost=850613.62..2190279.21 rows=10000006 width=43) 
(actual time=7478.966..42502.541 rows=10000000 loops
=1)
          ->  Gather Merge  (cost=850613.62..2015279.11 rows=10000006 
width=43) (actual time=7478.935..18037.001 rows=10000
000 loops=1)
                Workers Planned: 2
                Workers Launched: 2
                ->  Sort  (cost=849613.60..860030.27 rows=4166669 
width=43) (actual time=7349.101..9397.713 rows=3333333 lo
ops=3)
                      Sort Key: depname, empno
                      Sort Method: external merge  Disk: 181544kB
                      Worker 0:  Sort Method: external merge  Disk: 169328kB
                      Worker 1:  Sort Method: external merge  Disk: 177752kB
                      ->  Parallel Seq Scan on empsalary1 
(cost=0.00..135124.69 rows=4166669 width=43) (actual time=0.213.
.2450.635 rows=3333333 loops=3)
  Planning Time: 0.100 ms
  Execution Time: 63341.783 ms

master:

QUERY PLAN


--------------------------------------------------------------------------------------------------------------------------
--------------------------------
  Sort  (cost=3968191.79..3993191.80 rows=10000006 width=43) (actual 
time=54097.880..57000.806 rows=10000000 loops=1)
    Sort Key: enroll_date
    Sort Method: external merge  Disk: 528448kB
    ->  WindowAgg  (cost=850613.62..2190279.21 rows=10000006 width=43) 
(actual time=7075.245..39200.756 rows=10000000 loops
=1)
          ->  Gather Merge  (cost=850613.62..2015279.11 rows=10000006 
width=43) (actual time=7075.217..15988.922 rows=10000
000 loops=1)
                Workers Planned: 2
                Workers Launched: 2
                ->  Sort  (cost=849613.60..860030.27 rows=4166669 
width=43) (actual time=6993.974..8799.701 rows=3333333 lo
ops=3)
                      Sort Key: depname, empno
                      Sort Method: external merge  Disk: 171904kB
                      Worker 0:  Sort Method: external merge  Disk: 178496kB
                      Worker 1:  Sort Method: external merge  Disk: 178224kB
                      ->  Parallel Seq Scan on empsalary1 
(cost=0.00..135124.69 rows=4166669 width=43) (actual time=0.044.
.2683.598 rows=3333333 loops=3)
  Planning Time: 5.718 ms
  Execution Time: 59188.469 ms
(15 rows)

Master and patch have same performance as plan is same.


pgbench (this is to find average performance):

create table empsalary2 as select * from empsalary1 limit 1000;
-------------------------------------------------------------------
test.sql
SELECT empno,
        depname,
        min(salary) OVER (PARTITION BY depname ORDER BY empno) depminsalary,
        sum(salary) OVER (PARTITION BY depname) depsalary,
        count(*) OVER (ORDER BY enroll_date DESC) c
FROM empsalary2
ORDER BY depname, empno, enroll_date;

SELECT empno,
        depname,
        min(salary) OVER (PARTITION BY depname) depminsalary
FROM empsalary2
ORDER BY depname, empno;
----------------------------------------------------------------------

/usr/local/pgsql/bin/pgbench -d test -c 10 -j 4 -t 1000 -f test.sql

Patch:

transaction type: test.sql
scaling factor: 1
query mode: simple
number of clients: 10
number of threads: 4
maximum number of tries: 1
number of transactions per client: 1000
number of transactions actually processed: 10000/10000
number of failed transactions: 0 (0.000%)
latency average = 55.262 ms
initial connection time = 8.480 ms
tps = 180.957685 (without initial connection time)


Master:

transaction type: test.sql
scaling factor: 1
query mode: simple
number of clients: 10
number of threads: 4
maximum number of tries: 1
number of transactions per client: 1000
number of transactions actually processed: 10000/10000
number of failed transactions: 0 (0.000%)
latency average = 60.489 ms
initial connection time = 7.069 ms
tps = 165.320205 (without initial connection time)


TPS of patched version is higher than that of master's for same set of 
queries.

where optimization is performed.

-- 
Regards,
Ankit Kumar Pandey

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Ankit Kumar Pandey
Дата:
Сообщение: Re: Todo: Teach planner to evaluate multiple windows in the optimal order
Следующее
От: Mason Sharp
Дата:
Сообщение: Re: POC: Lock updated tuples in tuple_update() and tuple_delete()