Re: Todo: Teach planner to evaluate multiple windows in the optimal order

Поиск
Список
Период
Сортировка
От	Ankit Kumar Pandey
Тема	Re: Todo: Teach planner to evaluate multiple windows in the optimal order
Дата	15 января 2023 г. 17:52:47
Msg-id	4303947c-c313-ec43-d442-80c4c8e92490@gmail.com обсуждение исходный текст
Ответ на	Re: Todo: Teach planner to evaluate multiple windows in the optimal order (David Rowley <dgrowleyml@gmail.com>)
Ответы	Re: Todo: Teach planner to evaluate multiple windows in the optimal order
Список	pgsql-hackers
Дерево обсуждения
> On 13/01/23 07:48, David Rowley wrote:

> It would just care if the
> pathkeys were present and return a list of pathkeys not contained so
> that an incremental sort could be done only on the returned list and a
> Unique on an empty returned list. 

In create_final_distinct_paths, presorted keys are determined from

input_rel->pathlist & needed_pathkeys. Problem with input_rel->pathlist

is that, for index node, useful_pathkeys is stored in 
input_rel->pathlist but this useful_pathkeys

is determined from truncate_useless_pathkeys(index_pathkeys) which 
removes index_keys if ordering is different.

Hence, input_rel->pathlist returns null for select distinct b,a from ab 
where a < 10; even if index is created on a.

In order to tackle this, I have added index_pathkeys in indexpath node 
itself.

Although I started this patch from master, I merged changes to window sort

optimizations.


In patched version:

set enable_hashagg=0;
set enable_seqscan=0;

explain select distinct relname,relkind,count(*) over (partition by
relkind) from pg_Class;
                                               QUERY PLAN
-------------------------------------------------------------------------------------------------------
  Unique  (cost=10000000039.49..10000000063.73 rows=415 width=73)
    ->  Incremental Sort  (cost=10000000039.49..10000000060.62 rows=415 width=73)
          Sort Key: relkind, relname, (count(*) OVER (?))
          Presorted Key: relkind
          ->  WindowAgg  (cost=10000000034.20..10000000041.46 rows=415 width=73)
                ->  Sort  (cost=10000000034.20..10000000035.23 rows=415 width=65)
                      Sort Key: relkind
                      ->  Seq Scan on pg_class  (cost=10000000000.00..10000000016.15 rows=415 width=65)
(8 rows)

explain select distinct b,a from ab where a < 10;
                                     QUERY PLAN
----------------------------------------------------------------------------------
  Unique  (cost=0.71..60.05 rows=611 width=8)
    ->  Incremental Sort  (cost=0.71..55.55 rows=900 width=8)
          Sort Key: a, b
          Presorted Key: a
          ->  Index Scan using ab_a_idx on ab  (cost=0.29..28.04 rows=900 width=8)
                Index Cond: (a < 10)
(6 rows)

explain select distinct b,a, count(*) over (partition by a) from abcd order by a,b;
                                              QUERY PLAN
-----------------------------------------------------------------------------------------------------
  Unique  (cost=10000021174.63..10000038095.75 rows=60 width=16)
    ->  Incremental Sort  (cost=10000021174.63..10000036745.75 rows=180000 width=16)
          Sort Key: a, b, (count(*) OVER (?))
          Presorted Key: a, b
          ->  WindowAgg  (cost=10000020948.87..10000024098.87 rows=180000 width=16)
                ->  Sort  (cost=10000020948.87..10000021398.87 rows=180000 width=8)
                      Sort Key: a, b
                      ->  Seq Scan on abcd  (cost=10000000000.00..10000002773.00 rows=180000 width=8)
(8 rows)

explain select distinct a, b, count(*) over (partition by a,b,c) from abcd;
                                               QUERY PLAN
------------------------------------------------------------------------------------------------------
  Unique  (cost=10000021580.47..10000036629.31 rows=60 width=20)
    ->  Incremental Sort  (cost=10000021580.47..10000035279.31 rows=180000 width=20)
          Sort Key: a, b, c, (count(*) OVER (?))
          Presorted Key: a, b, c
          ->  WindowAgg  (cost=10000021561.37..10000025611.37 rows=180000 width=20)
                ->  Sort  (cost=10000021561.37..10000022011.37 rows=180000 width=12)
                      Sort Key: a, b, c
                      ->  Seq Scan on abcd  (cost=10000000000.00..10000002773.00 rows=180000 width=12)
(8 rows)

explain select distinct a, b, count(*) over (partition by b,a, c) from abcd;

                                             QUERY PLAN

---------------------------------------------------------------------------------------------------

  Unique  (cost=2041.88..36764.90 rows=60 width=20)

    ->  Incremental Sort  (cost=2041.88..35414.90 rows=180000 width=20)

          Sort Key: b, a, c, (count(*) OVER (?))

          Presorted Key: b, a, c

          ->  WindowAgg  (cost=1989.94..25746.96 rows=180000 width=20)

                ->  Incremental Sort  (cost=1989.94..22146.96 rows=180000 width=12)

                      Sort Key: b, a, c

                      Presorted Key: b

                      ->  Index Scan using b_idx on abcd  (cost=0.29..7174.62 rows=180000 width=12)

(9 rows)


In master:

explain select distinct relname,relkind,count(*) over (partition by
relkind) from pg_Class;

                                               QUERY PLAN
-------------------------------------------------------------------------------------------------------
  Unique  (cost=10000000059.50..10000000063.65 rows=415 width=73)
    ->  Sort  (cost=10000000059.50..10000000060.54 rows=415 width=73)
          Sort Key: relname, relkind, (count(*) OVER (?))
          ->  WindowAgg  (cost=10000000034.20..10000000041.46 rows=415 width=73)
                ->  Sort  (cost=10000000034.20..10000000035.23 rows=415 width=65)
                      Sort Key: relkind
                      ->  Seq Scan on pg_class  (cost=10000000000.00..10000000016.15 rows=415 width=65)
(7 rows)

explain select distinct b,a from ab where a < 10;

                                     QUERY PLAN
----------------------------------------------------------------------------------
  Unique  (cost=72.20..78.95 rows=611 width=8)
    ->  Sort  (cost=72.20..74.45 rows=900 width=8)
          Sort Key: b, a
          ->  Index Scan using ab_a_idx on ab  (cost=0.29..28.04 rows=900 width=8)
                Index Cond: (a < 10)
(5 rows)

explain select distinct b,a, count(*) over (partition by a) from abcd order by a,b;

                                              QUERY PLAN

-----------------------------------------------------------------------------------------------------

  Unique  (cost=10000023704.77..10000041084.40 rows=60 width=16)

    ->  Incremental Sort  (cost=10000023704.77..10000039734.40 rows=180000 width=16)

          Sort Key: a, b, (count(*) OVER (?))

          Presorted Key: a

          ->  WindowAgg  (cost=10000020948.87..10000024098.87 rows=180000 width=16)

                ->  Sort  (cost=10000020948.87..10000021398.87 rows=180000 width=8)

                      Sort Key: a

                      ->  Seq Scan on abcd  (cost=10000000000.00..10000002773.00 rows=180000 width=8)

(8 rows)
explain select distinct a, b, count(*) over (partition by b,a, c) from abcd;
                                             QUERY PLAN
---------------------------------------------------------------------------------------------------
  Unique  (cost=45151.33..46951.33 rows=60 width=20)
    ->  Sort  (cost=45151.33..45601.33 rows=180000 width=20)
          Sort Key: a, b, (count(*) OVER (?))
          ->  WindowAgg  (cost=1989.94..25746.96 rows=180000 width=20)
                ->  Incremental Sort  (cost=1989.94..22146.96 rows=180000 width=12)
                      Sort Key: b, a, c
                      Presorted Key: b
                      ->  Index Scan using b_idx on abcd  (cost=0.29..7174.62 rows=180000 width=12)
(8 rows)


Note: Composite keys are also handled.

create index xy_idx on xyz(x,y);

explain select distinct x,z,y from xyz;
                                     QUERY PLAN
----------------------------------------------------------------------------------
  Unique  (cost=0.86..55.97 rows=60 width=12)
    ->  Incremental Sort  (cost=0.86..51.47 rows=600 width=12)
          Sort Key: x, y, z
          Presorted Key: x, y
          ->  Index Scan using xy_idx on xyz  (cost=0.15..32.80 rows=600 width=12)
(5 rows)


There are some cases where different kind of scan happens

explain select distinct x,y from xyz where y < 10;
                                     QUERY PLAN
-----------------------------------------------------------------------------------
  Unique  (cost=47.59..51.64 rows=60 width=8)
    ->  Sort  (cost=47.59..48.94 rows=540 width=8)
          Sort Key: x, y
          ->  Bitmap Heap Scan on xyz  (cost=12.34..23.09 rows=540 width=8)
                Recheck Cond: (y < 10)
                ->  Bitmap Index Scan on y_idx (cost=0.00..12.20 
rows=540 width=0)
                      Index Cond: (y < 10)
(7 rows)

As code only checks from IndexPath (at the moment), other scan paths are 
not covered.

Is it okay to cover these in same way as I did for IndexPath? (with no 
limitation on this behaviour on certain path types?)

Also, I am assuming distinct pathkeys can be changed without any issues. 
As changes are limited to modification in distinct path only,

I don't see this affecting other nodes. Test cases are green,

with a couple of failures in window functions (one which I had added) 
and one very weird:

EXPLAIN (COSTS OFF)
SELECT DISTINCT
        empno,
        enroll_date,
        depname,
        sum(salary) OVER (PARTITION BY depname order by empno) depsalary,
        min(salary) OVER (PARTITION BY depname order by enroll_date) depminsalary
FROM empsalary
ORDER BY depname, empno;
                                              QUERY PLAN
-----------------------------------------------------------------------------------------------------
  Incremental Sort
    Sort Key: depname, empno
    Presorted Key: depname
    ->  Unique
          ->  Incremental Sort
                Sort Key: depname, enroll_date, empno, (sum(salary) OVER (?)), (min(salary) OVER (?))
                Presorted Key: depname
                ->  WindowAgg
                      ->  Incremental Sort
                            Sort Key: depname, empno
                            Presorted Key: depname
                            ->  WindowAgg
                                  ->  Sort
                                        Sort Key: depname, enroll_date
                                        ->  Seq Scan on empsalary
(15 rows)

In above query plan, unique used to come after Incremental sort in the 
master.


Pending:

1. Consider if Pathkey.pk_strategy and pk_nulls_first need to be 
compared too, this is pending

as I have to look these up and understand them.

2. Test cases (failures and new cases)

3. Improve comments


Patch v8 attached.

Please let me know any review comments, will address these in followup patch

with pending items.


Thanks,

Ankit
Вложения

better_windowclause_sorting_dgr-v8.patch
В списке pgsql-hackers по дате отправления:
Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: Todo: Teach planner to evaluate multiple windows in the optimal order

Вложения