Re: Remove WindowClause PARTITION BY items belonging to redundant pathkeys

Поиск
Список
Период
Сортировка
От Richard Guo
Тема Re: Remove WindowClause PARTITION BY items belonging to redundant pathkeys
Дата
Msg-id CAMbWs49jxZkfSP-9rG0Go4W7=pXCgS7jcjYJEERGJF3SoOFAqg@mail.gmail.com
обсуждение исходный текст
Ответ на Remove WindowClause PARTITION BY items belonging to redundant pathkeys  (David Rowley <dgrowleyml@gmail.com>)
Ответы Re: Remove WindowClause PARTITION BY items belonging to redundant pathkeys  (David Rowley <dgrowleyml@gmail.com>)
Список pgsql-hackers

On Thu, Jun 8, 2023 at 7:37 AM David Rowley <dgrowleyml@gmail.com> wrote:
What the attached patch does is process each WindowClause and removes
any items from the PARTITION BY clause that are columns or expressions
relating to redundant PathKeys.

Effectively, this allows the nodeWindowAgg.c code which stops
processing WindowAgg rows when the run condition is met to work as the
PARTITION BY clause is completely removed in the case of the above
query.  Removing the redundant PARTITION BY items also has the added
benefit of not having to needlessly check if the next row belongs to
the same partition as the last row. For the above, that check is a
waste of time as all rows have relkind = 'r'

This is a nice optimization.  I reviewed it and here are my findings.

In create_windowagg_plan there is such comment that says

* ... Note: in principle, it's possible
* to drop some of the sort columns, if they were proved redundant by
* pathkey logic.  However, it doesn't seem worth going out of our way to
* optimize such cases.

Since this patch removes any clauses from the wc->partitionClause for
redundant pathkeys, this comment seems outdated, at least for the sort
columns in partitionClause.

Also I'm wondering if we can do the same optimization to
wc->orderClause.  I tested it with the query below and saw performance
gains.

create table t (a int, b int);
insert into t select 1,2 from generate_series(1,100000)i;
analyze t;

explain analyze
select * from
 (select a, b, rank() over (PARTITION BY a order by b) rank
    from t where b = 2)
where a = 1 and rank <= 10;

With and without this optimization to wc->orderClause the execution time
is 67.279 ms VS. 119.120 ms (both best of 3).

I notice you comment in the patch that doing this is unsafe because it
would change the semantics of peer rows during execution.  Would you
please elaborate on that?

Thanks
Richard

В списке pgsql-hackers по дате отправления:

Предыдущее
От: "Daniel Verite"
Дата:
Сообщение: Re: Order changes in PG16 since ICU introduction
Следующее
От: "Joel Jacobson"
Дата:
Сообщение: Re: Do we want a hashset type?