Re: Todo: Teach planner to evaluate multiple windows in the optimal order
От | David Rowley |
---|---|
Тема | Re: Todo: Teach planner to evaluate multiple windows in the optimal order |
Дата | |
Msg-id | CAApHDvowcPrNY3kgAPhbFL0nZ1fCvUBcp3y1AjLm0TUP5N3cxw@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: Todo: Teach planner to evaluate multiple windows in the optimal order (Ankit Kumar Pandey <itsankitkp@gmail.com>) |
Ответы |
Re: Todo: Teach planner to evaluate multiple windows in the optimal order
(Ankit Kumar Pandey <itsankitkp@gmail.com>)
|
Список | pgsql-hackers |
On Wed, 11 Jan 2023 at 08:17, Ankit Kumar Pandey <itsankitkp@gmail.com> wrote: > > > > On 10/01/23 10:53, David Rowley wrote: > > > the total cost is the same for both of these, but the execution time > > seems to vary quite a bit. > > This is really weird, I tried it different ways (to rule out any issues > due to > > caching) and execution time varied in spite of having same cost. > > > Maybe we should try and do this for DISTINCT queries if the > > distinct_pathkeys match the orderby_pathkeys. That seems a little less > > copout-ish. If the ORDER BY is the same as the DISTINCT then it seems > > likely that the ORDER BY might opt to use the Unique path for DISTINCT > > since it'll already have the correct pathkeys. > > > However, if the ORDER BY has fewer columns then it might be cheaper to Hash Aggregate and > > then sort all over again, especially so when the DISTINCT removes a > > large proportion of the rows. > > Isn't order by pathkeys are always fewer than distinct pathkeys? Just thinking about this again, I remembered why I thought DISTINCT was uninteresting to start with. The problem is that if the query has WindowFuncs and also has a DISTINCT clause, then the WindowFunc results *must* be in the DISTINCT clause and, optionally also in the ORDER BY clause. There's no other place to write WindowFuncs IIRC. Since we cannot pushdown the sort when the more strict version of the pathkeys have WindowFuncs, then we must still perform the additional sort if the planner chooses to do a non-hashed DISTINCT. The aim of this patch is to reduce the total number of sorts, and I don't think that's possible in this case as you can't have WindowFuncs in the ORDER BY when they're not in the DISTINCT clause: postgres=# select distinct relname from pg_Class order by row_number() over (order by oid); ERROR: for SELECT DISTINCT, ORDER BY expressions must appear in select list LINE 1: select distinct relname from pg_Class order by row_number() ... Another type of query which is suboptimal still is when there's a DISTINCT and WindowClause but no ORDER BY. We'll reorder the DISTINCT clause so that the leading columns of the ORDER BY come first in transformDistinctClause(), but we've nothing to do the same for WindowClauses. It can't happen around when transformDistinctClause() is called as we've yet to decide the WindowClause evaluation order, so if we were to try to make that better it would maybe have to do in the upper planner somewhere. It's possible it's too late by that time to adjust the DISTINCT clause. Here's an example of it. # set enable_hashagg=0; # explain select distinct relname,relkind,count(*) over (partition by relkind) from pg_Class; QUERY PLAN ------------------------------------------------------------------------------------ Unique (cost=61.12..65.24 rows=412 width=73) -> Sort (cost=61.12..62.15 rows=412 width=73) Sort Key: relname, relkind, (count(*) OVER (?)) -> WindowAgg (cost=36.01..43.22 rows=412 width=73) -> Sort (cost=36.01..37.04 rows=412 width=65) Sort Key: relkind -> Seq Scan on pg_class (cost=0.00..18.12 rows=412 width=65) (7 rows) We can simulate the optimisation by swapping the column order in the targetlist. Note the planner can use Incremental Sort (at least since 3c6fc5820, from about 2 hours ago) # explain select distinct relkind,relname,count(*) over (partition by relkind) from pg_Class; QUERY PLAN ------------------------------------------------------------------------------------ Unique (cost=41.26..65.32 rows=412 width=73) -> Incremental Sort (cost=41.26..62.23 rows=412 width=73) Sort Key: relkind, relname, (count(*) OVER (?)) Presorted Key: relkind -> WindowAgg (cost=36.01..43.22 rows=412 width=73) -> Sort (cost=36.01..37.04 rows=412 width=65) Sort Key: relkind -> Seq Scan on pg_class (cost=0.00..18.12 rows=412 width=65) (8 rows) Not sure if we should be trying to improve that in this patch. I just wanted to identify it as something else that perhaps could be done. I'm not really all that sure the above query shape makes much sense in the real world. Would anyone ever want to use DISTINCT on some results containing WindowFuncs? David
В списке pgsql-hackers по дате отправления:
Предыдущее
От: Peter GeogheganДата:
Сообщение: Re: Fixing a couple of buglets in how VACUUM sets visibility map bits
Следующее
От: David RowleyДата:
Сообщение: Re: Todo: Teach planner to evaluate multiple windows in the optimal order