Re: POC: GROUP BY optimization
От | jian he |
---|---|
Тема | Re: POC: GROUP BY optimization |
Дата | |
Msg-id | CACJufxFZ8vLd6_M0krR8xG8vBNKGSYiLXPMpmK1c60wXzbsTJA@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: POC: GROUP BY optimization (jian he <jian.universality@gmail.com>) |
Список | pgsql-hackers |
On Fri, Apr 19, 2024 at 6:44 PM jian he <jian.universality@gmail.com> wrote: > > On Thu, Apr 18, 2024 at 6:58 PM Alexander Korotkov <aekorotkov@gmail.com> wrote: > > > > Thank you for the fixes you've proposed. I didn't look much into > > details yet, but I think the main concern Tom expressed in [1] is > > whether the feature is reasonable at all. I think at this stage the > > most important thing is to come up with convincing examples showing > > how huge performance benefits it could cause. I will return to this > > later today and will try to provide some convincing examples. > > hi. previously preprocess_groupclause will not process cases where no ORDER BY clause is specified. commit 0452b461b will reorder the GROUP BY element even though no ORDER BY clause is specified , if there are associated indexes on it. (hope I understand it correctly). for example (when enable_hashagg is false) explain(verbose) select count(*) FROM btg GROUP BY y,x; in pg16 will not reorder, it will be as is: `GROUP BY y,x` after commit 0452b461b, it will reorder to `GROUP BY x,y`. because there is an index `btree (x, y)` (only one) associated with it. if you drop the index `btree (x, y)` , it will be `GROUP BY y,x` as pg16. This reordering GROUP BY element when no ORDER BY clause is not specified is performant useful when the work_mem is small. I've attached some tests comparing master with REL_16_STABLE to demonstrate that. all the tests attached are under the condition: work_mem='64kB', buildtype=release, max_parallel_workers_per_gather=0. one example: CREATE TABLE btg5 AS SELECT i::numeric % 10 AS x, i % 10 AS y, 'abc' || i % 10 AS z, i % 100000 AS w FROM generate_series(1, 1e6) AS i; CREATE INDEX btg5_x_y_idx ON btg5(x, y); explain(analyze) SELECT count(*) FROM btg5 GROUP BY z, y, w, x; in pg17, the execution time is: 746.574 ms in pg16, the execution time is: 1693.483 ms if I reorder it manually as: `explain(analyze) SELECT count(*) FROM btg5 GROUP BY x, y, w, z;` then in pg16, the execution time is 630.394 ms
Вложения
В списке pgsql-hackers по дате отправления: