[PoC] Implementation of distinct in Window Aggregates

Поиск

Список

Период

Сортировка

От	Ankit Pandey
Тема	[PoC] Implementation of distinct in Window Aggregates
Дата	24 декабря 2022 г. 12:52:03
Msg-id	CALbMxBxHnSxZH-Z7dES0gCK-D=54k2XRdtqL0d0x4eXF+65bZw@mail.gmail.com обсуждение исходный текст
Ответы	Re: [PoC] Implementation of distinct in Window Aggregates
Список	pgsql-hackers

Дерево обсуждения

Hi,

This is a PoC patch which implements distinct operation in window aggregates (without order by and for single column aggregation, final version may vary wrt these limitations). Purpose of this PoC is to get feedback on the approach used and corresponding implementation, any nitpicking as deemed reasonable.

Distinct operation is mirrored from implementation in nodeAgg. Existing partitioning logic determines if row is in partition and when distinct is required, all tuples for the aggregate column are stored in tuplesort. When finalize_windowaggregate gets called, tuples are sorted and duplicates are removed, followed by calling the transition function on each tuple.

When distinct is not required, the above process is skipped and the transition function gets called directly and nothing gets inserted into tuplesort.

Note: For each partition, in tuplesort_begin and tuplesort_end is involved to rinse tuplesort, so at any time, max tuples in tuplesort is equal to tuples in a particular partition.

I have verified it for interger and interval column aggregates (to rule out obvious issues related to data types).

Sample cases:

create table mytable(id int, name text);

insert into mytable values(1, 'A');

insert into mytable values(5, 'B');

insert into mytable values(3, 'A');

insert into mytable values(1, 'A');

select avg(distinct id) over (partition by name) from mytable;

avg
--------------------

2.0000000000000000

5.0000000000000000

select avg(id) over (partition by name) from mytable;

avg
--------------------
1.5000000000000000
1.5000000000000000
1.5000000000000000
1.5000000000000000
5.0000000000000000

select avg(distinct id) over () from mytable;
avg
--------------------
3.0000000000000000
3.0000000000000000
3.0000000000000000
3.0000000000000000
3.0000000000000000

select avg(distinct id) from mytable;
avg
--------------------
3.0000000000000000

This is my first-time contribution. Please let me know if anything can be

improved as I`m eager to learn.

Regards,

Ankit Kumar Pandey

Вложения

v1-0001-Implement-distinct-in-Window-Aggregates.patch

В списке pgsql-hackers по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

[PoC] Implementation of distinct in Window Aggregates

Вложения