Re: [PoC] Implementation of distinct in Window Aggregates
От | Ankit Kumar Pandey |
---|---|
Тема | Re: [PoC] Implementation of distinct in Window Aggregates |
Дата | |
Msg-id | 706f4939-b905-caf6-d6e1-49065e0f5c6a@gmail.com обсуждение исходный текст |
Ответ на | Re: [PoC] Implementation of distinct in Window Aggregates (Ankit Kumar Pandey <itsankitkp@gmail.com>) |
Ответы |
Re: [PoC] Implementation of distinct in Window Aggregates
|
Список | pgsql-hackers |
On 29/12/22 20:58, Ankit Kumar Pandey wrote: > > On 24/12/22 18:22, Ankit Pandey wrote: >> Hi, >> >> This is a PoC patch which implements distinct operation in window >> aggregates (without order by and for single column aggregation, final >> version may vary wrt these limitations). Purpose of this PoC is to >> get feedback on the approach used and corresponding implementation, >> any nitpicking as deemed reasonable. >> >> Distinct operation is mirrored from implementation in nodeAgg. >> Existing partitioning logic determines if row is in partition and >> when distinct is required, all tuples for the aggregate column are >> stored in tuplesort. When finalize_windowaggregate gets called, >> tuples are sorted and duplicates are removed, followed by calling the >> transition function on each tuple. >> When distinct is not required, the above process is skipped and the >> transition function gets called directly and nothing gets inserted >> into tuplesort. >> Note: For each partition, in tuplesort_begin and tuplesort_end is >> involved to rinse tuplesort, so at any time, max tuples in tuplesort >> is equal to tuples in a particular partition. >> >> I have verified it for interger and interval column aggregates (to >> rule out obvious issues related to data types). >> >> Sample cases: >> >> create table mytable(id int, name text); >> insert into mytable values(1, 'A'); >> insert into mytable values(1, 'A'); >> insert into mytable values(5, 'B'); >> insert into mytable values(3, 'A'); >> insert into mytable values(1, 'A'); >> >> select avg(distinct id) over (partition by name) from mytable; >> avg >> -------------------- >> 2.0000000000000000 >> 2.0000000000000000 >> 2.0000000000000000 >> 2.0000000000000000 >> 5.0000000000000000 >> >> select avg(id) over (partition by name) from mytable; >> avg >> -------------------- >> 1.5000000000000000 >> 1.5000000000000000 >> 1.5000000000000000 >> 1.5000000000000000 >> 5.0000000000000000 >> >> select avg(distinct id) over () from mytable; >> avg >> -------------------- >> 3.0000000000000000 >> 3.0000000000000000 >> 3.0000000000000000 >> 3.0000000000000000 >> 3.0000000000000000 >> >> select avg(distinct id) from mytable; >> avg >> -------------------- >> 3.0000000000000000 >> >> This is my first-time contribution. Please let me know if anything >> can be >> improved as I`m eager to learn. >> >> Regards, >> Ankit Kumar Pandey > > Hi all, > > I know everyone is busy with holidays (well, Happy Holidays!) but I > will be glad if someone can take a quick look at this PoC and share > thoughts. > > This is my first time contribution so I am pretty sure there will be > some very obvious feedbacks (which will help me to move forward with > this change). > > Updated patch with latest master. Last patch was an year old. -- Regards, Ankit Kumar Pandey
Вложения
В списке pgsql-hackers по дате отправления: