Re: [PoC] Implementation of distinct in Window Aggregates
От | Ankit Kumar Pandey |
---|---|
Тема | Re: [PoC] Implementation of distinct in Window Aggregates |
Дата | |
Msg-id | 770824eb-a01e-354c-2434-7fed1dd832a4@gmail.com обсуждение исходный текст |
Ответ на | Re: [PoC] Implementation of distinct in Window Aggregates (Ankit Kumar Pandey <itsankitkp@gmail.com>) |
Ответы |
Re: [PoC] Implementation of distinct in Window Aggregates
|
Список | pgsql-hackers |
On 04/01/23 18:10, Ankit Kumar Pandey wrote: > On 29/12/22 20:58, Ankit Kumar Pandey wrote: > > > > On 24/12/22 18:22, Ankit Pandey wrote: > >> Hi, > >> > >> This is a PoC patch which implements distinct operation in window > >> aggregates (without order by and for single column aggregation, final > >> version may vary wrt these limitations). Purpose of this PoC is to > >> get feedback on the approach used and corresponding implementation, > >> any nitpicking as deemed reasonable. > >> > >> Distinct operation is mirrored from implementation in nodeAgg. > >> Existing partitioning logic determines if row is in partition and > >> when distinct is required, all tuples for the aggregate column are > >> stored in tuplesort. When finalize_windowaggregate gets called, > >> tuples are sorted and duplicates are removed, followed by calling the > >> transition function on each tuple. > >> When distinct is not required, the above process is skipped and the > >> transition function gets called directly and nothing gets inserted > >> into tuplesort. > >> Note: For each partition, in tuplesort_begin and tuplesort_end is > >> involved to rinse tuplesort, so at any time, max tuples in tuplesort > >> is equal to tuples in a particular partition. > >> > >> I have verified it for interger and interval column aggregates (to > >> rule out obvious issues related to data types). > >> > >> Sample cases: > >> > >> create table mytable(id int, name text); > >> insert into mytable values(1, 'A'); > >> insert into mytable values(1, 'A'); > >> insert into mytable values(5, 'B'); > >> insert into mytable values(3, 'A'); > >> insert into mytable values(1, 'A'); > >> > >> select avg(distinct id) over (partition by name) from mytable; > >> avg > >> -------------------- > >> 2.0000000000000000 > >> 2.0000000000000000 > >> 2.0000000000000000 > >> 2.0000000000000000 > >> 5.0000000000000000 > >> > >> select avg(id) over (partition by name) from mytable; > >> avg > >> -------------------- > >> 1.5000000000000000 > >> 1.5000000000000000 > >> 1.5000000000000000 > >> 1.5000000000000000 > >> 5.0000000000000000 > >> > >> select avg(distinct id) over () from mytable; > >> avg > >> -------------------- > >> 3.0000000000000000 > >> 3.0000000000000000 > >> 3.0000000000000000 > >> 3.0000000000000000 > >> 3.0000000000000000 > >> > >> select avg(distinct id) from mytable; > >> avg > >> -------------------- > >> 3.0000000000000000 > >> > >> This is my first-time contribution. Please let me know if anything > >> can be > >> improved as I`m eager to learn. > >> > >> Regards, > >> Ankit Kumar Pandey > > > > Hi all, > > > > I know everyone is busy with holidays (well, Happy Holidays!) but I > > will be glad if someone can take a quick look at this PoC and share > > thoughts. > > > > This is my first time contribution so I am pretty sure there will be > > some very obvious feedbacks (which will help me to move forward with > > this change). > > > > > Updated patch with latest master. Last patch was an year old. > Attaching patch with rebase from latest HEAD Thanks, Ankit
Вложения
В списке pgsql-hackers по дате отправления: