Re: [PoC] Implementation of distinct in Window Aggregates
От | Ankit Kumar Pandey |
---|---|
Тема | Re: [PoC] Implementation of distinct in Window Aggregates |
Дата | |
Msg-id | 2ef6b491-1946-b606-f064-d9ea79d91463@gmail.com обсуждение исходный текст |
Ответ на | [PoC] Implementation of distinct in Window Aggregates (Ankit Pandey <itsankitkp@gmail.com>) |
Ответы |
Re: [PoC] Implementation of distinct in Window Aggregates
|
Список | pgsql-hackers |
On 24/12/22 18:22, Ankit Pandey wrote: > Hi, > > This is a PoC patch which implements distinct operation in window > aggregates (without order by and for single column aggregation, final > version may vary wrt these limitations). Purpose of this PoC is to get > feedback on the approach used and corresponding implementation, any > nitpicking as deemed reasonable. > > Distinct operation is mirrored from implementation in nodeAgg. > Existing partitioning logic determines if row is in partition and when > distinct is required, all tuples for the aggregate column are stored > in tuplesort. When finalize_windowaggregate gets called, tuples are > sorted and duplicates are removed, followed by calling the transition > function on each tuple. > When distinct is not required, the above process is skipped and the > transition function gets called directly and nothing gets inserted > into tuplesort. > Note: For each partition, in tuplesort_begin and tuplesort_end is > involved to rinse tuplesort, so at any time, max tuples in tuplesort > is equal to tuples in a particular partition. > > I have verified it for interger and interval column aggregates (to > rule out obvious issues related to data types). > > Sample cases: > > create table mytable(id int, name text); > insert into mytable values(1, 'A'); > insert into mytable values(1, 'A'); > insert into mytable values(5, 'B'); > insert into mytable values(3, 'A'); > insert into mytable values(1, 'A'); > > select avg(distinct id) over (partition by name) from mytable; > avg > -------------------- > 2.0000000000000000 > 2.0000000000000000 > 2.0000000000000000 > 2.0000000000000000 > 5.0000000000000000 > > select avg(id) over (partition by name) from mytable; > avg > -------------------- > 1.5000000000000000 > 1.5000000000000000 > 1.5000000000000000 > 1.5000000000000000 > 5.0000000000000000 > > select avg(distinct id) over () from mytable; > avg > -------------------- > 3.0000000000000000 > 3.0000000000000000 > 3.0000000000000000 > 3.0000000000000000 > 3.0000000000000000 > > select avg(distinct id) from mytable; > avg > -------------------- > 3.0000000000000000 > > This is my first-time contribution. Please let me know if anything can be > improved as I`m eager to learn. > > Regards, > Ankit Kumar Pandey Hi all, I know everyone is busy with holidays (well, Happy Holidays!) but I will be glad if someone can take a quick look at this PoC and share thoughts. This is my first time contribution so I am pretty sure there will be some very obvious feedbacks (which will help me to move forward with this change). -- Regards, Ankit Kumar Pandey
В списке pgsql-hackers по дате отправления: