Re: wip: functions median and percentile
От | Dean Rasheed |
---|---|
Тема | Re: wip: functions median and percentile |
Дата | |
Msg-id | AANLkTin7kZ1mA271=L4pemQFt2F6HRDROnkrshOZfJsd@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: wip: functions median and percentile (Hitoshi Harada <umi.tanuki@gmail.com>) |
Ответы |
Re: wip: functions median and percentile
Re: wip: functions median and percentile |
Список | pgsql-hackers |
On 4 October 2010 07:36, Hitoshi Harada <umi.tanuki@gmail.com> wrote: > 2010/10/4 Greg Stark <gsstark@mit.edu>: >> On Sun, Oct 3, 2010 at 7:06 AM, Hitoshi Harada <umi.tanuki@gmail.com> wrote: >>> And I'm now thinking about how to make median happen in window >>> aggregate. >> >> If you were to do this by extending tuplesort what extra features >> would tuplesort need? > > I don't think we need to extend tuplesort when we do it. IMO this can > make happen by fetching all the values in the frame and performing > sort once, then getting current position and calculate median. The > only thing I consider is to mark and to go to the demanded position in > tuplesort like tuplestore, but actually we can manage to median() > without such facilities. > I dont' think that works, because the ORDER BY in the WINDOW doesn't necessarily match the sort order that the median function needs. Consider for example: select a, median(a) over(order by a desc) from generate_series(1,10) as g(a);a | median ----+--------------------10 | 10 9 | 9.5000000000000000 8 | 9 7 | 8.5000000000000000 6 | 8 5 | 7.5000000000000000 4 | 7 3 | 6.5000000000000000 2 | 6 1 | 5.5000000000000000 (10 rows) That requires a new sort for each row. I generated this with a minor tweak to Pavel's patch to just restart the tuplesort each time (the "quick-fix" solution). The problem is that performance really sucks, because it is an O(n^2 log(n)) algorithm. I don't see an easy way around that without significant new infrastructure, as Greg describes, or a completely different algorithm. >> How do existing windowed aggregates work if you specify an order by on >> the aggregate? Do they resort for every output row? > Isn't the issue that they don't need to sort, so they all work unchanged. Regards, Dean > I don't know the true specification of median() nor the behavior of it > in other RDBs, but I bet median is median and ORDER BY in window > clause doesn't affect its result. ORDER BY specifies only the frame > boundary. > > Regards, > > > > -- > Hitoshi Harada > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers >
В списке pgsql-hackers по дате отправления: