Обсуждение: avg,first,last,median in one query

Поиск
Список
Период
Сортировка

avg,first,last,median in one query

От
Konstantin Knizhnik
Дата:
Hi, hackers.

I need advice from SQL experts: is there any way in PostgreSQL to 
calculate avg,first,last,median aggregates in one query?
Assume that we have the following table:

create table Securities ("Symbol" varchar, "Date" date, "Time" time, 
"Price" real);

We can simulate median using percentile_disc:

select "Symbol","Date",    avg("Price"),    percentile_disc(0.5) within group (order by "Price")
from Securities
group by "Symbol","Date";

And all other aggregates can be calculated using windows functions:

select distinct "Symbol","Date",    first_value("Price") over (partition by "Symbol","Date" order by 
"Time" rows between unbounded preceding and unbounded following),    last_value("Price") over (partition by
"Symbol","Date"order by 
 
"Time" rows between unbounded preceding and unbounded following),    avg("Price") over (partition by "Symbol","Date"
rowsbetween 
 
unbounded preceding and unbounded following)
from Securities;

I wonder is there are any simpler/efficient alternative to the query above?

But unfortunately it is not possible to calculate median is such way 
because percentile_disc is not compatible with OVER:

ERROR: OVER is not supported for ordered-set aggregate percentile_disc

So is there any chance to calculate all this four aggregates in one 
query without writing some supplementary functions?

Additional question: what is the most efficient way of calculating 
MEDIAN in PostgreSQL?
I found three different approaches:

1. Using CTE:

https://www.periscopedata.com/blog/medians-in-sql.html

2. Using user-defined aggregate function which uses array_appendand so 
materialize all values in memory:

https://wiki.postgresql.org/wiki/Aggregate_Median

3. Using percentile aggregate:

http://blog.jooq.org/2015/01/06/how-to-emulate-the-median-aggregate-function-using-inverse-distribution-functions/


Thanks in advance,

-- 

Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: avg,first,last,median in one query

От
Jim Nasby
Дата:
On 3/24/16 9:00 AM, Konstantin Knizhnik wrote:
> But unfortunately it is not possible to calculate median is such way
> because percentile_disc is not compatible with OVER:

I don't know if you could use cume_dist()[1] to do this, but even if you 
can't it probably wouldn't be hard to modify it to do what you need.

[1] http://www.postgresql.org/docs/9.5/static/functions-window.html
-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com