Обсуждение: Help on a complex query (avg data for day of the week)

Поиск
Список
Период
Сортировка

Help on a complex query (avg data for day of the week)

От
Matthew Smith
Дата:
Hello,

I have a table containing a timestamp and data usage fields (among others). 
This table stores amounts of data usage and the times then the data was used, 
eg:
time              | data
------------------------+----------2005-03-26 09:32:43+11 |      162

I want to form a query that returns the average total usage for each day of 
the week, eg:

day    |        avg_usage
--------+---------------------     0 |  35684624.000000000     1 | 103344529.000000000     2 | 105899406.000000000
3|  21994539.000000000     4 | 113045173.000000000     5 | 110675115.000000000     6 |  8791397.0000000000
 
(7 rows)

To get this info, I am using the following query:

select dow as day, sum(sum_data)/count(dow) as avg_usage from 
(select extract('dow' from date_trunc('day', time)) as dow, sum(data) as 
sum_data 
from datalog where time >= '2005-09-11' and time <= '2005-09-25' group by dow) 
as avg_data_per_day group by day;

This works well, assuming that there is at least one entry in the table for 
each day in the time period.

The problem comes when there are days where no data is logged. In my example, 
the total data for each day of the week is divided by the number of the days 
found. As there are exactly 2 of each day of the week between 2005-09-11 and 
2005-09-25, we should hope to divide each total by 2. but if there is no data 
logged for the 14th, then the total for wednesdays would be divided by 1. I 
want it to be the sum of the 2 days divided by 2: (101994539 + 0)/2.

Is there a better way to do this? Or does anyone have any suggestions on the 
best way to insert the missing dates into my query?

Any help would be great!

Thanks,

Matthew Smith



Re: Help on a complex query (avg data for day of the week)

От
Richard Huxton
Дата:
Matthew Smith wrote:
> I want to form a query that returns the average total usage for each day of 
> the week, eg:
[snip]
> To get this info, I am using the following query:
> 
> select dow as day, sum(sum_data)/count(dow) as avg_usage from 
> (select extract('dow' from date_trunc('day', time)) as dow, sum(data) as 
> sum_data 
> from datalog where time >= '2005-09-11' and time <= '2005-09-25' group by dow) 
> as avg_data_per_day group by day;
> 
> This works well, assuming that there is at least one entry in the table for 
> each day in the time period.
> 
> The problem comes when there are days where no data is logged. 

1. Calculate how many data-points each day represents
2. Sum the days you do have data for
3. Left-join #1 to #2 so you can calculate the average.

I'm guessing for #1 you'd rather not have 7 UNIONs, so you might find 
generate_series() a useful function. See Ch 9.18. Set Returning Functions.

HTH
--   Richard Huxton  Archonet Ltd


Re: Help on a complex query (avg data for day of the week)

От
Matthew Smith
Дата:
Richard (and list),

Thanks for the help! More below:

On Wed, 21 Dec 2005 09:04 pm, Richard Huxton wrote:
> Matthew Smith wrote:
> > I want to form a query that returns the average total usage for each day
> > of the week, eg:
>
> [snip]
>
> > To get this info, I am using the following query:
> >
> > select dow as day, sum(sum_data)/count(dow) as avg_usage from
> > (select extract('dow' from date_trunc('day', time)) as dow, sum(data) as
> > sum_data
> > from datalog where time >= '2005-09-11' and time <= '2005-09-25' group by
> > dow) as avg_data_per_day group by day;
> >
> > This works well, assuming that there is at least one entry in the table
> > for each day in the time period.
> >
> > The problem comes when there are days where no data is logged.
>
> 1. Calculate how many data-points each day represents
> 2. Sum the days you do have data for
> 3. Left-join #1 to #2 so you can calculate the average.
>
> I'm guessing for #1 you'd rather not have 7 UNIONs, so you might find
> generate_series() a useful function. See Ch 9.18. Set Returning Functions.

Sadly I am using 7.3, it seems that generate_series() is 8.0 and later. I am 
not sure what you meant by "Calculate how many data-points each day 
represents", but your mention of unions gave me an idea. It's nasty, but it 
will work for what I am trying to do (sorry if it is what you meant):

I can use unions to make a list of dates and left join them into my query to 
get the missing days in. This would be the same as left joining in the result 
from generate_series(), but it is not as tidy.

I will make the list using something like this:

select '2005-09-11'::timestamp as date UNION select '2005-09-12'::timestamp as 
date UNION select '2005-09-13'::timestamp as date UNION etc.

It's nasty, but it will hold until I can migrate to 8.1. Thanks again for the 
help!

Cheers,

Matthew Smith



Re: Help on a complex query (avg data for day of the week)

От
Michael Fuhr
Дата:
On Thu, Dec 22, 2005 at 11:59:24AM +1100, Matthew Smith wrote:
> Sadly I am using 7.3, it seems that generate_series() is 8.0 and later.

Yes, but it's easily written in PL/pgSQL for earlier versions.
Example:

CREATE FUNCTION generate_series(integer, integer) 
RETURNS SETOF integer AS '
DECLARE   i  integer;
BEGIN   FOR i IN $1 .. $2 LOOP       RETURN NEXT i;   END LOOP;   RETURN;
END;
' LANGUAGE plpgsql VOLATILE STRICT;

-- 
Michael Fuhr


Re: Help on a complex query (avg data for day of the week)

От
Matthew Smith
Дата:
Michael,

Thanks heaps for that!

Cheers,

Matt Smith

On Thu, 22 Dec 2005 01:37 pm, Michael Fuhr wrote:
> On Thu, Dec 22, 2005 at 11:59:24AM +1100, Matthew Smith wrote:
> > Sadly I am using 7.3, it seems that generate_series() is 8.0 and later.
>
> Yes, but it's easily written in PL/pgSQL for earlier versions.
> Example:
>
> CREATE FUNCTION generate_series(integer, integer)
> RETURNS SETOF integer AS '
> DECLARE
>     i  integer;
> BEGIN
>     FOR i IN $1 .. $2 LOOP
>         RETURN NEXT i;
>     END LOOP;
>     RETURN;
> END;
> ' LANGUAGE plpgsql VOLATILE STRICT;