Обсуждение: Generating custom statistics rows puzzler...left join on parts of intervals?

Поиск
Список
Период
Сортировка

Generating custom statistics rows puzzler...left join on parts of intervals?

От
Ken Corey
Дата:
Hi All,

Got one of those tricky SQL questions that someone else might be able to
come up with a really slick answer.  I'll be as brief as I can be, but
it's a bit long.  The actual questions are at the bottom... RTFM would
be lovely if pointers are given.  I've searched the archives and docs
and come up blank...

My company provides games and interactive content to internet publishers
on an ASP basis.  This means we've got the raw data to produce
interesting usage statistics.

One of the things that would be quite interesting would be to take, on
an hourly basis, various of certain tables both total and hourly, and
then graph these. I've got an acceptable little java applet that graphcs
things in a quite easy way.  It's not terrifically powerful, but it's
not big, either.

A little background: when a customer puts our game on their site, it's
generally for a period of time, called a 'promotion'.  This promotion is
broken into 'rounds'.  Each round may have 1..n 'games' in it...and
these games can change between rounds.

So, to find out the total usage of all games for a given promotion,
it's:
  select
    promotion_id,count(promotion_id)
  from player_games
  group by promotion_id;

Okay so far. But perhaps we want to see the count of individual games:
  select
    promotion_id,game_id,count(game_id)
  from player_games
  group by promotion_id,game_id;

Again, good enough as it goes.  Now we get tricky...we want to know the
number of times each game is played on an hourly basis.
  select
    promotion_id, game_id, date_trunc('hour',when_inserted),
    count(date_trunc('hour',when_inserted))
  from player_games
  group by promotion_id,game_id,date_trunc('hour',when_inserted);

Now this will produce something like this:
   promotion_id | game_id |       date_trunc       | count
--------------+---------+------------------------+-------
           38 |       0 | 2001-11-28 09:00:00+00 |    84
           38 |       0 | 2001-11-28 12:00:00+00 |   372
           38 |       0 | 2001-11-28 13:00:00+00 |   320

Question 1)
-----------
Is the use of 'date_trunc' 3 times as in the above SQL not horribly
inefficient?  Is there a way to only have to do it once in the query?

Question 2)
-----------
The REAL problem is in the rows returned not having all the hours
represented.  Note that nobody apparently played game 0 between 10 and
11:59 that day.  when the simple java graph applet I mentioned comes
along, it's not smart enough to fill in the hours that aren't there.

So, we need to provide those blank hours somehow.  I can see several
ways, but was hoping there's a really slick SQL method of dealing with
this.

Method 0 (smarts in wrong place) Change the graphic applet so that it
understand date/time stamps, and fills in blanks somehow.  I don't like
this one, because the applet now has to know about the format of what
time looks like, instead of just copying the rows blindly to the page.

Method 1 (brute force) - The servlet that outputs the page gets the rows
that do exist, figures out which ones don't exist and inserts them in
the output stream with 0's and passes the lot to the applet. I don't
like this one, because the servlet now has to know about the format of
what time looks like, instead of just copying the rows blindly to the
page.

Method 2 (overkill) - Create a table called 'time' with a row for each
hour on the clock.  When creating the rows, do a left join on this table
so that hours that don't have any plays would still be represented. I'm
leery about doing left joins like this, as they are tricky, and can be
killers on the database.

Is there a slick use of 'between', 'interval' or some other SQL-ism (or
PostgreSQL-ism) that will give me in effect a left join on all the hours
from start::timestamp to end::timestamp?

--
Ken Corey  CTO  http://www.atomic-interactive.com  07720 440 731


Re: Generating custom statistics rows puzzler...left join

От
kenc
Дата:
> Method 2 (overkill) - Create a table called 'time' with a row for each
> hour on the clock.  When creating the rows, do a left join on this table
> so that hours that don't have any plays would still be represented. I'm
> leery about doing left joins like this, as they are tricky, and can be
> killers on the database.
>
> Is there a slick use of 'between', 'interval' or some other SQL-ism (or
> PostgreSQL-ism) that will give me in effect a left join on all the hours
> from start::timestamp to end::timestamp?

Hrm...okay, I have what looks like a potential solution using a table
containing every hourly timestamp from 2001 through now into 2003 left
joined to the initial hourly query above...but after this, I'm know I'm
gonna have nightmares tonight.

Can anyone make this more elegant?

(NOTE: the comments were entered by hand in this email...they might not
work in the database...)

select *
from
  (--
   -- Get all hours and the promotion id
   -- from the beginning of the promotion to now.
   select timeval, promotion_id as leftpromo
   from timetab, promotions
   where
     timetab.timeval >= promotions.startdate
     and timetab.timeval >= '2002-04-01'::timestamp
     and timetab.timeval <= promotions.enddate
     and timetab.timeval <= CURRENT_TIMESTAMP
     and promotions.startdate <= CURRENT_TIMESTAMP
     and promotions.enddate >= CURRENT_TIMESTAMP
  ) as times
left outer join
  (--
   -- Join those with the hourly counts
   select u.promotion_id,date_trunc('hour',u.wheninserted) as hour
,count(date_trunc('hour',u.wheninserted))
   from player u, promotions p
   where u.promotion_id=p.promotion_id
     and p.startdate <= CURRENT_TIMESTAMP
     and p.enddate >= CURRENT_TIMESTAMP
   group by u.promotion_id,date_trunc('hour',u.wheninserted)) as result
on (times.timeval = result.hour and times.leftpromo=result.promotion_id)
order by leftpromo, timeval;

incidentally, this returns ~40,000 rows...*way* too many to store for
IRO 15 promotions.

I guess I'm going to have to try either the smarts in wrong place or
brute force method to conserve on resource usage.

Ah well.

-Ken

--
Ken Corey  CTO  http://www.atomic-interactive.com  07720 440 731