On Thu, Feb 24, 2011 at 1:55 PM, Dave Crooke <dcrooke@gmail.com> wrote:
> Hi foks
>
> This is an old chestnut which I've found a number of online threads for, and
> never seen a clever answer to. It seems a common enough idiom that there
> might be some slicker way to do it, so I thought I might inquire with this
> august group if such a clever answer exists ....
>
> Consider the following table
>
> create table data
> (id_key int,
> time_stamp timestamp without time zone,
> value double precision);
>
> create unique index data_idx on data (id_key, time_stamp);
>
> with around 1m rows, with 3500 or so distinct values of id_key.
>
> I need to find the most recent value for each distinct value of id_key.
> There is no elegant (that I know of) syntax for this, and there are two ways
> I've typically seen it done:
>
> 1. Use a dependent subquery to find the most recent time stamp, i.e.
>
> select
> a.id_key, a.time_stamp, a.value
> from
> data a
> where
> a.time_stamp=
> (select max(time_stamp)
> from data b
> where a.id_key=b.id_key)
>
> 2. Define a temporary table / view with the most recent time stamp for each
> key, and join against it:
>
> select
> a.id_key, a.time_stamp, a.value
> from
> data a,
> (select id_key, max(time_stamp) as mts
> from data group by id_key) b
> where
> a.id_key=b.id_key and a.time_stamp=b.mts
>
> I've found that for my data set, PG 8.4.2 selects the "obvious" / "do it as
> written" plan in each case, and that method 2. is much quicker (2.6 sec vs.
> 2 min on my laptop) ....
>
> Is there a more elegant way to write this, perhaps using PG-specific
> extensions?
one pg specific method that a lot of people overlook for this sort of
problem is custom aggregates.
create or replace function maxfoo(foo, foo) returns foo as
$$
select case when $1.t > $2.t then $1 else $2 end;
$$ language sql immutable;
create aggregate aggfoo(foo)
(
sfunc=maxfoo,
stype=foo
);
create table foo(id int, t timestamptz default now());
insert into foo values (1);
insert into foo values (1);
select (f).* from (select aggfoo(foo) as f from foo group by id) q;
postgres=# select (f).* from (select aggfoo(foo) as f from foo group by id) q;
id | t
----+----------------------------
1 | 2011-02-24 14:01:20.051-06
(1 row)
where this approach can be useful is when you have a very complicated
aggregation condition that can be awkward to express in a join.
merlin