query optimization: aggregate and distinct

Поиск

Список

Период

Сортировка

От	Jeff Davis
Тема	query optimization: aggregate and distinct
Дата	20 августа 2003 г. 20:35:42
Msg-id	200308201626.26310.jdavis-pgsql@empires.org обсуждение исходный текст
Ответы	Re: query optimization: aggregate and distinct Re: query optimization: aggregate and distinct
Список	pgsql-general

Дерево обсуждения

I have below a simplified version of what I'm trying to do. Basically, I am
trying to get both an aggregate (an average) and "most recent" value.

 g | v  |             ts
---+----+----------------------------
 1 | 10 | 2003-08-20 16:00:27.010769
 1 | 20 | 2003-08-20 16:00:30.380476
 2 | 40 | 2003-08-20 16:00:37.399717
 2 | 80 | 2003-08-20 16:00:40.265717

I would like, as output, something like this:

 g | v  |        avg         |             ts
---+----+--------------------+----------------------------
 1 | 20 | 15.000000000000000 | 2003-08-20 16:00:30.380476
 2 | 80 | 60.000000000000000 | 2003-08-20 16:00:40.265717

which I got by a query like:

SELECT
    t2.g,t2.v,t1.avg,t2.ts
FROM
    (SELECT
        g,avg(v)
        FROM t
        GROUP BY g
    ) t1,
    (SELECT
        DISTINCT ON (g)
        * FROM t
        ORDER BY g,ts DESC
    ) t2
WHERE t1.g = t2.g;

That produces the results that I need, but it seems inefficient to join a
table with itself like that. My real query (not this simplified example)
takes 5+ seconds and I suspect this join is why.

Is there a better way?

For my real query, it's using index scans where I'd expect, and I frequently
VACUUM ANALYZE the big table and I have all the stats turned on. Also, I have
more shared buffers than needed to put everything in RAM.

Right now I'm using 7.2.1. Any improvements in 7.3 or 7.4 that would help this
issue?

Regards,
    Jeff Davis

В списке pgsql-general по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

query optimization: aggregate and distinct