Re: Faster distinct query?
От | Israel Brewster |
---|---|
Тема | Re: Faster distinct query? |
Дата | |
Msg-id | 3272D898-610C-45F1-B227-EB47510E4B0B@alaska.edu обсуждение исходный текст |
Ответ на | Re: Faster distinct query? (Geoff Winkless <pgsqladmin@geoff.dj>) |
Ответы |
Re: Faster distinct query?
|
Список | pgsql-general |
On Sep 23, 2021, at 10:36 AM, Geoff Winkless <pgsqladmin@geoff.dj> wrote:On Wed, 22 Sept 2021 at 21:05, Israel Brewster <ijbrewster@alaska.edu> wrote:I was wondering if there was any way to improve the performance of this query:SELECT station,array_agg(distinct(channel)) as channels FROM data GROUP BY station;If you have tables of possible stations and channels (and if not, why not?), then an EXISTS query, something likeSELECT stations.name, ARRAY_AGG(channels.name)
FROM stations, channels
WHERE EXISTS
(SELECT FROM data WHERE data.channels=channels.name AND data.station=stations.name)
GROUP BY stations.name
will usually be much faster, because it can stop scanning after the first match in the index.
It’s making me think though, because this similar (though incomplete, and therefore incorrect result) query runs in only 19ms (https://explain.depesz.com/s/iZnN):
SELECT
stations.name,
array_agg(channels.channel)
FROM stations,channels
WHERE EXISTS (SELECT
FROM data
WHERE data.station=stations.id)
GROUP BY stations.name
It’s only when I add in the AND data.channels=channels.channel that the query time blows up to 5+ minutes. I personally don’t understand why there would be such a large difference between the two queries - something wrong with my indexes?
---
Israel Brewster
Software Engineer
Alaska Volcano Observatory
Geophysical Institute - UAF
2156 Koyukuk Drive
Fairbanks AK 99775-7320
Software Engineer
Alaska Volcano Observatory
Geophysical Institute - UAF
2156 Koyukuk Drive
Fairbanks AK 99775-7320
Work: 907-474-5172
cell: 907-328-9145
cell: 907-328-9145
Geoff
В списке pgsql-general по дате отправления: