Re: Faster distinct query?

Поиск

Список

Период

Сортировка

От	Israel Brewster
Тема	Re: Faster distinct query?
Дата	23 сентября 2021 г. 23:17:31
Msg-id	3272D898-610C-45F1-B227-EB47510E4B0B@alaska.edu обсуждение исходный текст
Ответ на	Re: Faster distinct query? (Geoff Winkless <pgsqladmin@geoff.dj>)
Ответы	Re: Faster distinct query? (Michael Lewis <mlewis@entrata.com>)
Список	pgsql-general

Дерево обсуждения

On Sep 23, 2021, at 10:36 AM, Geoff Winkless <pgsqladmin@geoff.dj> wrote:

On Wed, 22 Sept 2021 at 21:05, Israel Brewster <ijbrewster@alaska.edu> wrote:
I was wondering if there was any way to improve the performance of this query:

SELECT station,array_agg(distinct(channel)) as channels FROM data GROUP BY station;

If you have tables of possible stations and channels (and if not, why not?), then an EXISTS query, something like

SELECT stations.name, ARRAY_AGG(channels.name)
FROM stations, channels
WHERE EXISTS
(SELECT FROM data WHERE data.channels=channels.name AND data.station=stations.name)
GROUP BY stations.name

will usually be much faster, because it can stop scanning after the first match in the index.

So that one ran in about 5 minutes as well - apparently the time it takes to scan the index, given the similarity of run times for each of the different queries: https://explain.depesz.com/s/w46h

It’s making me think though, because this similar (though incomplete, and therefore incorrect result) query runs in only 19ms (https://explain.depesz.com/s/iZnN):

SELECT

stations.name,

array_agg(channels.channel)

FROM stations,channels

WHERE EXISTS (SELECT

FROM data

WHERE data.station=stations.id)

GROUP BY stations.name

It’s only when I add in the AND data.channels=channels.channel that the query time blows up to 5+ minutes. I personally don’t understand why there would be such a large difference between the two queries - something wrong with my indexes?

---

Israel Brewster
Software Engineer
Alaska Volcano Observatory
Geophysical Institute - UAF
2156 Koyukuk Drive
Fairbanks AK 99775-7320

Work: 907-474-5172
cell: 907-328-9145

Geoff

В списке pgsql-general по дате отправления:

Предыдущее

От: Jaime Solorzano
Дата: 23 сентября 2021 г., 22:57:27
Сообщение: Postgres incremental backups per db (not per cluster)

Следующее

От: Ron
Дата: 23 сентября 2021 г., 23:37:51
Сообщение: Re: Postgres incremental backups per db (not per cluster)

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: Faster distinct query?

Предыдущее

Следующее