Re: Performance tips

Поиск

Список

Период

Сортировка

От	Tom Lane
Тема	Re: Performance tips
Дата	10 января 2002 г. 13:56:34
Msg-id	21703.1010688265@sss.pgh.pa.us обсуждение исходный текст
Ответ на	Performance tips (Andrew Perrin <andrew_perrin@unc.edu>)
Ответы	Re: Performance tips
Список	pgsql-general

Дерево обсуждения

Andrew Perrin <andrew_perrin@unc.edu> writes:
> auth=# EXPLAIN select count(patternid) from patterns where patternid in
> (select
> auth(# o_patternid from letters, pattern_occurrences where
> letters.letterid =
> auth(# pattern_occurrences.o_letterid and letters.datecat in (1,2));
> NOTICE:  QUERY PLAN:

> Aggregate  (cost=10770432787318.88..10770432787318.88 rows=1 width=4)
>   ->  Seq Scan on patterns  (cost=0.00..10770432756138.14 rows=12472297
> width=4)
>         SubPlan
>           ->  Materialize  (cost=863548.43..863548.43 rows=5749731
> width=12)
>                 ->  Hash Join  (cost=1741.00..863548.43 rows=5749731
> width=12)
>                       ->  Seq Scan on pattern_occurrences
> (cost=0.00..250248.56 rows=15287556 width=8)
>                       ->  Hash  (cost=1729.67..1729.67 rows=4530 width=4)
>                             ->  Seq Scan on letters  (cost=0.00..1729.67
> rows=4530 width=4)

Well, it's materializing the subquery result, which is good, but are
there really going to be 5.7M rows in the subquery result?  If so,
no wonder you're hurting: the IN is going to be scanning through that
result for each row from the outer query, until it either finds a match
or reaches the end.  Can you reduce the size of the subquery result at
all?  (If the subquery as written produces a lot of duplicate
o_patternids, then making it be a SELECT DISTINCT might help.)

The long-term answer is probably that you need to convert the IN to some
smarter form of join.  One idea that comes to mind is

select count(patternid)
from patterns,
    (select distinct o_patternid from letters, pattern_occurrences where
     letters.letterid = pattern_occurrences.o_letterid
     and letters.datecat in (1,2)) AS ss
where patternid = ss.o_patternid;

Given the "select distinct" to ensure there are no duplicates in the
subselect output, this should produce the same output as the original,
I think, and it would give the planner a shot at using a merge or hash
join to match up the pattern id values.

Oh, BTW: you might also try kicking up sort_mem if you didn't already.

            regards, tom lane

В списке pgsql-general по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: Performance tips