Обсуждение: [Fwd: query efficiency - Can I speed it up?]
--
Ann
"In a world without walls and fences - who needs Windows and Gates ?"
(unknown)
Dear All
Can anyone tell me if/how I can speed up this query? You could just
point me at a good 'guide to building efficient queries' if I'm missing
something obvious. I've tried a few online sources and most of my
queries are much faster now but this one ???
Thanks
nemdb3=# explain select pept_id,description,confidence from gotcha inner
join go on (gotcha.go_term=go.go_term) where go.description~'oxygen' and
( spid='ALP' or spid='ASP' or spid='DIP' or spid='GPP' ) order by
gotcha.confidence;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=187523.03..187523.54 rows=203 width=54)
Sort Key: gotcha.confidence
-> Nested Loop (cost=0.00..187515.25 rows=203 width=54)
-> Seq Scan on go (cost=0.00..266.50 rows=4 width=50)
Filter: (description ~ 'oxygen'::text)
-> Index Scan using gotcha_go_term_sp_id on gotcha
(cost=0.00..46809.29 rows=232 width=32)
Index Cond: ((gotcha.go_term)::text =
("outer".go_term)::text)
Filter: (((spid)::text = 'ALP'::text) OR ((spid)::text =
'ASP'::text) OR ((spid)::text = 'DIP'::text) OR ((spid)::text =
'GPP'::text))
(8 rows)
nemdb3=# \d go
Table "public.go"
Column | Type | Modifiers
-----------------+-----------------------+-----------
go_term | character varying(14) |
go_asp | character varying(1) |
description | text |
hierarchy_level | integer |
Indexes:
"go_go_term" btree (go_term)
nemdb3=# \d gotcha
Table "public.gotcha"
Column | Type | Modifiers
------------+-----------------------+-----------
run_no | character varying(20) |
pept_id | character varying(15) |
contig | integer |
confidence | integer |
go_term | character varying(14) |
hits | integer |
spid | character varying(5) |
Indexes:
"gotcha_go_term_sp_id" btree (go_term, spid)
"gotcha_pept_id" btree (pept_id)
"gotcha_spid" btree (spid)
nemdb3=# select relname, relkind, reltuples, relpages from pg_class
where relname like 'go%';
relname | relkind | reltuples | relpages
------------------------+---------+-------------+----------
gotcha | r | 2.70698e+07 | 695188
go | r | 10600 | 134
gotcha_pept_id | i | 2.70698e+07 | 104266
go_go_term | i | 10600 | 43
gotcha_spid | i | 2.70698e+07 | 74225
gotcha_go_term_sp_id | i | 2.70698e+07 | 134201
(11 rows)
nemdb3=#
--
Ann
"In a world without walls and fences - who needs Windows and Gates ?"
(unknown)
ann hedley <ann.hedley@ed.ac.uk> writes:
> Can anyone tell me if/how I can speed up this query?
Try a newer version of Postgres --- I think 8.1 is the first one that
can make decent use of that two-column index in this type of query.
Notice that in the plan, only the "go_term" column is being checked
in the index condition:
> -> Index Scan using gotcha_go_term_sp_id on gotcha
> (cost=0.00..46809.29 rows=232 width=32)
> Index Cond: ((gotcha.go_term)::text =
> ("outer".go_term)::text)
> Filter: (((spid)::text = 'ALP'::text) OR ((spid)::text =
> 'ASP'::text) OR ((spid)::text = 'DIP'::text) OR ((spid)::text =
> 'GPP'::text))
I take it from the enormous cost that there are going to be lots of rows
with the same go_term, and it's the spid filter that is cutting it down
to a reasonable number of rows ... but this plan is going to visit the
heap for every row matching go_term, because the planner isn't smart
enough to fold the OR'd restriction clause together with the join
clause to make an index condition. It applies it as a "filter" instead
which is way way slower in this situation. I can't tell exactly which
PG release you're using, but it's definitely older than 8.1.
regards, tom lane
psql (PostgreSQL) 8.0.3
Thanks, I'll try an upgrade.
Tom Lane wrote:
> ann hedley <ann.hedley@ed.ac.uk> writes:
>
>> Can anyone tell me if/how I can speed up this query?
>>
>
> Try a newer version of Postgres --- I think 8.1 is the first one that
> can make decent use of that two-column index in this type of query.
> Notice that in the plan, only the "go_term" column is being checked
> in the index condition:
>
>
>> -> Index Scan using gotcha_go_term_sp_id on gotcha
>> (cost=0.00..46809.29 rows=232 width=32)
>> Index Cond: ((gotcha.go_term)::text =
>> ("outer".go_term)::text)
>> Filter: (((spid)::text = 'ALP'::text) OR ((spid)::text =
>> 'ASP'::text) OR ((spid)::text = 'DIP'::text) OR ((spid)::text =
>> 'GPP'::text))
>>
>
> I take it from the enormous cost that there are going to be lots of rows
> with the same go_term, and it's the spid filter that is cutting it down
> to a reasonable number of rows ... but this plan is going to visit the
> heap for every row matching go_term, because the planner isn't smart
> enough to fold the OR'd restriction clause together with the join
> clause to make an index condition. It applies it as a "filter" instead
> which is way way slower in this situation. I can't tell exactly which
> PG release you're using, but it's definitely older than 8.1.
>
> regards, tom lane
>
>
--
Ann
"In a world without walls and fences - who needs Windows and Gates ?"
(unknown)