Planner should use index on a LIKE 'foo%' query
От | Moritz Onken |
---|---|
Тема | Planner should use index on a LIKE 'foo%' query |
Дата | |
Msg-id | 287E4428-BECC-46D6-BC47-EA94EA0121F0@houseofdesign.de обсуждение исходный текст |
Ответы |
Re: Planner should use index on a LIKE 'foo%' query
Re: Planner should use index on a LIKE 'foo%' query |
Список | pgsql-performance |
Hi, I have a query select count(*) from result where exists (select * from item where item.url LIKE result.url || '%' limit 1); which basically returns the number of items which exist in table result and match a URL in table item by its prefix. I read all about idexes (http://www.postgresql.org/docs/8.3/static/indexes-types.html ) and especially this part: "The optimizer can also use a B-tree index for queries involving the pattern matching operators LIKE and ~ if the pattern is a constant and is anchored to the beginning of the string — for example, col LIKE 'foo %' or col ~ '^foo', but not col LIKE '%bar'." Since my server does not use the C locale I created the index with CREATE INDEX test_index ON item USING btree (url varchar_pattern_ops); which works fine for queries like SELECT distinct url from item where url like 'http://www.micro%' limit 10; explain analyze shows: "Limit (cost=9.53..9.54 rows=1 width=34) (actual time=80.809..80.856 rows=10 loops=1)" " -> Unique (cost=9.53..9.54 rows=1 width=34) (actual time=80.806..80.835 rows=10 loops=1)" " -> Sort (cost=9.53..9.53 rows=1 width=34) (actual time=80.802..80.812 rows=11 loops=1)" " Sort Key: url" " Sort Method: quicksort Memory: 306kB" " -> Index Scan using test_index on item (cost=0.00..9.52 rows=1 width=34) (actual time=0.030..6.165 rows=2254 loops=1)" " Index Cond: (((url)::text ~>=~ 'http:// www.micro'::text) AND ((url)::text ~<~ 'http://www.micrp'::text))" " Filter: ((url)::text ~~ 'http://www.micro%'::text)" "Total runtime: 80.908 ms" which is great but if I run the query with the subselect it uses a sequence scan: select * from result where exists (select * from item where item.url LIKE result.url || '%' limit 1) limit 10; "Limit (cost=0.00..96.58 rows=10 width=36) (actual time=12.660..35295.928 rows=10 loops=1)" " -> Seq Scan on result (cost=0.00..93886121.77 rows=9721314 width=36) (actual time=12.657..35295.906 rows=10 loops=1)" " Filter: (subplan)" " SubPlan" " -> Limit (cost=0.00..4.81 rows=1 width=42) (actual time=2715.061..2715.061 rows=1 loops=13)" " -> Seq Scan on item (cost=0.00..109589.49 rows=22781 width=42) (actual time=2715.055..2715.055 rows=1 loops=13)" " Filter: ((url)::text ~~ (($0)::text || '%'::text))" "Total runtime: 35295.994 ms" The only explaination is that I don't use a constant when comparing the values. But actually it is a constant... any help? using postgres 8.3.3 on ubuntu. Cheers, moritz
В списке pgsql-performance по дате отправления: