more] indexed regex select optimisations?
От | Stuart Woolford |
---|---|
Тема | more] indexed regex select optimisations? |
Дата | |
Msg-id | 99110813015600.00960@test.macmillan.co.nz обсуждение исходный текст |
Ответ на | Re: [HACKERS] Re: [GENERAL] indexed regex select optimisation missing? (Charles Tassell <ctassell@isn.net>) |
Список | pgsql-general |
Well, I've improved my regex text searches to actually use the indexes properly now for the basic case, but I have found another 'problem' (or feature, call it what you will ;) - to demonstrate: with locale turned on (the default RPMS are like this): the following takes a LONG time to run on 1.6 million records: ------------------------------------- explain select isbn, count from inv_word_i where word~'^foo' order by count Sort (cost=35148.70 rows=353 width=16) -> Index Scan using i3 on inv_word_i (cost=35148.70 rows=353 width=16) ------------------------------------- the following runs instantly, and does (nearly) the same thing: ------------------------------------- explain select isbn, count from inv_word_i where word>='foo' and word<'fop' order by count Sort (cost=11716.57 rows=183852 width=16) -> Index Scan using i3 on inv_word_i (cost=11716.57 rows=183852 width=16) ------------------------------------- but what about the following? : ------------------------------------- explain select isbn , sum(count) from inv_word_i where (word>='window' and word<'windox') or (word>='idiot' and word<'idiou') group by isbn order by sum(count) desc Sort (cost=70068.84 rows=605525 width=16) -> Aggregate (cost=70068.84 rows=605525 width=16) -> Group (cost=70068.84 rows=605525 width=16) -> Sort (cost=70068.84 rows=605525 width=16) -> Seq Scan on inv_word_i (cost=70068.84 rows=605525 width=16) ------------------------------------- this is the fastest way I've found so far to do a multi-word search (window and idiot as the root words in this case), you note it does NOT use the indexes, but falls back to a linear scan?!? it takes well over 30 seconds (much much too long) I've tried a LOT of different combinations, and have yet to find a way of getting the system to use the indexes correctly to do what I want, the closest I've ffound is using a select intersect select method to find all docs containing both word (what I really want, although the query above is a ranked or query), but it gets slow as soon as I select more than one field for the results (I need to line isbn in this case to another database in the final application) I assume there is some reason the system falls back to a linear scan in this case? it seems two index lookups would be much much more efficient.. am I missing something again? -- ------------------------------------------------------------ Stuart Woolford, stuartw@newmail.net Unix Consultant. Software Developer. Supra Club of New Zealand. ------------------------------------------------------------
В списке pgsql-general по дате отправления: