Re: Differences between to_tsquery() and plainto_tsquery() when it comes to applying an index?
От | Denis Papathanasiou |
---|---|
Тема | Re: Differences between to_tsquery() and plainto_tsquery() when it comes to applying an index? |
Дата | |
Msg-id | 4C75697E.2030005@gmail.com обсуждение исходный текст |
Ответ на | Re: Differences between to_tsquery() and plainto_tsquery() when it comes to applying an index? (Tom Lane <tgl@sss.pgh.pa.us>) |
Список | pgsql-general |
> Not per se, but maybe looking at the actual outputs of the two function > calls would be enlightening. I suspect you'll find that the search > conditions you are getting are not equivalent. Strictly speaking they're not, b/c the plainto_tsquery() is chaining several tokens together. However, at the heart of the question is this: if I define the index on that column like this: CREATE INDEX item_eng_searchable_text_idx ON item USING gin(to_tsvector('english', searchable_text)); since, unlike the example in the http://www.postgresql.org/docs/8.4/interactive/textsearch-tables.html doc, I cannot define the searchable text column in the table as being being of type ts_vector(), because the text is not exclusively in English; then, will the index be effective? Based on some empirical experiments, it seems not. If that's indeed that case, then how should I create the index? > In that case you haven't understood how text search works at all. > It simply doesn't do that. You could possibly follow up a text search > for component words with a LIKE or similar comparison to verify that > the matches actually contain the desired string. I cannot use LIKE on each token of the phrase because the text in the column is unordered, and I would have to do an ILIKE '%'+token+'%' on each to be logically correct. IIRC, wildcards of the form token+'%' can use an index efficiently, but wildcards on both ends such as '%'+token+'%' do not. I did think about splitting the phrase tokens and doing a tsquery() join on each token, but it seemed that's why plainto_tsquery() was created, unless I misunderstood the docs on that specific point.
В списке pgsql-general по дате отправления: