Re: Mailing list search engine: surprising missing results?
From | Tom Lane |
---|---|
Subject | Re: Mailing list search engine: surprising missing results? |
Date | |
Msg-id | 2274255.1643133268@sss.pgh.pa.us |
In reply to | Re: Mailing list search engine: surprising missing results? (Ivan Panchenko <i.panchenko@postgrespro.ru>) |
Responses | Re: Mailing list search engine: surprising missing results? |
List | pgsql-www |
Ivan Panchenko <i.panchenko@postgrespro.ru> writes:
> The actual explanation can be seen from comparing a tsvector with a tsquery.
> To avoid stemming effects, we use the simple configuration below.

> # select plainto_tsquery('simple','boyers-moore');
>            plainto_tsquery
> -------------------------------------
>  'boyers-moore' & 'boyers' & 'moore'

> # select to_tsvector('simple','boyers-moore-horspool');
>                          to_tsvector
> -------------------------------------------------------------
>  'boyers':2 'boyers-moore-horspool':1 'horspool':4 'moore':3

> Obviously, such a tsvector does not match the above tsquery. I think a better tsquery for this query would be
> 'boyers-moore' | ('boyers' & 'moore')
> Maybe it is worth changing to_tsquery() behavior for such cases.

Changing the behavior of to_tsquery is certainly a lot less scary than
changing to_tsvector --- it wouldn't call the validity of existing
tsvector indexes into question.

I see that to_tsquery is even sillier than plainto_tsquery:

regression=# select to_tsquery('simple','boyers-moore');
               to_tsquery
-----------------------------------------
 'boyers-moore' <-> 'boyers' <-> 'moore'
(1 row)

which is absolutely not a sane translation. It seems to me that
in both cases we'd be better off generating "'boyers' <-> 'moore'",
without the compound token at all. Maybe there's a case for the
weaker 'boyers' & 'moore' translation, but I think if people wanted
that they'd just enter separate words.

			regards, tom lane
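To make the matching argument concrete, here is a toy model in Python, not PostgreSQL's implementation. It treats a tsvector as a map from lexeme to positions and evaluates `&`, `|`, and a simplified `<->` that only requires the terms to sit at consecutive positions. The lexemes and positions are copied from the psql output quoted in the thread; the names `Term`, `And`, `Or`, `Phrase`, `HORSPOOL` and `queries` are hypothetical, and real tsquery evaluation has further rules (weights, `<N>` distances, negation) that this sketch ignores.

```python
from dataclasses import dataclass
from typing import Dict, List, Union

# Toy tsvector: lexeme -> positions. Values copied from
# to_tsvector('simple','boyers-moore-horspool') as quoted in the thread.
TsVector = Dict[str, List[int]]

HORSPOOL: TsVector = {
    "boyers-moore-horspool": [1],
    "boyers": [2],
    "moore": [3],
    "horspool": [4],
}


@dataclass
class Term:
    lexeme: str


@dataclass
class And:
    left: "Query"
    right: "Query"


@dataclass
class Or:
    left: "Query"
    right: "Query"


@dataclass
class Phrase:
    # Simplified <->: each term must appear one position after the previous one.
    terms: List[str]


Query = Union[Term, And, Or, Phrase]


def matches(vec: TsVector, q: Query) -> bool:
    """Evaluate a toy query tree against a toy tsvector."""
    if isinstance(q, Term):
        return q.lexeme in vec
    if isinstance(q, And):
        return matches(vec, q.left) and matches(vec, q.right)
    if isinstance(q, Or):
        return matches(vec, q.left) or matches(vec, q.right)
    if isinstance(q, Phrase):
        first, rest = q.terms[0], q.terms[1:]
        # Try every position of the first term as a starting point.
        return any(
            all(p + i + 1 in vec.get(t, []) for i, t in enumerate(rest))
            for p in vec.get(first, [])
        )
    raise TypeError(f"unknown query node: {q!r}")


queries: Dict[str, Query] = {
    # 'boyers-moore' & 'boyers' & 'moore'  -> False
    "plainto_tsquery (current)": And(And(Term("boyers-moore"), Term("boyers")), Term("moore")),
    # 'boyers-moore' <-> 'boyers' <-> 'moore'  -> False
    "to_tsquery (current)": Phrase(["boyers-moore", "boyers", "moore"]),
    # 'boyers-moore' | ('boyers' & 'moore')  -> True
    "proposed by Ivan": Or(Term("boyers-moore"), And(Term("boyers"), Term("moore"))),
    # 'boyers' <-> 'moore'  -> True
    "proposed by Tom": Phrase(["boyers", "moore"]),
}

for name, q in queries.items():
    print(f"{name}: {matches(HORSPOOL, q)}")
```

Under this simplified model, both current translations fail only because the compound lexeme 'boyers-moore' is absent from the tsvector of the longer hyphenated word, while both proposed forms match it.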