Re: Mailing list search engine: surprising missing results?
From | Tom Lane |
---|---|
Subject | Re: Mailing list search engine: surprising missing results? |
Date | |
Msg-id | 2257661.1643127753@sss.pgh.pa.us |
In response to | Re: Mailing list search engine: surprising missing results? (Laurenz Albe <laurenz.albe@cybertec.at>) |
Responses | Re: Mailing list search engine: surprising missing results? |
List | pgsql-www |
Laurenz Albe <laurenz.albe@cybertec.at> writes:
> On Tue, 2022-01-25 at 14:04 +0300, Oleg Bartunov wrote:
>> On Mon, Jan 24, 2022 at 11:47 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Bruce Momjian <bruce@momjian.us> writes:
>>>> On Mon, Jan 24, 2022 at 08:27:41AM +0100, Laurenz Albe wrote:
>>>>> The reason is that the 'moore' in 'boyer-moore' is stemmed, since it
>>>>> is at the end of the word, while the 'moore' in 'Boyer-Moore-Horspool'
>>>>> isn't:

> Not quite. The problem is question is the "'boyer-moore':1".
> If that were "'boyer-moor':1" instead, the problem would disappear.

Actually, when I try this here, it seems like the stemming *is*
consistent:

regression=# SELECT to_tsvector('english', 'Boyer-Moore-Horspool');
                       to_tsvector
----------------------------------------------------------
 'boyer':2 'boyer-moore-horspool':1 'horspool':4 'moor':3
(1 row)

regression=# SELECT to_tsvector('english', 'Boyer-Moore');
            to_tsvector
-----------------------------------
 'boyer':2 'boyer-moor':1 'moor':3
(1 row)

If you try variants of that where the first or third term is stemmable,
say

regression=# SELECT to_tsvector('english', 'Boyers-Moore-Horspool');
                        to_tsvector
-----------------------------------------------------------
 'boyer':2 'boyers-moore-horspool':1 'horspool':4 'moor':3
(1 row)

it sure appears that each component word is stemmed independently
already. So I think the original explanation here is wrong and
we need to probe more closely.

			regards, tom lane
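
One possible way to probe more closely, offered only as a sketch and not as part of the original message: ts_debug() lists each token the parser emits together with the dictionary that handled it and the resulting lexemes, and ts_lexize() feeds a single string straight to one dictionary. The queries below assume the stock 'english' configuration and its 'english_stem' dictionary; no output is shown because it has not been verified here.

```sql
-- Show how the parser splits each hyphenated input and which dictionary
-- produces each lexeme (whole compound versus its component parts).
SELECT alias, token, dictionaries, lexemes
FROM ts_debug('english', 'Boyer-Moore');

SELECT alias, token, dictionaries, lexemes
FROM ts_debug('english', 'Boyer-Moore-Horspool');

-- Pass the whole compound directly to the stemmer, bypassing the parser,
-- to see what the dictionary does with the full hyphenated string.
SELECT ts_lexize('english_stem', 'boyer-moore');
SELECT ts_lexize('english_stem', 'boyer-moore-horspool');
```

Comparing the rows for the whole-compound token with the rows for its parts should show whether the difference between 'boyer-moor' and 'boyer-moore-horspool' arises in the parser or in the dictionary.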