Mailing list search engine: surprising missing results?
From | James Addison |
---|---|
Subject | Mailing list search engine: surprising missing results? |
Date | |
Msg-id | CALDQ5NxzgeXHRCD4dS_6qz+nn01ivi3i1ZEtD2DmC779i0=iSQ@mail.gmail.com |
Responses | Re: Mailing list search engine: surprising missing results? |
List | pgsql-www |
Hello,

I noticed that the mailing list search engine[1] seems to unexpectedly miss results for some queries. For example:

A search for "boyer"[2] returns five results, including result snippets that contain the text "Boyer-More-Horspool" [sic] and "Boyer-Moore-Horspool".

However, a more specific search for "boyer-moore"[3] does not return any results -- that seems surprising.

Specializing the query further and searching for "boyer-moore-horspool"[4] *does* again return results -- two documents -- with the terms "boyer" and "horspool" highlighted.

Although it's not a significant problem, I do have a theory that could explain the behaviour (offered in case it may save time on investigation):

It seems possible that the term "more" -- and nearby misspellings, like "moore" -- may be filtered out as stopwords (meaning: they're not present in the search index), and that the search engine is configured to require a minimum percentage match rate for query terms.

Under those conditions: searches for "boyer" would produce a 100% match rate, "boyer-moore" would produce 50% (since "moore" would not be found in the term index), and "boyer-moore-horspool" would match at 66.6% (repeating). Given a required match rate of around two thirds, that could explain the behaviour (it might not be the true reason, but it seems like one possibility).

Thanks,
James

[1] https://www.postgresql.org/search/
[2] https://www.postgresql.org/search/?m=1&q=boyer&l=1&d=365&s=r
[3] https://www.postgresql.org/search/?m=1&q=boyer-moore&l=1&d=365&s=r
[4] https://www.postgresql.org/search/?m=1&q=boyer-moore-horspool&l=1&d=365&s=r
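
The theory in the message can be illustrated with a minimal Python sketch. It does not reflect the actual search engine's implementation: the stopword set, the threshold value, the tokenizer, and the function names (`tokenize`, `match_rate`, `is_match`) are all assumptions made up for illustration. It only shows that stopword filtering combined with a minimum match rate of roughly two thirds would reproduce the observed pattern.

```python
import re

# Assumption: "more" and "moore" are treated as stopwords, as guessed in the
# message. The real stopword configuration is not known.
STOPWORDS = {"more", "moore"}

# Assumption: a required match rate of "around two thirds".
MIN_MATCH_RATE = 0.66


def tokenize(text):
    """Lowercase the text and split it on non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())


def match_rate(query, document):
    """Fraction of query terms found among the document's indexed terms."""
    query_terms = tokenize(query)
    if not query_terms:
        return 0.0
    # Stopwords are dropped at index time, so they can never be matched.
    indexed_terms = {t for t in tokenize(document) if t not in STOPWORDS}
    hits = sum(1 for t in query_terms if t in indexed_terms)
    return hits / len(query_terms)


def is_match(query, document):
    """True when the match rate reaches the assumed threshold."""
    return match_rate(query, document) >= MIN_MATCH_RATE


# Hypothetical document text containing the phrase from the message.
doc = "Use Boyer-Moore-Horspool for substring search"

for q in ("boyer", "boyer-moore", "boyer-moore-horspool"):
    print(q, round(match_rate(q, doc), 3), is_match(q, doc))
```

Under these assumptions, "boyer" matches at 1/1, "boyer-moore" at 1/2 (rejected), and "boyer-moore-horspool" at 2/3 (accepted), which is the same pattern as the three searches described above.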