Re: [EXT] Re: Looking for tips on improving full-text search quality in Postgres
From | Bayer, Samuel |
---|---|
Subject | Re: [EXT] Re: Looking for tips on improving full-text search quality in Postgres |
Date | |
Msg-id | 7ee2afc2-dcf7-2bc9-3092-8ca58ed2b880@mitre.org |
In reply to | Re: [EXT] Re: Looking for tips on improving full-text search quality in Postgres (Tom Lane <tgl@sss.pgh.pa.us>) |
Replies |
Re: [EXT] Re: Looking for tips on improving full-text search quality in Postgres
Re: [EXT] Re: Looking for tips on improving full-text search quality in Postgres |
List | pgsql-general |
I've tried both ranking functions. I've tried a variety of the normalization settings. I'm using the standard English language configuration. Postgres 13.

I do understand your FTS philosophy - I suppose I'm looking for guidance about how best to approximate the search capability in Solr using the FTS pieces you have. One concrete question, I suppose, is: the classic TF/IDF search strategy relies on inverse document frequency, which looks across the corpus. I can't tell whether that corpus-wide frequency information is taken into account in either ranking function.

I don't know if Solr weights earlier tokens more heavily, but I wouldn't be surprised if it does.

On 3/4/22 11:09 AM, Tom Lane wrote:
> Bruce Momjian <bruce@momjian.us> writes:
>> On Fri, Mar 4, 2022 at 10:41:16AM -0500, Bayer, Samuel wrote:
>>> I apologize for not being able to be more specific.
>
>> I know it is hard to quantify. Is it possible that Postgres is treating
>> all the terms equally, while Solr is prioritizing terms that are earlier
>> in the document?
>
> A few basic questions:
>
> * which ranking function are you using?
>
> * with what options?
>
> * which PG version exactly?
>
> As far as I can see from a quick look at the docs, neither
> ts_rank() nor ts_rank_cd() consider "earlier in the document"
> to be an interesting consideration. They do have the ability
> to prefer terms that have been marked as having a higher weight,
> but you'd need to do some setup work to make that useful ---
> basically, you'd have to separate out the title or other metadata
> and apply setweight() to it while building the tsvectors.
>
> I wouldn't be surprised if Solr has some well-tuned default
> heuristics that mean that you don't have to work hard to get
> good results from it. The current state of our FTS features
> is more like "here's all the parts, but you have to build the
> behavior you want".
>
> ISTM that our FTS features have basically been on autopilot
> since they went in. I'd sort of hoped that we'd see more
> parsers, more ranking functions, etc, over time ... but nothing
> like that has happened. I'm not sure if that's just lack of
> interest, or if people find the code too difficult to work with.
>
> regards, tom lane
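
To make the TF/IDF point in the message above concrete, here is a minimal sketch in plain Python of a classic TF-IDF score. It is only an illustration of why inverse document frequency needs statistics from the whole corpus; it is not how Postgres or Solr compute their ranks. The function name `tf_idf_scores`, the whitespace tokenizer, and the sample documents are all hypothetical and chosen for this example.

```python
import math
from collections import Counter


def tf_idf_scores(docs, query_terms):
    """Score each document against query_terms with a plain TF-IDF sum.

    Assumptions for this sketch: lowercase whitespace tokenization,
    tf = count / document length, idf = ln(N / document frequency).
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: how many documents contain each term.
    # This is the corpus-wide statistic that IDF depends on.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        if not tokens:
            scores.append(0.0)
            continue
        term_counts = Counter(tokens)
        score = 0.0
        for term in query_terms:
            if doc_freq[term] == 0:
                continue  # term appears nowhere in the corpus
            idf = math.log(n_docs / doc_freq[term])
            score += (term_counts[term] / len(tokens)) * idf
        scores.append(score)
    return scores


# Hypothetical three-document corpus.
docs = [
    "postgres full text search",
    "postgres ranking functions",
    "solr ranking with inverse document frequency",
]
scores = tf_idf_scores(docs, ["postgres", "solr"])
```

In this sketch "solr" occurs in one document and "postgres" in two, so a match on "solr" contributes more per occurrence. A term that appears in every document gets an IDF of zero and contributes nothing, which is the corpus-wide effect the question is asking about.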