Re: [GENERAL] Fragments in tsearch2 headline
From | Teodor Sigaev |
---|---|
Subject | Re: [GENERAL] Fragments in tsearch2 headline |
Date | |
Msg-id | 48459312.9070505@sigaev.ru |
In reply to | Re: [GENERAL] Fragments in tsearch2 headline (Sushant Sinha <sushant354@gmail.com>) |
Replies | Re: [GENERAL] Fragments in tsearch2 headline |
List | pgsql-hackers |
> Why we need norms?

We don't need norms at all: every matched HeadlineWordEntry is already
marked by HeadlineWordEntry->item. If it is NULL, the word isn't
contained in the tsquery.

> hlCover does the exact thing that Cover in tsrank does which is to find
> the cover that contains the query. However hlcover has to go through
> words that do not match the query. Cover on the other hand operates on
> position indexes for just the query words and so it should be faster.

A cover, by definition, is a minimal contiguous piece of text matched by
the query. There may be several covers in a text, and hlCover will find
all of them. Next, prsd_headline() (for now) tries to pick the best one.
"Best" means: the cover contains many words from the query, has no fewer
than MinWords and no more than MaxWords words, has no words shorter than
ShortWord at its beginning or end, etc.

>
> The main reason why I would I like it to be fast is that I want to
> generate all covers for a given query. Then choose covers with smallest

hlCover generates all covers.

> Let me know what you think on this patch and I will update the patch to
> respect other options like MinWords and ShortWord.

As I understand it, you really want to call the Cover() function instead
of hlCover(). By design they should be identical, but they accept
different representations of the document. So the best way is to
generalize them: develop a new function that can be called with some kind
of callback and/or opaque structure, so that it can be used in both rank
and headline.

>
> NumFragments < 2:
> I wanted people to use the new headline marker if they specify
> NumFragments >= 1. If they do not specify the NumFragments or put it to

OK, but if you unify cover generation, then with NumFragments == 1 the
results of the old and new algorithms should be the same...

> On an another note I found that make_tsvector crashes if it receives a
> ParsedText with curwords = 0. Specifically uniqueWORD returns curwords
> as 1 even when it gets 0 words. I am not sure if this is the desired
> behavior.

In all places there is a check before the call to make_tsvector.

--
Teodor Sigaev                      E-mail: teodor@sigaev.ru
                                   WWW: http://www.sigaev.ru/
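
The generalization suggested in the message (one cover finder shared by
rank and headline, reached through a callback and an opaque document
pointer) could look roughly like the sketch below. It is not PostgreSQL
code: find_covers, term_at_fn, CoverSpan, HlWord, PosEntry and MAX_TERMS
are hypothetical names, and the query is simplified to a plain AND of
terms, whereas a real tsquery can contain other operators.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_TERMS 32    /* illustrative limit, not taken from PostgreSQL */

/* Hypothetical accessor: returns the query-term index (0..nterms-1) of the
 * entry at index i, or -1 if that word is not in the query (the role that
 * HeadlineWordEntry->item == NULL plays in the headline code). */
typedef int (*term_at_fn)(const void *doc, int i);

typedef struct { int start; int end; } CoverSpan;   /* inclusive indexes */

/* Finds every minimal window that contains all nterms query terms.
 * Stores at most maxout spans in out and returns the total number found. */
int
find_covers(const void *doc, int doclen, term_at_fn term_at,
            int nterms, CoverSpan *out, int maxout)
{
    int counts[MAX_TERMS] = {0};    /* occurrences of each term in window */
    int present = 0;                /* distinct terms currently in window */
    int left = 0;
    int nfound = 0;

    assert(nterms > 0 && nterms <= MAX_TERMS);

    for (int right = 0; right < doclen; right++)
    {
        int t = term_at(doc, right);

        if (t < 0)
            continue;               /* word is not part of the query */
        assert(t < nterms);
        if (counts[t]++ == 0)
            present++;
        if (present < nterms)
            continue;               /* window does not match yet */

        /* Shrink from the left until the leftmost entry is a query term
         * that occurs exactly once in the window. */
        for (;;)
        {
            int lt = term_at(doc, left);

            if (lt >= 0 && counts[lt] == 1)
                break;
            if (lt >= 0)
                counts[lt]--;
            left++;
        }

        if (nfound < maxout)
        {
            out[nfound].start = left;
            out[nfound].end = right;
        }
        nfound++;

        /* Drop the leftmost term so the search moves on to the next cover. */
        counts[term_at(doc, left)]--;
        present--;
        left++;
    }
    return nfound;
}

/* Representation 1 (headline-like): one entry per word of the text. */
typedef struct { int term; } HlWord;            /* term is -1 if unmatched */

static int
hl_term_at(const void *doc, int i)
{
    return ((const HlWord *) doc)[i].term;
}

/* Representation 2 (rank-like): only matched words, with text positions. */
typedef struct { int pos; int term; } PosEntry;

static int
pos_term_at(const void *doc, int i)
{
    return ((const PosEntry *) doc)[i].term;
}
```

With the headline-like array the returned spans are word indexes; with the
rank-like array they are indexes into the position list, so the caller maps
them back through PosEntry.pos. Only the accessor differs between the two
callers, which is the point of the callback: the rank-like caller never
visits unmatched words, while both callers get the same covers.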