Re: [GENERAL] Fragments in tsearch2 headline
From | Teodor Sigaev |
---|---|
Subject | Re: [GENERAL] Fragments in tsearch2 headline |
Date | |
Msg-id | 4843FF4D.8030109@sigaev.ru |
In reply to | Re: [GENERAL] Fragments in tsearch2 headline (Sushant Sinha <sushant354@gmail.com>) |
Responses | Re: [GENERAL] Fragments in tsearch2 headline |
List | pgsql-hackers |
> I have attached a new patch with respect to the current CVS HEAD. This
> produces a headline in a document for a given query. Basically, it
> identifies fragments of text that contain the query and displays them.

The new variant is much better, but...

> HeadlineParsedText contains an array of actual words but not
> information about the norms. We need an indexed position vector for each
> norm so that we can quickly evaluate a number of possible fragments.
> Something that tsvector provides.

Why do you need to store norms? The only purpose of norms is to identify the query words, but that is already done by hlfinditem(): it sets HeadlineWordEntry->item to the corresponding QueryOperand in the tsquery. Look, the headline function is rather expensive, and your patch adds a lot of extra work, at least in memory usage. And if the user calls it with NumFragments=0, all of that work is unneeded.

> This approach does not change any other interface and fits nicely with
> the overall framework.

Yeah, it's a really big step forward. Thank you. You are very close to getting it committed, except for one thing: did you notice the hlCover() function, which produces a cover from the original HeadlineParsedText representation? Is there any reason not to use it?

> The norms are converted into tsvector and a number of covers are
> generated. The best covers are then chosen to be in the headline. The
> covers are separated using a hardcoded coversep. Let me know if you want
> to expose this as an option.
>
> Covers that overlap with already chosen covers are excluded.
>
> Some options like ShortWord and MinWords are not taken care of right
> now. MaxWords is used as maxcoversize. Let me know if you would like to
> see other options for fragment generation as well.

ShortWord, MinWords and MaxWords should keep their meaning, but apply to each fragment rather than to the whole headline.

> Let me know any more changes you would like to see.
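The point about norms can be illustrated with a small sketch: because each parsed word already records which query operand it matched (HeadlineWordEntry->item, set by hlfinditem()), covers can be located directly on the parsed word array, with no tsvector of norms. The code below is a minimal, hypothetical sketch, not hlCover() and not the patch. It assumes: `Word`, `Cover`, `find_cover()` and `select_fragments()` are illustrative names; a word's `item` is a small integer index of its query operand (-1 for none); a cover must contain every operand (a plain AND query only, not arbitrary tsquery expressions); "best" means shortest; and MaxWords bounds the cover length, as the patch does with maxcoversize.

```c
#include <stdbool.h>

#define MAX_ITEMS  64
#define MAX_COVERS 128

/* Hypothetical simplified stand-in for HeadlineWordEntry: item is the
 * index of the query operand this word matched, or -1 if none. */
typedef struct
{
    const char *word;
    int         item;
} Word;

/* A cover is the inclusive word range [p, q]. */
typedef struct
{
    int p;
    int q;
} Cover;

/* Find the cover that ends earliest at or after 'start': first extend q
 * until all nitems operands are seen, then shrink p back toward q. */
bool
find_cover(const Word *w, int nwords, int nitems, int start, int *p, int *q)
{
    bool seen[MAX_ITEMS];
    int  found = 0;
    int  i;

    if (nitems <= 0 || nitems > MAX_ITEMS)
        return false;
    for (i = 0; i < nitems; i++)
        seen[i] = false;
    for (i = start; i < nwords && found < nitems; i++)
        if (w[i].item >= 0 && w[i].item < nitems && !seen[w[i].item])
        {
            seen[w[i].item] = true;
            found++;
        }
    if (found < nitems)
        return false;           /* no further cover in the text */
    *q = i - 1;

    for (i = 0; i < nitems; i++)
        seen[i] = false;
    found = 0;
    for (i = *q; i >= start && found < nitems; i--)
        if (w[i].item >= 0 && w[i].item < nitems && !seen[w[i].item])
        {
            seen[w[i].item] = true;
            found++;
        }
    *p = i + 1;
    return true;
}

/* Enumerate covers no longer than max_words, then greedily pick the
 * shortest ones that do not overlap an already chosen cover. Returns the
 * number of fragments written to out (at most num_fragments). */
int
select_fragments(const Word *w, int nwords, int nitems,
                 int num_fragments, int max_words, Cover *out)
{
    Cover covers[MAX_COVERS];
    int   ncovers = 0, nchosen = 0, start = 0, p, q;

    while (ncovers < MAX_COVERS && find_cover(w, nwords, nitems, start, &p, &q))
    {
        if (q - p + 1 <= max_words)
        {
            covers[ncovers].p = p;
            covers[ncovers].q = q;
            ncovers++;
        }
        start = p + 1;          /* p >= start, so the scan always advances */
    }

    while (nchosen < num_fragments)
    {
        int best = -1;

        for (int i = 0; i < ncovers; i++)
        {
            bool overlaps = false;

            /* a chosen cover overlaps itself, so it is skipped here too */
            for (int j = 0; j < nchosen; j++)
                if (covers[i].p <= out[j].q && out[j].p <= covers[i].q)
                    overlaps = true;
            if (overlaps)
                continue;
            if (best < 0 ||
                covers[i].q - covers[i].p < covers[best].q - covers[best].p)
                best = i;
        }
        if (best < 0)
            break;              /* nothing left that fits */
        out[nchosen++] = covers[best];
    }
    return nchosen;
}
```

Picking by length is only one plausible notion of "best", and the fragments come back in selection order, not document order, so a caller would sort them by `p` before joining them with a separator. Applying ShortWord and MinWords per fragment would be a separate step on each chosen range.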
    if (num_fragments == 0)
        /* call the default headline generator */
        mark_hl_words(prs, query, highlight, shortword, min_words, max_words);
    else
        mark_hl_fragments(prs, query, highlight, num_fragments, max_words);

What about num_fragments < 2?

-- 
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
                                                   WWW: http://www.sigaev.ru/
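As written, the dispatch tests only `num_fragments == 0`, so a negative value falls through to the fragment branch. A minimal sketch of deciding explicitly for every value follows. It assumes, which the thread does not state, that negative values should be rejected and that a single fragment is still handled by the fragment generator; `HeadlineMode` and `choose_headline_mode()` are hypothetical names, not part of the patch.

```c
/* Hypothetical sketch: map every possible num_fragments value to a mode.
 * The names are illustrative; the patch calls the generators directly. */
typedef enum
{
    HL_MODE_INVALID,            /* negative: caller should report an error */
    HL_MODE_DEFAULT,            /* 0: mark_hl_words(), the classic headline */
    HL_MODE_FRAGMENTS           /* 1 or more: mark_hl_fragments() */
} HeadlineMode;

HeadlineMode
choose_headline_mode(int num_fragments)
{
    if (num_fragments < 0)
        return HL_MODE_INVALID;
    if (num_fragments == 0)
        return HL_MODE_DEFAULT;
    return HL_MODE_FRAGMENTS;
}
```

Whether `num_fragments == 1` should instead fall back to `mark_hl_words()` is exactly the open question in the reply; the sketch only shows where that decision would live.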