Re: [GENERAL] Fragments in tsearch2 headline
From | Teodor Sigaev |
---|---|
Subject | Re: [GENERAL] Fragments in tsearch2 headline |
Date | |
Msg-id | 483BD4CB.3030006@sigaev.ru |
In response to | Re: [GENERAL] Fragments in tsearch2 headline (Sushant Sinha <sushant354@gmail.com>) |
Responses | Re: [GENERAL] Fragments in tsearch2 headline |
List | pgsql-hackers |
Hi!

> 1. Why is hlparsetext used to parse the document rather than the
> parsetext function? Since words to be included in the headline will be
> marked afterwards, it seems more reasonable to just use the parsetext
> function.
> The main difference I see is the use of hlfinditem and marking whether
> some word is repeated.

hlparsetext preserves every kind of lexeme (non-indexed ones, spaces, etc.); parsetext doesn't. hlparsetext also preserves the original form of lexemes; parsetext doesn't.

> The reason this is important is that hlparsetext does not seem to be
> storing word positions which parsetext does. The word positions are
> important for generating headline with fragments.

Not needed: hlparsetext preserves the whole text, so the position is simply the index in the array.

> 2.
>> I would prefer the signature ts_headline( [regconfig,] text, tsquery
>> [,text] ) and function should accept 'NumFragments=>N' for default
>> parser. Other parsers may use other options.
>
> Does this mean we want a unified function ts_headline and we trigger the
> fragments if NumFragments is specified?

The trigger should be inside the parser-specific function (pg_ts_parser.prsheadline). Other parsers might not recognize that option.

> It seems that introducing a new
> function which can take configuration OID, or name is complex as there
> are so many functions handling these issues in wparser.c.

No, of course. ts_headline takes care of finding the configuration and calling the correct parser.

> If this is true then we need to just add marking of headline words in
> prsd_headline. Otherwise we will need another prsd_headline_with_covers
> function.

Yeah, pg_ts_parser.prsheadline should mark the lexemes too. It can even change the array of HeadlineParsedText.

> 3. In many cases people may already have TSVector for a given document
> (for search operation). Would it be faster to pass TSVector to headline
> function when compared to computing TSVector each time? If that is the
> case then should we have an option to pass TSVector to headline
> function?

As I mentioned above, a tsvector doesn't contain all the information about the text.

--
Teodor Sigaev
E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/
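
The distinction drawn in the reply between the two parsing modes can be illustrated with a toy model. The sketch below is not PostgreSQL code: `headline_tokens`, `index_tokens`, and the stop-word list are hypothetical names invented for illustration, and real parsers also apply dictionaries and stemming. It only shows why a parse that keeps every token needs no separate position field, and why an index-oriented result cannot rebuild the original text.

```python
import re

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"the", "a", "of"}


def headline_tokens(text):
    """Toy analogue of a headline parse: keep every token
    (words, whitespace, punctuation) in its original form."""
    return re.findall(r"\w+|\s+|[^\w\s]+", text)


def index_tokens(text):
    """Toy analogue of an indexing parse: keep only normalized,
    non-stop words together with their word positions."""
    result = []
    for pos, tok in enumerate(re.findall(r"\w+", text), start=1):
        word = tok.lower()
        if word not in STOP_WORDS:
            result.append((word, pos))
    return result


text = "The Quick, brown fox"

hl = headline_tokens(text)
# Every token is kept, so the array index serves as the position
# and the original text can be rebuilt exactly.
assert "".join(hl) == text

idx = index_tokens(text)
# Spaces, punctuation, stop words, and original casing are gone,
# so the text cannot be reconstructed from this result alone.
print(hl)
print(idx)
```

In this model, a headline with fragments would be cut out of the `hl` array by index ranges, which is why the stored word positions of the indexing parse are not required.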