Re: BUG #4562: ts_headline() adds space when parsing url
From | Tom Lane |
---|---|
Subject | Re: BUG #4562: ts_headline() adds space when parsing url |
Date | |
Msg-id | 4357.1228785628@sss.pgh.pa.us |
In response to | BUG #4562: ts_headline() adds space when parsing url ("Denis Monsieur" <dmonsieur@gmail.com>) |
Responses |
Re: BUG #4562: ts_headline() adds space when parsing url
Re: BUG #4562: ts_headline() adds space when parsing url |
List | pgsql-bugs |
"Denis Monsieur" <dmonsieur@gmail.com> writes: > The problem is a space being added to text in the form of > http://some.url/path > Compare the output: > shs=# SELECT ts_headline('http://some.url', to_tsquery('sometext')); > ts_headline > ----------------- > http://some.url > (1 row) > shs=# SELECT ts_headline('http://some.url/path', to_tsquery('sometext')); > ts_headline > ----------------------- > http:// some.url/path > (1 row) I looked into this, and it seems that the problem is that generateHeadline() emits a space for any token marked as replace = 1. I think it probably shouldn't emit anything at all. AFAICS the cases where replace will get set are token types URL, TAG, NUMHWORD, ASCIIHWORD, HWORD. For URL and the HWORD variants the space is certainly undesirable, because these token types are just respecifying text that is also covered by their component tokens. The only case where you could make an argument that the space is useful is TAG, as in regression=# SELECT ts_headline('http<foo>blah', to_tsquery('sometext')); ts_headline ------------- http blah (1 row) But it seems to me to be at least as plausible that you should get nothing as that you should get a space for a removed tag. Comments? regards, tom lane