HTML tags and tsearch2
From | Joanna Sharman |
---|---|
Subject | HTML tags and tsearch2 |
Date | |
Msg-id | 20080626121158.lb0dui10gg44ck40@www.staffmail.ed.ac.uk |
Responses | Re: HTML tags and tsearch2 |
List | pgsql-general |
Hi,

I have recently started experimenting with tsearch2 and it seems that the default behaviour is to ignore HTML tags and treat them as word-separators. What I would like it to do is to ignore HTML tags within words, but instead of creating separate words, combine the characters separated by the tag into one word.

For example: in the database I have words like 'K<sub>ir</sub>' that need to be searched using the term without HTML tags, i.e. 'Kir'. Currently, the HTML tags are ignored and two words are stored in the vector, 'k' and 'ir'. I would like only one word, 'kir', to be stored in the vector, so that searches using the word 'kir' will match the row.

A second, related question is whether it is possible to cause tsearch2 to split up words when it encounters digits, e.g. 'TM8' into 'TM' and '8'.

I am not sure if this functionality is possible to implement using tsearch2 or if there might be a better way, so I would be grateful for any advice or pointers to further reading on how I might do this.

(I am using PostgreSQL version 8.1.10)

Many thanks in advance,
Joanna

--
The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.
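
For illustration only, the two transformations asked about above can be sketched as a preprocessing step applied to the text before it is handed to tsearch2. This is not a tsearch2 feature and not a confirmed answer from the list; it is a minimal Python sketch under stated assumptions. The function name `normalize_for_search` and the set of inline tags treated as "inside a word" (`sub`, `sup`, `i`, `b`, `em`, `strong`) are hypothetical choices, and the letter/digit split only covers ASCII letters.

```python
import re

# Assumption: these inline tags never mark a word boundary in the data,
# so they are deleted outright and the surrounding characters are joined.
INLINE_TAG_RE = re.compile(r"</?(?:sub|sup|i|b|em|strong)\b[^>]*>", re.IGNORECASE)

# Any remaining tag is treated as a word separator, as tsearch2 is
# reported to do by default in the message above.
ANY_TAG_RE = re.compile(r"<[^>]+>")

# Zero-width position between an ASCII letter and a digit, in either order.
LETTER_DIGIT_BOUNDARY_RE = re.compile(
    r"(?<=[A-Za-z])(?=[0-9])|(?<=[0-9])(?=[A-Za-z])"
)


def normalize_for_search(text):
    """Return text with inline tags joined, other tags turned into spaces,
    and letter/digit runs separated by a space."""
    text = INLINE_TAG_RE.sub("", text)             # 'K<sub>ir</sub>' -> 'Kir'
    text = ANY_TAG_RE.sub(" ", text)               # block tags -> separators
    text = LETTER_DIGIT_BOUNDARY_RE.sub(" ", text) # 'TM8' -> 'TM 8'
    return " ".join(text.split())                  # collapse whitespace


if __name__ == "__main__":
    print(normalize_for_search("K<sub>ir</sub>"))
    print(normalize_for_search("TM8"))
```

The normalized string, rather than the raw HTML, would then be the input used to build the search vector, while the original markup stays in its own column for display. Whether this is preferable to changing the parser or dictionary configuration depends on the data and is left open here.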