Multiple word synonyms (maybe?)
From | Tim van der Linden |
---|---|
Subject | Multiple word synonyms (maybe?) |
Date | |
Msg-id | 20151020193538.df8194ca307fb5f9cb0ab13d@shisaa.jp |
Replies | Re: Multiple word synonyms (maybe?); Re: Multiple word synonyms (maybe?) |
List | pgsql-general |
Hi All,

I have a question regarding PostgreSQL's full text capabilities and (presumably) the synonym dictionary.

I'm currently implementing FTS on a medical-themed setup which uses domain-specific jargon to denote a bunch of stuff. A specific request I wish to implement here is support for the jargon synonyms that are heavily used.

Of course, I can simply go ahead and create my own synonym dictionary with a jargon-specific synonym file to feed it. However, most of the synonyms consist of more than a single word. The term "heart attack", for example, has the following "synonyms":

- Acute MI
- MI
- Myocardial infarction

As far as I understand it, the tokenizer within the PostgreSQL FTS engine splits words on spaces to generate tokens, which are then proposed to each dictionary. I therefore think it is impossible to have "multi-word synonyms" in this sense, as multiple words cannot reach the dictionary together. The term "heart attack" would be presented as the tokens "heart" and "attack".

From a technical standpoint I understand FTS is about looking at individual words and lexemizing them... yet from a natural-language lookup perspective you still wish to tie "Heart attack" to "Acute MI", so that when a client searches for one, the other will turn up as well.

Should I write my own tokenizer to catch all these words and present them as a single token? Or is this completely outside the realm of FTS (or FTS within PostgreSQL)?

Cheers,
Tim
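The "own tokenizer" idea from the last question can be illustrated with a small sketch, written in Python and running outside PostgreSQL: before text reaches the FTS engine, known multi-word jargon is collapsed into one canonical token using a greedy longest match. The synonym table, the canonical token `heartattack`, and the functions `tokenize` and `merge_phrases` are all hypothetical and only serve as an illustration; none of this is a PostgreSQL API.

```python
import re

# Hypothetical jargon table: each variant (as a tuple of lowercase words) maps
# to one canonical token. The canonical token is written without spaces or
# punctuation so that a whitespace-based tokenizer keeps it as a single word.
SYNONYMS = {
    ("heart", "attack"): "heartattack",
    ("acute", "mi"): "heartattack",
    ("mi",): "heartattack",
    ("myocardial", "infarction"): "heartattack",
}

# Longest phrase in the table, so the scanner knows how far ahead to look.
MAX_LEN = max(len(key) for key in SYNONYMS)


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it on runs of non-alphanumeric characters."""
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]


def merge_phrases(tokens: list[str]) -> list[str]:
    """Replace known phrases with their canonical token, longest match first."""
    out: list[str] = []
    i = 0
    while i < len(tokens):
        # Try the longest possible phrase starting at position i, then shorter ones.
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + n])
            if key in SYNONYMS:
                out.append(SYNONYMS[key])
                i += n
                break
        else:
            # No phrase matched: keep the single token unchanged.
            out.append(tokens[i])
            i += 1
    return out


if __name__ == "__main__":
    for doc in ["Patient had a heart attack", "Acute MI confirmed"]:
        print(merge_phrases(tokenize(doc)))
```

For this to work, the same merging would have to be applied to both the stored documents and the incoming search terms, so that "MI" in a query and "heart attack" in a record end up as the same token. Greedy longest match is just one simple choice; it prefers "acute MI" over plain "MI" when both could match at the same position.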