Re: Multiple word synonyms (maybe?)
From | rob stone |
---|---|
Subject | Re: Multiple word synonyms (maybe?) |
Date | |
Msg-id | 1445338679.1853.30.camel@gmail.com |
In reply to | Multiple word synonyms (maybe?) (Tim van der Linden <tim@shisaa.jp>) |
Replies | Re: Multiple word synonyms (maybe?) |
List | pgsql-general |
On Tue, 2015-10-20 at 19:35 +0900, Tim van der Linden wrote:
> Hi All
>
> I have a question regarding PostgreSQL's full text capabilities and
> (presumably) the synonym dictionary.
>
> I'm currently implementing FTS on a medical themed setup which uses
> domain specific jargon to denote a bunch of stuff. A specific request
> I wish to implement here are the jargon synonyms that are heavily
> used.
>
> Of course, I can simply go ahead and create my own synonym dictionary
> with a jargon specific synonym file to feed it. However, most of the
> synonyms are made up of more than a single word.
>
> The term "heart attack" for example has the following "synonyms":
>
> - Acute MI
> - MI
> - Myocardial infarction
>
> As far as I understand it, the tokenizer within PostgreSQL FTS engine
> splits words on spaces to generate tokens which are then proposed to
> each dictionary. I think it is therefore impossible to have "multi-word
> synonyms" in this sense as multiple words cannot reach the
> dictionary. The term "heart attack" would be presented as the tokens
> "heart" and "attack".
>
> From a technical standpoint I understand FTS is about looking at
> individual words and lexemizing them ... yet from a natural language
> lookup perspective you still wish to tie "Heart attack" to "Acute MI"
> so when a client searches on one, the other will turn up as well.
>
> Should I write my own tokenizer to catch all these words and present
> them as a single token? Or is this completely outside the realm of
> FTS (or FTS within Postgresql)?
>
> Cheers,
> Tim

Looking at this from an entirely different perspective, why are you not using ICD codes to identify patient events? A patient has a one-to-many relationship with their events, and each event is identified by the relevant ICD code and date. Given that MI has several applicable ICD codes, you can use a query along the lines of:

    WHERE icd_code IN ( . . . )

I know it doesn't answer your question!

Cheers,
Rob
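Rob's ICD-code suggestion can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not code from the thread: it uses Python's built-in `sqlite3` module as a stand-in for PostgreSQL, and the table names (`patient`, `patient_event`), the helpers `build_demo_db` and `patients_with_any_code`, the sample rows, and the particular codes (taken from the ICD-10 I21 "acute myocardial infarction" block) are all hypothetical. A real system would fill the `( . . . )` in the query with a vetted clinical code list.

```python
import sqlite3

# Hypothetical, illustrative ICD-10 codes from the I21 block (acute MI).
# Not a complete or vetted clinical code set.
MI_CODES = ["I21.0", "I21.4", "I21.9"]


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory DB with a one-to-many patient -> events schema (hypothetical)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE patient (
            patient_id INTEGER PRIMARY KEY,
            name       TEXT NOT NULL
        );
        CREATE TABLE patient_event (
            patient_id INTEGER NOT NULL REFERENCES patient(patient_id),
            icd_code   TEXT    NOT NULL,
            event_date TEXT    NOT NULL  -- ISO-8601 date string
        );
        """
    )
    conn.executemany(
        "INSERT INTO patient (patient_id, name) VALUES (?, ?)",
        [(1, "A"), (2, "B"), (3, "C")],
    )
    conn.executemany(
        "INSERT INTO patient_event (patient_id, icd_code, event_date) VALUES (?, ?, ?)",
        [
            (1, "I21.9", "2015-03-01"),  # code in the MI list
            (1, "E11.9", "2014-07-12"),  # unrelated code, same patient
            (2, "J45.9", "2015-01-20"),  # unrelated code
            (3, "I21.4", "2015-09-30"),  # different code, also in the MI list
        ],
    )
    return conn


def patients_with_any_code(conn: sqlite3.Connection, codes) -> list:
    """Return sorted, distinct patient ids with at least one event whose code is in `codes`."""
    if not codes:
        return []
    # One bound parameter per code, so the IN list is never built by string-pasting values.
    placeholders = ", ".join("?" for _ in codes)
    sql = (
        "SELECT DISTINCT patient_id FROM patient_event "
        f"WHERE icd_code IN ({placeholders}) ORDER BY patient_id"
    )
    return [row[0] for row in conn.execute(sql, list(codes))]
```

For example, `patients_with_any_code(build_demo_db(), MI_CODES)` returns `[1, 3]`: patients 1 and 3 each have an event coded in the I21 list, while patient 2 does not. The point of the design is that "heart attack", "acute MI" and "myocardial infarction" all map to the same codes at data-entry time, so the lookup never has to reconcile multi-word phrases.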