Re: using Tsearch2 for chemical text
From | Naz Gassiep |
---|---|
Subject | Re: using Tsearch2 for chemical text |
Date | |
Msg-id | 46A836C1.8080905@mira.net |
In reply to | Re: using Tsearch2 for chemical text (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: using Tsearch2 for chemical text
|
List | pgsql-general |
> I think you might need to write a custom lexer to divide the strings
> into meaningful units. If there are subsections of these names that
> make sense to search for, then tsearch2 can certainly handle the
> mechanics of that, but I doubt that the standard rules will divide
> these names into lexemes usefully.

A custom lexer for tsearch2 that recognized chemistry-related lexical components (di-, tetra-, acetyl-, ethan-, -oic, -ane, -ene etc) would increase *hugely* the out-of-the-box applicability of PostgreSQL to scientific applications. Perhaps such an effort could be coordinated with a physics-based lexer and a biology-related lexer, to provide a unified lexer with full scientific capabilities in the way that PostGIS provides unified geospatial capabilities.

I don't know how best to bring such an effort about, but I do know that if such a thing were created it would be a boon for PostgreSQL, giving it a very significant leg up in terms of functionality, not to mention the great positive impact that the wide, free availability of such a tool would have on the scientific research community.
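To make the idea of dividing names into lexical components a little more concrete, here is a rough sketch in Python. It is not tsearch2 code and does not use any tsearch2 API; the function name `split_chemical_name`, the tiny `MORPHEMES` list (taken from the examples above), and the greedy longest-match strategy are all assumptions made purely for illustration. A real lexer would need a much larger vocabulary and proper nomenclature rules.

```python
# Hypothetical, deliberately tiny vocabulary: only the components named above.
MORPHEMES = ["di", "tetra", "acetyl", "ethan", "oic", "ane", "ene"]


def split_chemical_name(name):
    """Greedy longest-match split of a name into known components.

    Runs of letters that match no known component are kept as single
    tokens; non-letters (spaces, hyphens, digits) act as separators.
    """
    text = name.lower()
    # Try longer components first so "ethan" wins over any shorter overlap.
    ordered = sorted(MORPHEMES, key=len, reverse=True)
    tokens = []
    unknown = ""  # letters seen since the last recognized component
    i = 0
    while i < len(text):
        if not text[i].isalpha():
            # Separator: flush any pending unknown run and skip it.
            if unknown:
                tokens.append(unknown)
                unknown = ""
            i += 1
            continue
        match = next((m for m in ordered if text.startswith(m, i)), None)
        if match is not None:
            if unknown:
                tokens.append(unknown)
                unknown = ""
            tokens.append(match)
            i += len(match)
        else:
            unknown += text[i]
            i += 1
    if unknown:
        tokens.append(unknown)
    return tokens


if __name__ == "__main__":
    print(split_chemical_name("ethanoic acid"))
    print(split_chemical_name("Diacetyl"))
```

Greedy matching is the simplest possible choice and will mis-split some names (for example where "ethan" and "-ane" overlap), which is part of why a purpose-built lexer would be a real project rather than a quick patch.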