Re: Out of the box, full text search feature suggestion for postgresql
From | Artur Zakirov |
---|---|
Subject | Re: Out of the box, full text search feature suggestion for postgresql |
Date | |
Msg-id | CAKNkYnzheAEsB9MM6b9jEBn+W7j1T5Qh6OyogH3f8ZX8M+9gkw@mail.gmail.com |
In response to | Re: Out of the box, full text search feature suggestion for postgresql (Bruce Momjian <bruce@momjian.us>) |
Responses | Re: Out of the box, full text search feature suggestion for postgresql |
List | pgsql-bugs |
On Thu, 28 Dec 2023 at 17:46, Bruce Momjian <bruce@momjian.us> wrote:
>
> On Thu, Dec 28, 2023 at 10:15:07AM -0500, aa wrote:
> > Hello Postgres Team!
> >
> > First of all, a big THANK YOU for the great work you folks are doing!
> >
> > The reason I am writing to you is to suggest a feature in future Postgres
> > versions, a feature that is partially there but is not quite where it should be
> > in my opinion: the full text search functionality. This functionality in my
> > opinion, should be available out of the box, for any possible language
> > available, including east Asia character based languages. You would probably
> > say that this will require a huge amount of work, and I would say, a postgres
> > extension which does exactly this, already exists, and it is called : pgroonga
> > (https://pgroonga.github.io/)
>
> Please explain how this is different from what we already have:
>
> https://www.postgresql.org/docs/current/textsearch.html

I'm not familiar with pgroonga, but the main issue with the built-in text
search is that it cannot properly tokenize Asian and many other languages.

Here the default parser fails to tokenize Japanese text and returns the
whole sentence as a single token:

=# select * from ts_parse('default', 'これはペンです');
 tokid |     token
-------+----------------
     2 | これはペンです

Unlike Latin text, which is split into words:

=# select * from ts_parse('default', 'this is a pen');
 tokid | token
-------+-------
     1 | this
    12 |
     1 | is
    12 |
     1 | a
    12 |
     1 | pen

To add support for Japanese (and other languages) it is necessary to write
a new parser or fix the existing default parser.

On the other hand, pgroonga's source code looks complex, and I doubt that
there are pgsql-hackers who know both it and the target languages well and
who would be able to port it to Postgres core.

--
Artur
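
To make the practical consequence of the ts_parse output above concrete, here is a minimal sketch of what it would mean for an actual search. It assumes the built-in 'simple' configuration (which uses the default parser) and a database that parses the Japanese sentence as a single token, as shown above. The search term 'ペン' ("pen") is chosen only for illustration.

```sql
-- Inspect the lexemes produced for each sentence. If the Japanese sentence
-- is parsed as one token, its tsvector holds a single lexeme.
select to_tsvector('simple', 'これはペンです');
select to_tsvector('simple', 'this is a pen');

-- Search for the word "pen" inside each sentence. The English query matches
-- because 'pen' is a separate lexeme. The Japanese query can only match if
-- 'ペン' was extracted as its own lexeme, which the single-token parse
-- above does not do.
select to_tsvector('simple', 'this is a pen') @@ to_tsquery('simple', 'pen');
select to_tsvector('simple', 'これはペンです') @@ to_tsquery('simple', 'ペン');
```

The token types behind the tokid values can be listed with select * from ts_token_type('default'); which should show 1 as asciiword, 2 as word and 12 as blank.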