Re: Out of the box, full text search feature suggestion for postgresql

From	Artur Zakirov
Subject	Re: Out of the box, full text search feature suggestion for postgresql
Date	January 2, 2024, 17:20:51
Msg-id	CAKNkYnzheAEsB9MM6b9jEBn+W7j1T5Qh6OyogH3f8ZX8M+9gkw@mail.gmail.com
In response to	Re: Out of the box, full text search feature suggestion for postgresql (Bruce Momjian <bruce@momjian.us>)
Responses	Re: Out of the box, full text search feature suggestion for postgresql
List	pgsql-bugs

On Thu, 28 Dec 2023 at 17:46, Bruce Momjian <bruce@momjian.us> wrote:
>
> On Thu, Dec 28, 2023 at 10:15:07AM -0500, aa wrote:
> > Hello Postgres Team!
> >
> > First of all, a big THANK YOU for the great work you folks are doing!
> >
> > The reason I am writing to you is to suggest a feature in future Postgres
> > versions, a feature that is partially there but is not quite where it should be
> > in my opinion: the full text search functionality. This functionality in my
> > opinion, should be available out of the box, for any possible language
> > available, including east Asia character based languages. You would probably
> > say that this will require a huge amount of work, and I would say, a postgres
> > extension which does exactly this, already exists, and it is called : pgroonga
> > (https://pgroonga.github.io/)
>
> Please explain how this is different from what we already have:
>
>         https://www.postgresql.org/docs/current/textsearch.html

I'm not familiar with pgroonga, but the main issue with the built-in
text search is that it cannot properly tokenize Asian and many other
languages.

Here the default parser cannot tokenize Japanese text:

=# select * from ts_parse('default', 'これはペンです');
 tokid |     token
-------+----------------
     2 | これはペンです

Unlike Latin-script text, which is split into words:

=# select * from ts_parse('default', 'this is a pen');
 tokid | token
-------+-------
     1 | this
    12 |
     1 | is
    12 |
     1 | a
    12 |
     1 | pen
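
As a rough sketch of what this means for searching, the queries below
assume a UTF-8 database and the built-in 'simple' configuration, which
uses the default parser. Given the ts_parse output above, the Japanese
sentence should end up as a single lexeme, so a query for one word
inside it is not expected to match:

```sql
-- Sketch only: assumes a UTF-8 database and the built-in 'simple'
-- configuration, which uses the default parser.

-- The whole sentence is expected to become a single lexeme, as in the
-- ts_parse output above.
select to_tsvector('simple', 'これはペンです');

-- So a query for the word "pen" (ペン) is not expected to match ...
select to_tsvector('simple', 'これはペンです') @@ to_tsquery('simple', 'ペン');

-- ... while the English equivalent does match.
select to_tsvector('simple', 'this is a pen') @@ to_tsquery('simple', 'pen');
```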

To add support for Japanese (and other languages), it is necessary to
write a new parser or fix the existing default parser.

On the other hand, pgroonga's source code looks complex, and I doubt
that there are pgsql-hackers who know both it and the target languages
well and who would be able to port it to Postgres core.

--
Artur
