Re: Can pg_trgm handle non-alphanumeric characters?
From | Fujii Masao |
---|---|
Subject | Re: Can pg_trgm handle non-alphanumeric characters? |
Date | |
Msg-id | CAHGQGwHMru9oYhcPSHr39tU_cnggw7+kX8BJjh6yT4o4_DB2GQ@mail.gmail.com |
In reply to | Re: Can pg_trgm handle non-alphanumeric characters? (Tom Lane <tgl@sss.pgh.pa.us>) |
List | pgsql-hackers |
On Fri, May 11, 2012 at 4:11 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Fujii Masao <masao.fujii@gmail.com> writes:
>> On Fri, May 11, 2012 at 12:07 AM, MauMau <maumau307@gmail.com> wrote:
>>> Thanks for your explanation. Although I haven't understood it well yet, I'll
>>> consider what you taught. And I'll consider if the tentative measure of
>>> removing KEEPONLYALNUM is correct for someone who wants to use pg_trgm
>>> against Japanese text.
>
>> In Japanese, it's common to do a text search with two characters keyword.
>> But since pg_trgm is 3-gram, you basically would not be able to use index
>> for such text search. So you might need something like pg_bigm or pg_unigm
>> for Japanese text search.

Even if an index can be used for two characters text search, bitmap index
scan picks up all rows, so it's too slow.

> I believe the trigrams are three *bytes* not three characters. So a
> couple of kanji should work just fine for this.

Really? As far as I read the code of pg_trgm, the trigram is three characters
and its CRC32 is used as an index key if its size is more than three bytes.

Regards,

--
Fujii Masao
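
To make the characters-versus-bytes point concrete, here is a rough Python sketch of the behaviour described above. It is not the pg_trgm source. The function names `char_trigrams` and `index_key` are made up for illustration, the whole input is treated as a single word, and `zlib.crc32` stands in for whatever CRC variant PostgreSQL actually uses, so the hashed bytes should not be expected to match real index keys.

```python
import zlib


def char_trigrams(word):
    """Split one word into overlapping windows of three *characters*.

    The two leading spaces and one trailing space follow the padding that
    the pg_trgm documentation describes. Word splitting and character
    filtering (KEEPONLYALNUM) are deliberately left out of this sketch.
    """
    padded = "  " + word + " "
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


def index_key(trigram):
    """Reduce a three-character trigram to a three-byte key.

    If the UTF-8 encoding is exactly three bytes, it is used as is.
    Otherwise the bytes are hashed with CRC32 and truncated to three bytes.
    Assumption: zlib's CRC32 and big-endian truncation are stand-ins here.
    """
    raw = trigram.encode("utf-8")
    if len(raw) == 3:
        return raw
    return zlib.crc32(raw).to_bytes(4, "big")[:3]


if __name__ == "__main__":
    for word in ("cat", "東京"):
        for tg in char_trigrams(word):
            print(repr(tg), len(tg.encode("utf-8")), index_key(tg).hex())
```

With ASCII input every trigram is three bytes and is kept unchanged. With a two-kanji keyword each trigram is still three characters, but its encoding is longer than three bytes, so under this reading it would go through the hash branch.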