Re: pg_trgm: unicode string not working
От | Florian Pflug |
---|---|
Тема | Re: pg_trgm: unicode string not working |
Дата | |
Msg-id | 0A0622C3-C778-4C6A-9345-0A6D81929BCF@phlo.org обсуждение исходный текст |
Ответ на | pg_trgm: unicode string not working (Sushant Sinha <sushant354@gmail.com>) |
Ответы |
Re: pg_trgm: unicode string not working
|
Список | pgsql-hackers |
Hi Next time, please post questions regarding the usage of postgres to the -general list, not to -hackers. The purpose of -hackers is to discuss the development of postgres proper, not the development of applications using postgres. On Jun12, 2011, at 13:33 , Sushant Sinha wrote: > I am using pg_trgm for spelling correction as prescribed in the > documentation. But I see that it does not work for unicode sring. The > database was initialized with utf8 encoding and the C locale. I think you need to use a locale (more precisely, a CTYPE) in which 'क', 'त', 'द' are considered to be alphanumeric. You can specify the CTYPE when creating the database with CREATE DATABASE ... LC_CTYPE = ... > Here is the table: > \d words > Table "public.words" > Column | Type | Modifiers > --------+---------+----------- > word | text | > ndoc | integer | > nentry | integer | > Indexes: > "words_idx" gin (word gin_trgm_ops) > > Query: select word from words where word % 'कतद'; > > I get an error: > > ERROR: GIN indexes do not support whole-index scans pg_trgm probably ignores non-alphanumeric characters during comparison, so you end up with an empty search string, which translates to a whole-index scan. Postgres up to 9.0 does not support such scans for GIN indices. Note that this restriction was removed in postgres 9.1 which is currently in beta. However, GIT indices must be re-created with REINDEX after upgrading from 9.0 to leverage that improvement. best regards. Florian Pflug
В списке pgsql-hackers по дате отправления: