Re: Extending range of to_tsvector et al
| From | Dan Scott |
|---|---|
| Subject | Re: Extending range of to_tsvector et al |
| Date | |
| Msg-id | CAAY5AM3d4SYKYVOO82b8urtGKkGOnRjpUbmyPfU37gC_baSY8w@mail.gmail.com |
| In reply to | Re: Extending range of to_tsvector et al (john knightley <john.knightley@gmail.com>) |
| Responses | Re: Extending range of to_tsvector et al |
| List | pgsql-hackers |
Hi John:

On Sun, Sep 30, 2012 at 11:45 PM, john knightley <john.knightley@gmail.com> wrote:
> Dear Dan,
>
> thank you for your reply.
>
> The OS I am using is Ubuntu 12.04, with PostgreSQL 9.1.5 installed on
> a utf8 locale
>
> A short 5 line dictionary file is sufficient to test:-
>
> raeuz
> 我们
> 𦘭𥎵
> 𪽖𫖂
>
>
> line 1 "raeuz" Zhuang word written using English letters and show up
> under ts_vector ok
> line 2 "我们" uses everyday Chinese word and show up under ts_vector ok
> line 3 "𦘭𥎵" Zhuang word written using rather old Chinese characters
> found in Unicode 3.1 which came in about the year 2000 and show up
> under ts_vector ok
> line 4 "𪽖𫖂" Zhuang word written using rather old Chinese characters
> found in Unicode 5.2 which came in about the year 2009 but do not show
> up under ts_vector ok
> line 5 "" Zhuang word written using rather old Chinese characters
> found in PUA area of the font Sawndip.ttf but do not show up under
> ts_vector ok (Font can be downloaded from
> http://gdzhdb.l10n-support.com/sawndip-fonts/Sawndip.ttf)
>
> The last two words even though included in a dictionary do not get
> accepted by ts_vector.

Hmm. Fedora 17 x86-64 w/ PostgreSQL 9.1.5 here, the latter seems to
work using the default text search configuration (albeit with one
crucial note: I created the database with the "lc_ctype=C
lc_collate=C" options):

WORKING:

createdb --template=template0 --lc-ctype=C --lc-collate=C foobar

foobar=# select ts_debug('');
                            ts_debug
----------------------------------------------------------------
 (word,"Word, all letters",,{english_stem},english_stem,{})
(1 row)

NOT WORKING AS EXPECTED:

foobaz=# SHOW LC_CTYPE;
  lc_ctype
-------------
 en_US.UTF-8
(1 row)

foobaz=# select ts_debug('');
            ts_debug
---------------------------------
 (blank,"Space symbols",,{},,)
(1 row)

So... perhaps LC_CTYPE=C is a possible workaround for you?
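
In case it helps narrow this down, here is a rough Python sketch that reports which Unicode block each character of the test words falls into. It is only a diagnostic aid: `block_of` and `describe` are hypothetical helper names, the block table is a hand-picked subset, and it does not query PostgreSQL or the system locale at all. The idea that older locale data might not classify the Unicode 5.2 and PUA characters as letters is an assumption on my part, not something confirmed in this thread.

```python
# Hypothetical diagnostic helper; not part of PostgreSQL or any library.
# The table lists only the Unicode blocks relevant to the test words.
BLOCKS = [
    (0x0000, 0x007F, "Basic Latin"),
    (0x4E00, 0x9FFF, "CJK Unified Ideographs"),
    (0xE000, 0xF8FF, "Private Use Area"),
    (0x20000, 0x2A6DF, "CJK Extension B (Unicode 3.1)"),
    (0x2A700, 0x2B73F, "CJK Extension C (Unicode 5.2)"),
    (0xF0000, 0xFFFFD, "Supplementary Private Use Area-A"),
    (0x100000, 0x10FFFD, "Supplementary Private Use Area-B"),
]


def block_of(ch):
    """Return the name of the listed block containing ch, or "other"."""
    cp = ord(ch)
    for lo, hi, name in BLOCKS:
        if lo <= cp <= hi:
            return name
    return "other"


def describe(word):
    """Return a (code point, block name) pair for each character of word."""
    return [("U+%04X" % ord(ch), block_of(ch)) for ch in word]


if __name__ == "__main__":
    # Test words from the dictionary file quoted in this message.
    for word in ["raeuz", "我们", "𦘭𥎵", "𪽖𫖂"]:
        print(word, describe(word))
```

Running it on the PUA word from the Sawndip font as well would show whether those characters sit in the BMP Private Use Area or in one of the supplementary ones.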