Re: Extending range of to_tsvector et al
From | Tom Lane |
---|---|
Subject | Re: Extending range of to_tsvector et al |
Date | |
Msg-id | 28864.1349064678@sss.pgh.pa.us |
In reply to | Re: Extending range of to_tsvector et al (john knightley <john.knightley@gmail.com>) |
Responses |
Re: Extending range of to_tsvector et al
|
List | pgsql-hackers |
john knightley <john.knightley@gmail.com> writes:
> The OS I am using is Ubuntu 12.04, with PostgreSQL 9.1.5 installed on
> a utf8 local
> A short 5 line dictionary file is sufficient to test:-
> raeuz
> 我们
> 𦘭𥎵
> 𪽖𫖂
>
>
> line 1 "raeuz" Zhuang word written using English letters and show up
> under ts_vector ok
> line 2 "我们" uses everyday Chinese word and show up under ts_vector ok
> line 3 "𦘭𥎵" Zhuang word written using rather old Chinese charcters
> found in Unicode 3.1 which came in about the year 2000 and show up
> under ts_vector ok
> line 4 "𪽖𫖂" Zhuang word written using rather old Chinese charcters
> found in Unicode 5.2 which came in about the year 2009 but do not show
> up under ts_vector ok
> line 5 "" Zhuang word written using rather old Chinese charcters
> found in PUA area of the font Sawndip.ttf but do not show up under
> ts_vector ok (Font can be downloaded from
> http://gdzhdb.l10n-support.com/sawndip-fonts/Sawndip.ttf)

AFAIK there is nothing in Postgres itself that would distinguish, say,
𦘭 from 𪽖. I think this must be down to your platform's locale
definition: it probably thinks that the former is a letter and the
latter is not. You'd have to gripe to the locale maintainers to get
that fixed.

			regards, tom lane
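A minimal sketch of how one might check this diagnosis on a given machine, assuming (as the reply implies) that the text search parser's idea of a "letter" comes from the C library's locale-dependent wide-character classification, i.e. iswalpha(). The script is hypothetical and not part of PostgreSQL: it loads libc through Python's ctypes (POSIX systems only), switches LC_CTYPE to a UTF-8 locale, and compares iswalpha() with Unicode's own General Category from Python's unicodedata module. The helper names and the locale candidates are illustrative assumptions. Because the message does not give the code points of its sample characters, the sketch uses the first code points of CJK Extension B (U+20000, Unicode 3.1) and Extension C (U+2A700, Unicode 5.2) as stand-ins, plus U+E000 from the Private Use Area.

```python
"""Ask the platform C library whether it classifies code points as letters.

Hypothetical illustration: PostgreSQL's parser is *assumed* here to defer to
libc's iswalpha() under LC_CTYPE; this script queries that function directly.
"""
import ctypes
import ctypes.util
import locale
import unicodedata

# Load libc; CDLL(None) falls back to symbols already in the process (POSIX only).
_libc = ctypes.CDLL(ctypes.util.find_library("c") or None)
_libc.iswalpha.argtypes = [ctypes.c_uint]  # passed as a 32-bit unsigned int (glibc's wint_t)
_libc.iswalpha.restype = ctypes.c_int


def set_utf8_ctype(candidates=("C.UTF-8", "en_US.UTF-8", "")):
    """Switch LC_CTYPE to the first locale name this machine accepts."""
    for name in candidates:
        try:
            return locale.setlocale(locale.LC_CTYPE, name)
        except locale.Error:
            continue
    return locale.setlocale(locale.LC_CTYPE)  # nothing accepted: report current


def locale_says_letter(cp):
    """True if the current C locale's iswalpha() calls this code point a letter."""
    return _libc.iswalpha(cp) != 0


def unicode_says_letter(cp):
    """True if Python's Unicode database gives the code point an L* category."""
    return unicodedata.category(chr(cp)).startswith("L")


# Stand-in code points; not the exact characters from the message.
SAMPLES = {
    0x0072: "LATIN SMALL LETTER R (as in 'raeuz')",
    0x6211: "CJK ideograph 我",
    0x20000: "first code point of CJK Extension B (Unicode 3.1)",
    0x2A700: "first code point of CJK Extension C (Unicode 5.2)",
    0xE000: "first Private Use Area code point",
}

if __name__ == "__main__":
    print("LC_CTYPE =", set_utf8_ctype())
    for cp, label in SAMPLES.items():
        print(f"U+{cp:04X}  locale={str(locale_says_letter(cp)):5}  "
              f"unicode={str(unicode_says_letter(cp)):5}  {label}")
```

A row showing unicode=True but locale=False would point at the platform's locale data rather than at PostgreSQL, which is where the reply suggests filing the complaint. Private Use Area code points have General Category Co, so Unicode itself does not classify them as letters either.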