Re: [HACKERS] Questions on using multi-byte character in a field of a table (BIG5)

From	t-ishii@sra.co.jp (Tatsuo Ishii)
Subject	Re: [HACKERS] Questions on using multi-byte character in a field of a table (BIG5)
Date	23 November 1998 09:29:32
Msg-id	199811231429.XAA10436@meshsv26.tk.mesh.ad.jp
In reply to	Questions on using multi-byte character in a field of a table (BIG5) ("Hui Chun Kit, Jacky" <ckhui@school.net.hk>)
List	pgsql-hackers

At 3:46 AM 98.11.22 +0800, Hui Chun Kit, Jacky wrote:
>Dear all,
>
>    I am having some difficulty using PostgreSQL 6.4 with Chinese BIG5
>characters. I just want to store BIG5 characters in a text field
>and retrieve them correctly. I used --enable-mb when I compiled. I am on RH5.1

What did you choose for an encoding?
BIG5 is not supported yet in 6.4, sorry.

>Intel platform, running PG 6.4.
>    I just created a test table:
>    create table test (name char(20), age int);
>    For most BIG5 characters it works, and I can insert
>Chinese names into the table, but for some characters, especially my own name,
>it does not work. I have tracked the problem down, but cannot solve it.
>    The cause is that my name in BIG5 is "5cb3 54ab c7b3" (as little-endian
>16-bit words), or byte by byte "263 \ 253 T 263 307" (octal, printable
>bytes shown as characters), where every two bytes form one character.
>That is, "5cb3" ('263' '\') is the first character and "54ab" ('253' 'T')
>is the second. The problem is that somewhere between the client frontend
>(Perl, MS Access) and the value stored in the database, the '\' is
>interpreted, so the stored value becomes "263 253 T 263 307", which is
>corrupted.
>    I don't know exactly where the problem is, as it works fine
>when I use MySQL.

As you can see, the problem is that the second byte of a BIG5 character
can be a special character (such as '\') that confuses the PostgreSQL
parser. We had a similar experience with Japanese Shift JIS (SJIS). To
address it, we added functionality somewhere in the backend to convert
between SJIS and EUC_JP. EUC_JP never confuses the parser and can
therefore be used as one of the backend's native encodings.
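
The failure can be reproduced outside PostgreSQL with a minimal Python 3 sketch. It takes the six bytes quoted above and scans them as BIG5, under a simplifying assumption: any byte >= 0x81 is treated as a lead byte followed by exactly one trail byte. It reports trail bytes that fall in the ASCII range. `naive_unescape` is a hypothetical stand-in for any layer that processes backslash escapes without knowing about multibyte characters; it is not PostgreSQL's actual parser code. It drops the backslash and keeps the following byte, which reproduces the corrupted value from the report. The last line checks the same property with Python's standard `big5` codec on 許, whose BIG5 code B3 5C is the same as the first character of the name.

```python
from __future__ import annotations

# Bytes of the name as quoted in the message: octal 263 '\' 253 'T' 263 307.
NAME_BIG5 = bytes([0o263, 0x5C, 0o253, ord("T"), 0o263, 0o307])


def ascii_trail_bytes(data: bytes) -> list[int]:
    """Return every BIG5 trail byte that lies in the ASCII range (< 0x80).
    Simplification: a byte >= 0x81 is a lead byte, the next byte its trail."""
    hits = []
    i = 0
    while i < len(data):
        if data[i] >= 0x81 and i + 1 < len(data):
            trail = data[i + 1]
            if trail < 0x80:
                hits.append(trail)
            i += 2  # skip the whole two-byte character
        else:
            i += 1  # single-byte (ASCII) character
    return hits


def naive_unescape(data: bytes) -> bytes:
    """Hypothetical escape processor with no multibyte awareness:
    a backslash (0x5C) is dropped and the byte after it is kept as-is."""
    out = bytearray()
    i = 0
    while i < len(data):
        if data[i] == 0x5C and i + 1 < len(data):
            out.append(data[i + 1])
            i += 2
        else:
            out.append(data[i])
            i += 1
    return bytes(out)


print([chr(b) for b in ascii_trail_bytes(NAME_BIG5)])  # ['\\', 'T']
print(naive_unescape(NAME_BIG5) == bytes([0o263, 0o253, ord("T"), 0o263, 0o307]))  # True
print("許".encode("big5") == b"\xb3\\")  # True: the trail byte is a backslash
```

Only the '\' trail byte causes damage here; 'T' is also an ASCII byte in trail position but carries no special meaning to an escape processor.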

To solve your problem, there might be 2 solutions:

o Use EUC_TW (Chinese EUC) instead of BIG5. 6.4 should work fine with EUC_TW. To use EUC_TW, just create a new
database with "create database mydb with encoding='EUC_TW';", or run "configure --with-mb=EUC_TW", re-install,
and then re-create the database.

 Alternatively, you can use Unicode (UTF-8). Use "UNICODE" instead of "EUC_TW" in this case.

o Add an encoding conversion module between BIG5 and EUC_TW to PostgreSQL. I wish I could do that, but I have no idea
how to write it (I don't speak Chinese at all). So your contribution would be welcome!
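
For the first option with UNICODE, a client that produces BIG5 text would have to transcode it before sending it to the database. The following is a minimal Python 3 sketch using the standard `big5` and `utf-8` codecs; the function names `big5_to_utf8` and `utf8_to_big5` are hypothetical and not part of any PostgreSQL or driver API. It illustrates the property that makes UTF-8 (like EUC_TW) safe for the parser: every byte of a multibyte character is >= 0x80, so none can be mistaken for a backslash or a quote.

```python
def big5_to_utf8(data: bytes) -> bytes:
    """Transcode BIG5 bytes to UTF-8 (raises UnicodeDecodeError on invalid BIG5)."""
    return data.decode("big5").encode("utf-8")


def utf8_to_big5(data: bytes) -> bytes:
    """Transcode UTF-8 bytes back to BIG5 for a client that only handles BIG5."""
    return data.decode("utf-8").encode("big5")


big5 = "許".encode("big5")  # b'\xb3\\': the trail byte is a backslash
utf8 = big5_to_utf8(big5)

print(utf8.hex())                    # e8a8b1
print(all(b >= 0x80 for b in utf8))  # True: no byte can be read as '\' or a quote
print(utf8_to_big5(utf8) == big5)    # True: converting back restores the BIG5 bytes
```

Where exactly such a conversion belongs (client, driver, or backend) is the open design question raised above; this sketch only shows the byte-level effect.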

BTW, you said you use Perl. I'm surprised to hear that Perl
can handle BIG5. Is it a modified (localized) version?

You also use M$Access, so you must be using ODBC, which makes me worry
about its support for BIG5. Here in Japan we use a localized version of
the ODBC driver that supports SJIS.

What I want to say here is that your problem may not lie only in
PostgreSQL itself. I recommend making sure that your clients can handle
BIG5.
--
Tatsuo Ishii
t-ishii@sra.co.jp
