Re: [HACKERS] Questions on using multi-byte character in a field of a table (BIG5)
From | t-ishii@sra.co.jp (Tatsuo Ishii) |
---|---|
Subject | Re: [HACKERS] Questions on using multi-byte character in a field of a table (BIG5) |
Date | |
Msg-id | 199811231429.XAA10436@meshsv26.tk.mesh.ad.jp |
In reply to | Questions on using multi-byte character in a field of a table (BIG5) ("Hui Chun Kit, Jacky" <ckhui@school.net.hk>) |
List | pgsql-hackers |
At 3:46 AM 98.11.22 +0800, Hui Chun Kit, Jacky wrote:
>Dear all,
>
>    I have some difficult time in using postgresql 6.4 with chinese BIG5
>characters. I am just looking for storing BIG characters in a text field
>and retrieve correctly. I have --enable-mb when I compile. I am on RH5.1

What did you choose for an encoding? BIG5 is not supported yet in 6.4,
sorry.

>intel platform, running PG 6.4.
>    I just created a testing table test
>    create test ( name char(20), age int);
>    For most of the characters in BIG5, it works and I can insert
>chinese name into the table, but for some characters, esp my own name,
>it does not work. I have check the problem out . But cannot solve it.
>    It is because in my name under BIG5 coding it is "5cb3 54ab c7b3"
>or
>in ASCII code "263 \ 253 T 263 307" where two byte is a character.
>That is "5cb3" ('263' '\' ) is the first character and '54ab' ( '253'
>'T' ) becomes the second character. The problem is that somewhere
>between storing the value into database and client frontend (Perl,
>MSAccess) , the '\' is interpreted and thus the stored value becomes
>"263 253 T 263 307" which is distorted.
>    I don't know where exactly is the problem as when I use Mysql, it is
>working fine.

As you can see, the problem is that BIG5 can contain some special
characters in the second byte that confuse the PostgreSQL parser. We had
a similar experience with the Japanese Shift JIS code (SJIS). To address
the problem, we added functionality to the backend to convert between
SJIS and EUC_JP (which never confuses the parser and thus can be used as
one of the backend native encodings).

To solve your problem, there might be 2 solutions:

o Use EUC_TW (Chinese EUC code) instead of BIG5. 6.4 should be happy
  with EUC_TW. To use EUC_TW, just create a new database:

      createdb mydb with encoding='EUC_TW'

  or do "configure --with-mb=EUC_TW" and re-install, then re-create the
  database. Alternatively, you can use Unicode (UTF-8). Use "UNICODE"
  instead of "EUC_TW" in this case.

o Add an encoding conversion module between BIG5 and EUC_TW to
  PostgreSQL. I wish I could do that, but I have no idea how to write it
  (I don't speak Chinese at all). So your contribution would be welcome!

BTW, you said you use perl. I'm surprised to hear that perl can handle
BIG5. Is it a modified version (localized version)? You also use
M$Access, so you must be using ODBC, which makes me worry about its
support for BIG5. Here in Japan we are using a localized version of the
ODBC driver that supports SJIS. What I want to say here is that your
problem may not be only in PostgreSQL itself. I recommend you make sure
that your clients can handle BIG5.
--
Tatsuo Ishii
t-ishii@sra.co.jp
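
The byte-level problem described above can be reproduced outside PostgreSQL. The following is only a minimal sketch in Python, not the backend's actual parser logic: the helper names ascii_trail_positions and strip_backslashes are hypothetical, and strip_backslashes merely stands in for whichever layer interprets the '\' byte. The input bytes are the ones quoted in the original report (octal 263 '\' 253 'T' 263 307).

```python
# Bytes of the name as given in the report: 263 '\' 253 'T' 263 307 (octal).
BIG5_NAME = bytes([0o263, 0x5C, 0o253, 0x54, 0o263, 0o307])


def ascii_trail_positions(data):
    """Return offsets of second bytes that fall in the ASCII range (< 0x80).

    Assumes every byte >= 0x81 starts a two-byte character, which is a
    simplification that is adequate for BIG5-style input.
    """
    positions = []
    i = 0
    while i < len(data):
        if data[i] >= 0x81 and i + 1 < len(data):
            if data[i + 1] < 0x80:
                positions.append(i + 1)
            i += 2  # skip the whole two-byte character
        else:
            i += 1  # plain single-byte character
    return positions


def strip_backslashes(data):
    """Hypothetical stand-in for a layer that treats every 0x5C byte as an
    escape character and drops it, without knowing about multi-byte text."""
    return data.replace(b"\\", b"")


# Offsets 1 and 3 hold ASCII-range second bytes ('\' and 'T').
print(ascii_trail_positions(BIG5_NAME))

# Dropping the backslash gives 263 253 'T' 263 307, the distorted value
# described in the report.
print(strip_backslashes(BIG5_NAME))

# Re-encoded as UTF-8, every byte of these characters is >= 0x80, so no
# byte can be mistaken for '\'.
utf8_name = BIG5_NAME.decode("big5").encode("utf-8")
print(all(b >= 0x80 for b in utf8_name))
```

Only the first character is damaged by a backslash-aware layer, but that shifts every following byte pair, which is why the whole name comes back distorted. An encoding whose multi-byte characters never contain ASCII-range bytes (EUC_TW or UTF-8, as suggested above) avoids the issue entirely.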