Second byte of multibyte characters causing trouble
From | Karen Ellrick |
---|---|
Subject | Second byte of multibyte characters causing trouble |
Date | |
Msg-id | GAELLCMOCEGMDMHDMIILOEMBCNAA.k-ellrick@sctech.co.jp |
Responses |
Re: Second byte of multibyte characters causing trouble
|
List | pgsql-general |
I am using Perl CGI scripts with DBI to take data from a web interface and from text files to put into my database, and I'm dealing with Japanese (i.e. two-byte characters). PostgreSQL is installed with multibyte enabled, but somewhere in the communication chain from Perl to DBI to PostgreSQL, something is trying to interpret multibyte text byte by byte, which is causing trouble.

The example that has been discovered so far is that if the second of the two bytes is 0x5c (in ASCII, "\"), it gets swallowed and a ripple effect of byte pairs ensues (at least if the byte after the 0x5c isn't a valid character to follow \ to make a metacharacter - if it is, who knows what will happen!). I fixed that one by replacing any \ in the strings with "\\" to get a literal 0x5C byte past whatever is trying to interpret it.

But I am wondering what other similar pitfalls I have to watch out for, and I'm hoping others have ideas. For example, is my SQL insert or update statement going to choke if the second byte of one of the characters is the same as ASCII for a single quote? The possibilities are endless, depending on what part of the process is doing the damage. And trying to test this stuff is like looking for a needle in a haystack - it's not easy to figure out what Japanese characters have second bytes that would have special meaning if interpreted as ASCII.

If someone knows how to set things up so that all text is guaranteed to go through unscathed (make Perl or DBI multi-byte aware, or whatever - i.e. the real fix), that would be ideal. Otherwise, at least some ideas would be welcome regarding what other bytes to write bandaid code for. I know I'm not the only one trying to use Perl to maintain PostgreSQL databases with Japanese or Chinese text! :-)

Thanks in advance,
Karen

--------------------------------
Karen Ellrick
S & C Technology, Inc.
1-21-35 Kusatsu-shinmachi
Hiroshima 733-0834 Japan
(from U.S. 011-81, from Japan 0) 82-293-2838
--------------------------------
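
On the "needle in a haystack" question of which characters have a troublesome second byte, one way to find out is to enumerate them. The sketch below is in Python rather than Perl and rests on an assumption: that the text is Shift_JIS. The post does not name the encoding, but a second byte of 0x5C is typical of Shift_JIS. The function name `chars_with_trail_byte` and the `LEAD_BYTES` table are made up for this illustration.

```python
# Assumption: the data is Shift_JIS; the post does not say which encoding is used.
# Lead bytes of two-byte Shift_JIS (JIS X 0208) characters.
LEAD_BYTES = list(range(0x81, 0xA0)) + list(range(0xE0, 0xF0))


def chars_with_trail_byte(trail, encoding="shift_jis"):
    """Return every two-byte character whose second byte equals `trail`."""
    found = []
    for lead in LEAD_BYTES:
        try:
            text = bytes([lead, trail]).decode(encoding)
        except UnicodeDecodeError:
            continue  # unassigned code point, or `trail` is not a legal second byte
        if len(text) == 1:
            found.append(text)
    return found


backslash_chars = chars_with_trail_byte(0x5C)  # second byte looks like "\"
quote_chars = chars_with_trail_byte(0x27)      # second byte looks like "'"

print(len(backslash_chars), "two-byte characters end in 0x5C")
print(len(quote_chars), "two-byte characters end in 0x27")
```

Under that assumption, the 0x5C list includes common characters such as 表 (0x95 0x5C) and ソ (0x83 0x5C), while the single-quote list comes back empty, because Shift_JIS second bytes never fall below 0x40. Calling the function with other ASCII values (for example 0x7C for "|") gives the remaining candidates for escaping.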