Re: [HACKERS] UTF-8 data migration problem in Postgresql 7.2
From | Tatsuo Ishii |
---|---|
Subject | Re: [HACKERS] UTF-8 data migration problem in Postgresql 7.2 |
Date | |
Msg-id | 20020221183154Q.t-ishii@sra.co.jp |
In response to | Re: [HACKERS] UTF-8 data migration problem in Postgresql 7.2 (Jean-Michel POURE <jm.poure@freesurf.fr>) |
List | pgsql-odbc |
> > o Were server/client encodings UTF-8 for PostgreSQL?
> Yes.
>
> > o What are versions of these softwares? Especially of PHP? Is it a
> > PHP4? if so, what version? What is the "Php with UTF-8 extensions"?
> > I've never heard of it.
> It is PHP 4.0.6 with :
> --enable-mbstring : Enable mbstring functions. This option is required to use
> mbstring functions.
> --enable-mbstr-enc-trans : Enable HTTP input character encoding conversion
> using mbstring conversion engine. If this feature is enabled, HTTP input
> character encoding may be converted to mbstring.internal_encoding
> automatically.

Oh, that's general functionality for handling multibyte characters, not
only UTF-8. What are the mbstring settings in php.ini (the entries
beginning with "mbstring.")?

BTW, PHP 4.0.6 is very buggy when used with PostgreSQL (random
crashes). I recommend upgrading to 4.1.1.

> Now, some more information:
> 1) Dutch text was entered using IE5.5. It is not faulty.

I assume the web page's encoding was UTF-8.

> 2) Japanese text was entered using OpenOffice latest release (sorry, I said
> IE5 but I was wrong), saved under UTF-8 and imported in PostgreSQL. Only
> Japanese data has problems.

Can I take a look at the UTF-8 text generated by OpenOffice?

> 3) When opening a faulty Japanese record using Apache/IE5, the record is
> displayed correctly. Each faulty character is replaced by a Japanese 30A7
> gryph (looks like a French cross with two horizontal lines). What is this
> gryph? Does it mean 'I don't know' in Japanese.

What do you mean by "gryph"? Is 30A7 an EUC-JP code?

> The record is saved correctly using this 30A1 gryph (then it looks like it is
> fixed as I can dump it and import it in 7.2, but this is not a solution).

Again, what is "gryph"?

> 4) In PostgreSQL 7.1.3 original dump, there is only one faulty UTF-8
> character repeated 700 times. If you open my file in Yudit, it is displayed
> as =E3=82' Why is it always the same character everywhere? Maybe you could
> have a look at my source file again. Sounds like a bug (Open Office or
> PostgreSQL).
>
> 5) Surrogate pairs
> I heard PostgreSQL did not support surrogate pairs. Is this a problem of
> surrogate pair? Just my 0.02 cents, I know very little about UTF-8.

I don't think so.
--
Tatsuo Ishii
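
A note for readers following the thread: the bytes reported from Yudit (=E3=82) happen to match the first two bytes of the three-byte UTF-8 encodings of both U+30A7 and U+30A1, the code points mentioned in point 3. The sketch below only illustrates that relationship. It assumes the faulty character is a truncated three-byte sequence, which the thread does not confirm, and the helper name `is_complete_utf8` is hypothetical.

```python
# Illustration only: assumes the faulty bytes are a cut-off 3-byte UTF-8 sequence.

def is_complete_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# The two katakana code points mentioned in the thread.
small_e = "\u30a7".encode("utf-8")  # three bytes: E3 82 A7
small_a = "\u30a1".encode("utf-8")  # three bytes: E3 82 A1

# Dropping the final byte leaves E3 82, the pair shown by Yudit.
truncated = small_e[:2]

print(small_e.hex(), small_a.hex(), truncated.hex())
print(is_complete_utf8(small_e), is_complete_utf8(truncated))
```

If the dump really contains E3 82 followed by an unrelated byte, a strict UTF-8 check like this one would reject it, which would be consistent with the import failing on 7.2.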