How are Unicode characters stored internally, in Postgres?
From | Kyung Lee |
---|---|
Subject | How are Unicode characters stored internally, in Postgres? |
Date | |
Msg-id | 20030309044313.26775.qmail@web40307.mail.yahoo.com |
Responses | Re: How are Unicode characters stored internally, in |
List | pgsql-jdbc |
I have come across an interesting problem that I hope someone can help me solve.

PROBLEM: (short version)
I have entered Unicode characters (more specifically, Chinese characters) into a Postgres db in 2 "different" formats. One works for some applications, and one works for others. So, I would like some additional information as to how Chinese (or Unicode, in general) characters are stored internally in Postgres.

ENVIRONMENT:
I have Postgres 7.3.2. My database is encoded as UNICODE. I am using Java and the jdbc3 driver.

PROBLEM: (extended version)
I have entered Chinese characters into a UNICODE-encoded Postgres db in 2 different "ways". Let me explain. When I parse a file containing Chinese characters, those characters go into the db one "way". When I use an HTML form to submit characters into the db, those characters go into the db a different "way".

How do I know this? When I retrieve the characters and try to display them in a browser, the first way (from a parsed file) just shows question marks, but the second way (from an HTML form) shows the characters correctly. When I use psql to view the data that came from the parsed file, I do not see question marks; I see what looks like some sort of encoding. That encoding is different from the encoding of the data submitted by the HTML form.

Ultimately, I am trying to display the Chinese characters in Flash. Flash has Unicode support and assumes UTF-8 character encoding. When I send the characters from the first way, it displays all the Chinese characters correctly. When I send the Chinese characters from the second way, it only shows some of the characters, and the others are not displayed at all. Let's forget about the Flash issue; I only mentioned it to point out the 2 different ways (I think) the Chinese characters are stored in Postgres.

So, this leads me to a few questions:

1. If I don't specify a client-encoding param, via an environment variable or as a param on the Postgres driver, what is the default when the db is encoded as UNICODE?

2. I noticed something in the Postgres documentation. In the section discussing Multibyte Support (http://www.postgresql.org/docs/view.php?version=7.3&file=multibyte.html, Table 7-2), it shows UNICODE as an available client encoding, but not when the server is encoded as UNICODE. Why is that? Other server encodings have the same encoding listed as a client encoding (i.e. SQL_ASCII as a server encoding can have SQL_ASCII as the client encoding as well).

Sorry for the long message, and thanks in advance for any help.
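
To make the two "ways" described above concrete, here is a minimal Java sketch of how the same Chinese character could end up as two different byte sequences before it ever reaches the database. It is only an illustration under an assumption: that one of the two input paths decodes UTF-8 bytes with the wrong charset (ISO-8859-1 is picked here purely as an example). Nothing in the message confirms that this is what actually happens. The class and method names (`EncodingCheck`, `hex`, `decodeCorrectly`, `decodeAsLatin1`) are hypothetical, and `StandardCharsets` requires a Java version newer than the one in use at the time of this message.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
final class EncodingCheck {

    // Render bytes as space-separated upper-case hex, e.g. "E4 B8 AD".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Path A: UTF-8 bytes decoded as UTF-8 (the intended behaviour).
    static String decodeCorrectly(byte[] utf8Bytes) {
        return new String(utf8Bytes, StandardCharsets.UTF_8);
    }

    // Path B (assumed faulty path): the same bytes decoded as ISO-8859-1,
    // so each byte becomes its own character.
    static String decodeAsLatin1(byte[] utf8Bytes) {
        return new String(utf8Bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // U+4E2D, a single Chinese character, as UTF-8 bytes.
        byte[] input = "\u4E2D".getBytes(StandardCharsets.UTF_8);

        String good = decodeCorrectly(input);
        String bad = decodeAsLatin1(input);

        // What each string would look like if written out as UTF-8.
        System.out.println("path A: " + hex(good.getBytes(StandardCharsets.UTF_8)));
        System.out.println("path B: " + hex(bad.getBytes(StandardCharsets.UTF_8)));
    }
}
```

Dumping the bytes (or code points) of a value retrieved from each input path in the same way would show whether the two rows really hold different data, independently of how a browser, psql, or Flash chooses to render them.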