Encoding problems in PostgreSQL with XML data
From | Peter Eisentraut |
---|---|
Subject | Encoding problems in PostgreSQL with XML data |
Date | |
Msg-id | 200401091946.01930.peter_e@gmx.net |
Responses | Re: Encoding problems in PostgreSQL with XML data |
List | pgsql-hackers |
This is not directly related to current development, but it is something that might need a low-level solution. I've been thinking for some time about how to enhance the current "XML support" (e.g., contrib/xml). The central problem I have is this: How do we deal with the fact that an XML datum carries its own encoding information?

Here's a scenario: It is desirable to have validity checking on XML input, be it through a special XML data type or through functions that take XML data. Say we define a data type that stores XML documents and rejects documents that are not well-formed. I want to insert something in psql:

    CREATE TABLE test (
        description text,
        content xml
    );

    INSERT INTO test VALUES ('test document', '<?xml version="1.0"?><doc><para>blah</para>...</doc>');

Since the declaration names no encoding, an XML parser will assume this document to be in UTF-8, and let's say that at the client it really is. What if client_encoding=UNICODE but server_encoding=LATIN1? Do we expect some layer to rewrite the <?xml?> declaration to contain the correct encoding information? Or can the xml type bypass encoding conversion? What about reading it back out of the database with yet another client encoding?

Rewriting the <?xml?> declaration seems like a workable solution, but it would break the transparency of the client/server encoding conversion. Also, some people might dislike that their documents are being changed as they are stored.

Any ideas?
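To make the scenario concrete, here is a minimal Python sketch of the mismatch, not anything PostgreSQL itself does. It simulates the client-to-server conversion with a plain re-encode, uses the standard library's expat-based parser as a stand-in for whatever parser an xml type would use, and puts a non-ASCII character in the document so the conversion actually changes bytes. The helpers `is_well_formed` and `rewrite_declaration` are hypothetical names introduced only for this illustration.

```python
import xml.etree.ElementTree as ET

# Document as the client sends it: no encoding in the declaration,
# bytes are UTF-8 (what an XML parser assumes by default).
client_doc = '<?xml version="1.0"?><doc><para>caf\u00e9</para></doc>'
client_bytes = client_doc.encode("utf-8")

# Simulated client -> server conversion (UTF-8 to LATIN1): the bytes
# change, but the declaration is left untouched.
server_bytes = client_bytes.decode("utf-8").encode("latin-1")


def is_well_formed(data: bytes) -> bool:
    """Hypothetical check: True if the parser accepts the bytes."""
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        return False


def rewrite_declaration(data: bytes, encoding: str) -> bytes:
    """Hypothetical helper: replace the XML declaration with one that
    names the given encoding. Simplified: always emits version 1.0 and
    drops any other declaration attributes."""
    body = data.split(b"?>", 1)[1] if data.startswith(b"<?xml") else data
    header = b'<?xml version="1.0" encoding="' + encoding.encode("ascii") + b'"?>'
    return header + body


# Accepted as sent by the client, rejected after conversion alone,
# accepted again once the declaration matches the stored bytes.
client_ok = is_well_formed(client_bytes)
server_ok = is_well_formed(server_bytes)
fixed_bytes = rewrite_declaration(server_bytes, "ISO-8859-1")
fixed_text = ET.fromstring(fixed_bytes).find("para").text
```

In this sketch the converted bytes stop being well-formed until the declaration is rewritten, which is the trade-off described above: the stored document only parses if it differs from what the client sent.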