Re: Proposal: Add JSON support
From | Mike Rylander |
---|---|
Subject | Re: Proposal: Add JSON support |
Date | |
Msg-id | b918cf3d1003281723q55a028fak545c71d459a25ef4@mail.gmail.com |
In reply to | Re: Proposal: Add JSON support (Tom Lane <tgl@sss.pgh.pa.us>) |
Replies | Re: Proposal: Add JSON support |
List | pgsql-hackers |
On Sun, Mar 28, 2010 at 7:36 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Andrew Dunstan <andrew@dunslane.net> writes:
>> Here's another thought. Given that JSON is actually specified to consist
>> of a string of Unicode characters, what will we deliver to the client
>> where the client encoding is, say Latin1? Will it actually be a legal
>> JSON byte stream?
>
> No, it won't. We will *not* be sending anything but latin1 in such a
> situation, and I really couldn't care less what the JSON spec says about
> it. Delivering wrongly-encoded data to a client is a good recipe for
> all sorts of problems, since the client-side code is very unlikely to be
> expecting that. A datatype doesn't get to make up its own mind whether
> to obey those rules. Likewise, data on input had better match
> client_encoding, because it's otherwise going to fail the encoding
> checks long before a json datatype could have any say in the matter.
>
> While I've not read the spec, I wonder exactly what "consist of a string
> of Unicode characters" should actually be taken to mean. Perhaps it
> only means that all the characters must be members of the Unicode set,
> not that the string can never be represented in any other encoding.
> There's more than one Unicode encoding anyway...

In practice, every parser/serializer I've used (including the one I helped write) allows (and, often, forces) any non-ASCII character to be encoded as \u followed by a string of four hex digits.

Whether it would be easy inside the backend, when generating JSON from user data stored in tables that are not in a UTF-8 encoded cluster, to convert to UTF-8, that's something else entirely. If it /is/ easy and safe, then it's just a matter of scanning for multi-byte sequences and replacing those with their \uXXXX equivalents. I have some simple and fast code I could share, if it's needed, though I suspect it's not. :)

UPDATE: Thanks, Robert, for pointing to the RFC.
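A minimal sketch of the \uXXXX replacement described above, assuming the JSON text has already been decoded to Unicode (for example, from a LATIN1 server encoding). It is not the code offered in the message, and the function names `ascii_only_json` and `latin1_json_to_ascii` are hypothetical; a backend version would instead walk the server-encoded bytes. One detail the four-hex-digit form does not cover on its own: code points above U+FFFF are written in JSON as a UTF-16 surrogate pair of two \u escapes.

```python
def ascii_only_json(text: str) -> str:
    """Return `text` with every non-ASCII code point replaced by a \\uXXXX escape.

    Assumes `text` is already valid JSON. Non-ASCII characters can only occur
    inside JSON string literals, so escaping them does not change the value.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            # Plain ASCII passes through unchanged.
            out.append(ch)
        elif cp <= 0xFFFF:
            # Basic Multilingual Plane: one escape with four hex digits.
            out.append("\\u%04x" % cp)
        else:
            # Above U+FFFF: encode as a UTF-16 surrogate pair (two escapes).
            cp -= 0x10000
            high = 0xD800 + (cp >> 10)
            low = 0xDC00 + (cp & 0x3FF)
            out.append("\\u%04x\\u%04x" % (high, low))
    return "".join(out)


def latin1_json_to_ascii(raw: bytes) -> str:
    """Hypothetical helper: decode LATIN1 bytes, then escape non-ASCII."""
    # Every byte value is a valid LATIN1 character, so this decode cannot fail.
    return ascii_only_json(raw.decode("latin-1"))


if __name__ == "__main__":
    print(ascii_only_json('{"name": "café"}'))  # {"name": "caf\u00e9"}
```

The output is pure ASCII, so it is byte-identical in LATIN1, UTF-8, or any other ASCII-compatible client encoding, which sidesteps the question of what encoding the client expects. Lowercase hex digits are used here; JSON accepts either case.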
-- 
Mike Rylander
 | VP, Research and Design
 | Equinox Software, Inc. / The Evergreen Experts
 | phone: 1-877-OPEN-ILS (673-6457)
 | email: miker@esilibrary.com
 | web: http://www.esilibrary.com