Re: JSON for PG 9.2
From | Andrew Dunstan
---|---
Subject | Re: JSON for PG 9.2
Date | |
Msg-id | 4F19A5B4.2020902@dunslane.net
In reply to | Re: JSON for PG 9.2 (Robert Haas <robertmhaas@gmail.com>)
List | pgsql-hackers
On 01/20/2012 11:58 AM, Robert Haas wrote:
> On Fri, Jan 20, 2012 at 10:45 AM, Andrew Dunstan <andrew@dunslane.net> wrote:
>> XML's &#nnnn; escape mechanism is more or less the equivalent of JSON's
>> \unnnn. But XML documents can be encoded in a variety of encodings,
>> including non-Unicode encodings such as Latin-1. However, no matter what the
>> document encoding, &#nnnn; designates the character with Unicode code point
>> nnnn, whether or not that is part of the document encoding's charset.
> OK.
>
>> Given that precedent, I'm wondering if we do need to enforce anything other
>> than that it is a valid Unicode code point.
>>
>> Equivalence comparison is going to be difficult anyway if you're not
>> resolving all \unnnn escapes. Possibly we need some sort of canonicalization
>> function to apply for comparison purposes. But we're not providing any
>> comparison ops today anyway, so I don't think we need to make that decision
>> now. As you say, there doesn't seem to be any defined canonical form - the
>> spec is a bit light on in this respect.
> Well, we clearly have to resolve all \uXXXX to do either comparison or
> canonicalization. The current patch does neither, but presumably we
> want to leave the door open to such things. If we're using UTF-8 and
> comparing two strings, and we get to a position where one of them has
> a character and the other has \uXXXX, it's pretty simple to do the
> comparison: we just turn XXXX into a wchar_t and test for equality.
> That should be trivial, unless I'm misunderstanding. If, however,
> we're not using UTF-8, we have to first turn \uXXXX into a Unicode
> code point, then convert that to a character in the database encoding,
> and then test for equality with the other character after that. I'm
> not sure whether that's possible in general, how to do it, or how
> efficient it is. Can you or anyone shed any light on that topic?
We know perfectly well how to turn strings from encoding x to UTF-8 (see pg_do_encoding_conversion() in mbutils.c). Once we've done that, ISTM we have reduced this to the previous problem, as the mathematicians like to say.

cheers

andrew