Re: JSON and unicode surrogate pairs
From | Andrew Dunstan |
---|---|
Subject | Re: JSON and unicode surrogate pairs |
Date | |
Msg-id | 51B76825.2020803@dunslane.net |
In response to | Re: JSON and unicode surrogate pairs (Noah Misch <noah@leadboat.com>) |
Responses |
Re: JSON and unicode surrogate pairs
Re: JSON and unicode surrogate pairs |
List | pgsql-hackers |
On 06/10/2013 11:22 PM, Noah Misch wrote:
> On Mon, Jun 10, 2013 at 11:20:13AM -0400, Andrew Dunstan wrote:
>> On 06/10/2013 10:18 AM, Tom Lane wrote:
>>> Andrew Dunstan <andrew@dunslane.net> writes:
>>>> After thinking about this some more I have come to the conclusion that
>>>> we should only do any de-escaping of \uxxxx sequences, whether or not
>>>> they are for BMP characters, when the server encoding is utf8. For any
>>>> other encoding, which is already a violation of the JSON standard
>>>> anyway, and should be avoided if you're dealing with JSON, we should
>>>> just pass them through even in text output. This will be a simple and
>>>> very localized fix.
>>> Hmm. I'm not sure that users will like this definition --- it will seem
>>> pretty arbitrary to them that conversion of \u sequences happens in some
>>> databases and not others.
> Yep. Suppose you have a LATIN1 database. Changing it to a UTF8 database
> where everyone uses client_encoding = LATIN1 should not change the semantics
> of successful SQL statements. Some statements that fail with one database
> encoding will succeed in the other, but a user should not witness a changed
> non-error result. (Except functions like decode() that explicitly expose byte
> representations.) Having "SELECT '["\u00e4"]'::json ->> 0" emit 'ä' in the
> UTF8 database and '\u00e4' in the LATIN1 database would move PostgreSQL in the
> wrong direction relative to that ideal.
>
>> Then what should we do when there is no matching codepoint in the
>> database encoding? First we'll have to delay the evaluation so it's not
>> done over-eagerly, and then we'll have to try the conversion and throw
>> an error if it doesn't work. The second part is what's happening now,
>> but the delayed evaluation is not.
> +1 for doing it that way.
>

As a final counter example, let me note that Postgres itself handles
Unicode escapes differently in UTF8 databases - in other databases it
only accepts Unicode escapes up to U+007f, i.e. ASCII characters.

cheers

andrew
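
For readers following the thread, the "delay the evaluation, then try the conversion and throw an error if it doesn't work" approach can be sketched roughly as below. This is a minimal Python illustration, not the PostgreSQL implementation: `deescape`, `combine_surrogates` and the `server_encoding` parameter are hypothetical names, and Python codec names merely stand in for PostgreSQL server encodings. The idea is that the stored JSON text stays untouched, and escapes are only resolved when a text value is actually extracted.

```python
import re

# Matches one JSON \uXXXX escape (exactly four hex digits).
_ESCAPE = re.compile(r"\\u([0-9a-fA-F]{4})")


def combine_surrogates(hi: int, lo: int) -> int:
    """Combine a UTF-16 high/low surrogate pair into a single code point."""
    return 0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00)


def deescape(body: str, server_encoding: str) -> str:
    """Resolve \\uXXXX escapes in the body of a JSON string (hypothetical helper).

    Raises ValueError for a malformed surrogate pair and UnicodeEncodeError
    when a code point cannot be represented in server_encoding.
    """
    out = []
    i = 0
    while i < len(body):
        m = _ESCAPE.match(body, i)
        if m is None:
            # Copy other escapes (such as \\ or \n) through untouched,
            # two characters at a time, so "\\u00e4" is not misread.
            step = 2 if body[i] == "\\" else 1
            out.append(body[i:i + step])
            i += step
            continue
        cp = int(m.group(1), 16)
        i = m.end()
        if 0xD800 <= cp <= 0xDBFF:
            # A high surrogate must be followed directly by a low surrogate.
            m2 = _ESCAPE.match(body, i)
            lo = int(m2.group(1), 16) if m2 else None
            if lo is None or not 0xDC00 <= lo <= 0xDFFF:
                raise ValueError("high surrogate must be followed by a low surrogate")
            cp = combine_surrogates(cp, lo)
            i = m2.end()
        elif 0xDC00 <= cp <= 0xDFFF:
            raise ValueError("low surrogate must follow a high surrogate")
        ch = chr(cp)
        # The conversion attempt: fails if the target encoding lacks this character.
        ch.encode(server_encoding)
        out.append(ch)
    return "".join(out)
```

Under these assumptions, `deescape(r"\u00e4", "latin-1")` and `deescape(r"\u00e4", "utf-8")` both yield 'ä', which is the encoding-independent result Noah argues for, while `deescape(r"\u20ac", "latin-1")` raises an error because the euro sign has no LATIN1 code point.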