JSON and unicode surrogate pairs
From | Andrew Dunstan |
---|---|
Subject | JSON and unicode surrogate pairs |
Date | |
Msg-id | 51AF4F37.107@dunslane.net |
Replies | Re: JSON and unicode surrogate pairs |
List | pgsql-hackers |
In 9.2, the JSON parser didn't check the validity of the use of unicode escapes, other than requiring that 4 hex digits follow '\u'. In 9.3, that is still the case. However, the JSON accessor functions and operators also try to turn JSON strings into text in the server encoding, and this includes de-escaping \u sequences. This works fine except when there is a pair of sequences representing a UTF-16-style surrogate pair, something that is explicitly permitted in the JSON spec.

The attached patch is an attempt to remedy that: a surrogate pair is turned into the correct code point before being converted to whatever the server encoding is.

Note that this would mean we can still put JSON with incorrect use of surrogates into the database, as now (9.2 and later), and that such values will cause almost all the accessor functions to raise an error, as now (9.3). All this does is allow JSON that uses surrogates correctly not to fail when the accessor functions and operators are applied. That's a possible violation of POLA, and at least worthy of a note in the docs, but I'm not sure what else we can do now - adding this check to the input lexer would possibly cause restores to fail, which users might not thank us for.

cheers

andrew
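To illustrate the de-escaping rule described above, here is a minimal sketch in Python. It is not the attached patch, which works inside the server's C code, and the function names (`combine_surrogates`, `deescape_unicode`) are hypothetical ones chosen for this example. It handles only \u escapes, passes other backslash escapes through untouched, and does not model the final conversion to the server encoding.

```python
import string

HEX_DIGITS = set(string.hexdigits)


def is_high_surrogate(cp: int) -> bool:
    return 0xD800 <= cp <= 0xDBFF


def is_low_surrogate(cp: int) -> bool:
    return 0xDC00 <= cp <= 0xDFFF


def combine_surrogates(high: int, low: int) -> int:
    # Standard UTF-16 arithmetic: 10 bits from each half, offset by 0x10000.
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


def deescape_unicode(s: str) -> str:
    """Replace backslash-u escapes with characters, joining surrogate pairs.

    Hypothetical helper for illustration only. Raises ValueError when a
    surrogate is used incorrectly.
    """
    out = []
    pending_high = None  # high surrogate waiting for its low half
    i, n = 0, len(s)
    while i < n:
        if s[i] == "\\" and i + 1 < n and s[i + 1] == "u":
            digits = s[i + 2:i + 6]
            if len(digits) != 4 or any(c not in HEX_DIGITS for c in digits):
                raise ValueError("\\u must be followed by 4 hex digits")
            cp = int(digits, 16)
            i += 6
            if pending_high is not None:
                if not is_low_surrogate(cp):
                    raise ValueError("high surrogate not followed by low surrogate")
                out.append(chr(combine_surrogates(pending_high, cp)))
                pending_high = None
            elif is_high_surrogate(cp):
                pending_high = cp
            elif is_low_surrogate(cp):
                raise ValueError("low surrogate without preceding high surrogate")
            else:
                out.append(chr(cp))
            continue
        if pending_high is not None:
            raise ValueError("high surrogate not followed by low surrogate")
        if s[i] == "\\" and i + 1 < n:
            # Some other escape (including an escaped backslash): copy as-is.
            out.append(s[i:i + 2])
            i += 2
        else:
            out.append(s[i])
            i += 1
    if pending_high is not None:
        raise ValueError("high surrogate at end of string")
    return "".join(out)
```

With this sketch, the escaped pair `\ud83d\ude00` yields the single code point U+1F600, while a lone `\ud83d` raises an error. That mirrors the behaviour described in the post: correctly paired surrogates succeed, and incorrect use still fails at access time rather than at input time.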