Re: JSON and unicode surrogate pairs
From | Robert Haas |
---|---|
Subject | Re: JSON and unicode surrogate pairs |
Date | |
Msg-id | CA+TgmoapNgKpPiwVyR=wxCj=1m9RqL3311gA6fibbXijMv=rtg@mail.gmail.com |
In reply to | JSON and unicode surrogate pairs (Andrew Dunstan <andrew@dunslane.net>) |
Responses | Re: JSON and unicode surrogate pairs |
List | pgsql-hackers |
On Wed, Jun 5, 2013 at 10:46 AM, Andrew Dunstan <andrew@dunslane.net> wrote:
> In 9.2, the JSON parser didn't check the validity of the use of unicode
> escapes other than that it required 4 hex digits to follow '\u'. In 9.3,
> that is still the case. However, the JSON accessor functions and operators
> also try to turn JSON strings into text in the server encoding, and this
> includes de-escaping \u sequences. This works fine except when there is a
> pair of sequences representing a UTF-16 type surrogate pair, something that
> is explicitly permitted in the JSON spec.
>
> The attached patch is an attempt to remedy that, and a surrogate pair is
> turned into the correct code point before converting it to whatever the
> server encoding is.
>
> Note that this would mean we can still put JSON with incorrect use of
> surrogates into the database, as now (9.2 and later), and they will cause
> almost all the accessor functions to raise an error, as now (9.3). All this
> does is allow JSON that uses surrogates correctly not to fail when applying
> the accessor functions and operators. That's a possible violation of POLA,
> and at least worth of a note in the docs, but I'm not sure what else we can
> do now - adding this check to the input lexer would possibly cause restores
> to fail, which users might not thank us for.

I think the approach you've proposed here is a good one.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company