Re: Q: Escapes in jsonpath Idents

Поиск
Список
Период
Сортировка
От David E. Wheeler
Тема Re: Q: Escapes in jsonpath Idents
Дата
Msg-id D1AA2802-98E2-4638-B9B0-512C40AC4B49@justatheory.com
обсуждение исходный текст
Ответ на Re: Q: Escapes in jsonpath Idents  (Erik Wienhold <ewie@ewie.name>)
Список pgsql-hackers
On Mar 17, 2024, at 20:09, Erik Wienhold <ewie@ewie.name> wrote:
>
> On 2024-03-17 20:50 +0100, David E. Wheeler wrote:
>> On Mar 17, 2024, at 15:12, Erik Wienhold <ewie@ewie.name> wrote:
>>> So I think it makes sense to reword the entire backslash part of the
>>> paragraph and remove references to JSON entirely.  The attached patch
>>> does that and also formats the backslash escapes as a bulleted list for
>>> readability.
>>
>> Ah, it’s JavaScript format, not JSON! This does clarify things quite
>> nicely, thank you. Happy to add my review once it’s in a commit fest.
>
> Thanks.  https://commitfest.postgresql.org/48/4899/

Applies cleanly, `make -C doc/src/sgml check` runs without error. Doc improvement welcome and much clearer than before.

> I had the same reasoning while writing my first reply but scrapped that
> part because I found it obvious:  That jsonpath is parsed before calling
> jsonb_path_exists() and therefore the parser has no context about any
> variables, which might not even be hardcoded but may result from a
> query.

Right, there’s a chicken/egg problem.

> Unfortunately, I don't have access to that part of the SQL spec.  So I
> don't know how the jsonpath grammar is specified.

Seems quite logical; I think it should be documented, but I’d also be interested to know what the 2016 and 2023
standardssay, exactly. 

> Also checked git log src/backend/utils/adt/jsonpath_scan.l for some
> insights but haven't found any yet.

Everybody’s taking shortcuts relative to the standard, AFAICT. For example, jsonpath_scan.l matches unqouted
identifierswith these two regular expressions: 

<xnq>{other}+
<xnq>\/\*
<xnq,xq,xvq>\\.

Plus the backslash escapes. {other} is defined as:

/* "other" means anything that's not special, blank, or '\' or '"' */
other [^\?\%\$\.\[\]\{\}\(\)\|\&\!\=\<\>\@\#\,\*:\-\+\/\\\" \t\n\r\f]

Which is waaaay more liberal than the ECMA standard[1], by my reading, but the MSDN[2] description is quite succinct
(thanksfor the links!): 

> In JavaScript, identifiers are commonly made of alphanumeric characters, underscores (_), and dollar signs ($).
Identifiersare not allowed to start with numbers. However, JavaScript identifiers are not only limited to ASCII — many
Unicodecode points are allowed as well. Namely, any character in the ID_Start category can start an identifier, while
anycharacter in the ID_Continue category can appear after the first character. 


ID_Start[3] and ID_Continue[4] point to the unicode standard codes lister, nether of which reference Emoji. Sure
enough,in Safari: 

> x = {"🎉": true}
< {🎉: true}
> x.🎉
< SyntaxError: Invalid character '\ud83c’

But in Postgres jsonpath:

david=# select '$.🎉'::jsonpath;
 jsonpath
----------
 $."🎉"

If the MSDN references to ID_Start and ID_Continue are correct, then the Postgres path parser is being overly-liberal.
Maybethat’s totally fine? Not sure what should be documented and what’s not worth it. 

Aside: I’m only digging into these details because I’m busy porting the path parser, so trying to figure out where to
becompatible and where not to. So far I’m rejecting '$' (but allowing '\$' and '\u0024') but taking advantage of the
unicodesupport in Go to specifically validate against ID_Start and ID_Continue. 

Best,

David

[1] https://262.ecma-international.org/#sec-identifier-names
[2] https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Lexical_grammar#identifiers
[3] https://util.unicode.org/UnicodeJsps/list-unicodeset.jsp?a=%5Cp%7BID_Start%7D
[4] https://util.unicode.org/UnicodeJsps/list-unicodeset.jsp?a=%5Cp%7BID_Continue%7D










В списке pgsql-hackers по дате отправления:

Предыдущее
От: Japin Li
Дата:
Сообщение: Re: Table AM Interface Enhancements
Следующее
От: stephane tachoires
Дата:
Сообщение: Re: Add SPLIT PARTITION/MERGE PARTITIONS commands