Re: WIP Incremental JSON Parser
From | Andrew Dunstan |
---|---|
Subject | Re: WIP Incremental JSON Parser |
Date | |
Msg-id | 8447d168-b981-2601-8ad0-53827fe18e5a@dunslane.net |
In reply to | Re: WIP Incremental JSON Parser (Jacob Champion <jacob.champion@enterprisedb.com>) |
Responses | Re: WIP Incremental JSON Parser |
List | pgsql-hackers |
On 2024-04-02 Tu 15:38, Jacob Champion wrote:
> On Mon, Apr 1, 2024 at 4:53 PM Andrew Dunstan <andrew@dunslane.net> wrote:
>> Anyway, here are new patches. I've rolled the new semantic test into the
>> first patch.
> Looks good! I've marked RfC.

Thanks! I appreciate all the work you've done on this. I will give it one more pass and commit RSN.

>
>> json_lex() is not really a very hot piece of code.
> Sure, but I figure if someone is trying to get the performance of the
> incremental parser to match the recursive one, so we can eventually
> replace it, it might get a little warmer. :)

I don't think this is where the performance penalty lies. Rather, I suspect it's the stack operations in the non-recursive parser itself. The speed test doesn't involve any partial token processing at all, and yet the non-recursive parser is significantly slower in that test.

>>> I think it'd be good for a v1.x of this feature to focus on
>>> simplification of the code, and hopefully consolidate and/or eliminate
>>> some of the duplicated parsing work so that the mental model isn't
>>> quite so big.
>> I'm not sure how you think that can be done.
> I think we'd need to teach the lower levels of the lexer about
> incremental parsing too, so that we don't have two separate sources of
> truth about what ends a token. Bonus points if we could keep the parse
> state across chunks to the extent that we didn't need to restart at
> the beginning of the token every time. (Our current tools for this are
> kind of poor, like the restartable state machine in PQconnectPoll.
> While I'm dreaming, I'd like coroutines.) Now, whether the end result
> would be more or less maintainable is left as an exercise...
>

I tried to disturb the main lexer processing as little as possible. We could possibly unify the two paths, but I have a strong suspicion that would result in a performance hit (the main part of the lexer doesn't copy any characters at all; it just keeps track of pointers into the input).
And while the code might not undergo lots of change, the routine itself is quite performance critical.

Anyway, I think that's all something for another day.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com