Re: WIP Incremental JSON Parser

From: Andrew Dunstan
Subject: Re: WIP Incremental JSON Parser
Date:
Msg-id: CAD5tBcLjFnTnAvUpQg-K5sJa3ECnEj8CQZ-AYrH1k+F=42D+Vg@mail.gmail.com
In reply to: Re: WIP Incremental JSON Parser (Jacob Champion <jacob.champion@enterprisedb.com>)
Replies: Re: WIP Incremental JSON Parser (Jacob Champion <jacob.champion@enterprisedb.com>)
List: pgsql-hackers


On Thu, Mar 14, 2024 at 3:35 PM Jacob Champion <jacob.champion@enterprisedb.com> wrote:
I've been poking at the partial token logic. The json_errdetail() bug
mentioned upthread (e.g. for an invalid input `[12zz]` and a small chunk
size) seems to be due to the disconnection between the "main" lex
instance and the dummy_lex that's created from it. The dummy_lex
contains all the information about the failed token, which is
discarded upon an error return:

> partial_result = json_lex(&dummy_lex);
> if (partial_result != JSON_SUCCESS)
>     return partial_result;

In these situations, there's an additional logical error:
lex->token_start is pointing to a spot in the string after
lex->token_terminator, which breaks an invariant that will mess up
later pointer math. Nothing appears to be setting lex->token_start to
point into the partial token buffer until _after_ the partial token is
successfully lexed, which doesn't seem right -- in addition to the
pointer math problems, if a previous chunk was freed (or on a stale
stack frame), lex->token_start will still be pointing off into space.
Similarly, wherever we set token_terminator, we need to know that
token_start is pointing into the same buffer.
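A minimal sketch of the direction discussed above, under a heavily simplified model: `MiniLex`, `mini_lex_number()`, and `lex_partial_token()` are hypothetical names, not PostgreSQL's `JsonLexContext` or anything in the patch set. The partial-token path builds a dummy lexer over the reassembled token, then copies its `token_start` and `token_terminator` back into the main lexer whether or not lexing succeeded. Both pointers therefore always refer to the same buffer, and the failed token survives for error reporting.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical, heavily simplified lexer state; NOT PostgreSQL's JsonLexContext. */
typedef struct MiniLex
{
    const char *input;            /* buffer currently being lexed */
    size_t      input_length;
    const char *token_start;      /* must point into the same buffer as token_terminator */
    const char *token_terminator; /* one past the last byte of the token */
} MiniLex;

typedef enum { MINI_SUCCESS, MINI_INVALID_TOKEN } MiniResult;

/* Hypothetical toy lexer: a valid token is a non-empty run of digits. */
static MiniResult
mini_lex_number(MiniLex *lex)
{
    const char *p = lex->input;
    const char *end = lex->input + lex->input_length;
    int         all_digits = 1;

    lex->token_start = p;
    /* Consume the whole alphanumeric run so an error can report all of it. */
    while (p < end && isalnum((unsigned char) *p))
    {
        if (!isdigit((unsigned char) *p))
            all_digits = 0;
        p++;
    }
    lex->token_terminator = p;
    return (all_digits && p > lex->token_start) ? MINI_SUCCESS : MINI_INVALID_TOKEN;
}

/*
 * Hypothetical partial-token path: lex a token reassembled in partial_buf
 * from several chunks, using a dummy lexer over that buffer.
 */
static MiniResult
lex_partial_token(MiniLex *lex, const char *partial_buf, size_t partial_len)
{
    MiniLex     dummy = {partial_buf, partial_len, partial_buf, partial_buf};
    MiniResult  result = mini_lex_number(&dummy);

    /*
     * Copy positions back BEFORE looking at the result, so an error return
     * does not discard the failed token or leave token_start pointing into
     * an older (possibly freed) chunk.
     */
    lex->token_start = dummy.token_start;
    lex->token_terminator = dummy.token_terminator;

    /* Invariant: both pointers are into partial_buf, start <= terminator. */
    assert(lex->token_start <= lex->token_terminator);
    return result;
}
```

With `[12zz]` split into the chunks `[12` and `zz]`, calling `lex_partial_token()` on the reassembled `"12zz"` returns `MINI_INVALID_TOKEN` with `token_start` at the start of the partial buffer and `token_terminator` four bytes later, so an error detail can quote the whole `12zz` token. The key choice is copying the positions before checking the result; returning early on failure would leave the main lexer's pointers describing a different buffer.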

Determining the end of a token is now done in two separate places
between the partial- and full-lexer code paths, which is giving me a
little heartburn. I'm concerned that those could drift apart, and if
the two disagree on where to end a token, we could lose data into the
partial token buffer in a way that would be really hard to debug. Is
there a way to combine them?


Not very easily. But I think and hope I've fixed the issue you've identified above about returning before lex->token_start is properly set.

Attached is a new set of patches that does that and is updated for the json_errdetail() changes.
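Since the two end-of-token decisions stay in separate code paths, one way to catch the drift described above is an exhaustive cross-check run in a test. This sketch assumes, purely for illustration, that each path decides "a scalar token ends here" with a single-byte predicate; `full_path_token_ends_at()` and `partial_path_token_ends_at()` are hypothetical and do not correspond to functions in the patch.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical end-of-token predicate for the full-input lexer path. */
static int
full_path_token_ends_at(unsigned char c)
{
    switch (c)
    {
        case ' ': case '\t': case '\n': case '\r':
        case ',': case ':': case '[': case ']': case '{': case '}':
            return 1;
        default:
            return 0;
    }
}

/* Hypothetical end-of-token predicate for the partial-token path. */
static int
partial_path_token_ends_at(unsigned char c)
{
    /* strchr() also matches the terminating NUL, so exclude it explicitly. */
    return c != '\0' && strchr(" \t\n\r,:[]{}", c) != NULL;
}

/*
 * Compare the two predicates on every possible byte value.  Returns the
 * number of disagreements and stores the first disagreeing byte in
 * *first_bad (or -1 if they agree everywhere).
 */
static int
count_token_end_disagreements(int *first_bad)
{
    int         count = 0;

    *first_bad = -1;
    for (int c = 0; c < 256; c++)
    {
        if (full_path_token_ends_at((unsigned char) c) !=
            partial_path_token_ends_at((unsigned char) c))
        {
            if (*first_bad < 0)
                *first_bad = c;
            count++;
        }
    }
    return count;
}
```

Because the input space is only 256 byte values, the check is exhaustive rather than sampled. The `c != '\0'` guard matters: `strchr()` treats the terminating NUL as part of the string, so without it the two predicates would disagree on byte 0, which is exactly the kind of silent divergence such a test is meant to surface.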

cheers

andrew
Attachments
