Re: Cleaning up array_in()
От | Heikki Linnakangas |
---|---|
Тема | Re: Cleaning up array_in() |
Дата | |
Msg-id | 5859ce4e-2be4-92b0-c85c-e1e24eab57c6@iki.fi обсуждение исходный текст |
Ответ на | Re: Cleaning up array_in() (Tom Lane <tgl@sss.pgh.pa.us>) |
Ответы |
Re: Cleaning up array_in()
Re: Cleaning up array_in() |
Список | pgsql-hackers |
On 08/07/2023 19:08, Tom Lane wrote: > I wrote: >> So I end up with the attached. I went ahead and dropped >> ArrayGetOffset0() as part of 0001, and I split 0002 into two patches >> where the new 0002 avoids re-indenting any existing code in order >> to ease review, and then 0003 is just a mechanical application >> of pgindent. > > That got sideswiped by ae6d06f09, so here's a trivial rebase to > pacify the cfbot. > > #text/x-diff; name="v3-0001-Simplify-and-speed-up-ReadArrayStr.patch" [v3-0001-Simplify-and-speed-up-ReadArrayStr.patch]/home/tgl/pgsql/v3-0001-Simplify-and-speed-up-ReadArrayStr.patch > #text/x-diff; name="v3-0002-Rewrite-ArrayCount-to-make-dimensionality-checks-.patch" [v3-0002-Rewrite-ArrayCount-to-make-dimensionality-checks-.patch] /home/tgl/pgsql/v3-0002-Rewrite-ArrayCount-to-make-dimensionality-checks-.patch > #text/x-diff; name="v3-0003-Re-indent-ArrayCount.patch" [v3-0003-Re-indent-ArrayCount.patch] /home/tgl/pgsql/v3-0003-Re-indent-ArrayCount.patch Something's wrong with your attachments. Hmm, I wonder if ae6d06f09 had a negative performance impact. In an unquoted array element, scanner_isspace() function is called for every character, so it might be worth inlining. On the patches: They are a clear improvement, thanks for that. That said, I still find the logic very hard to follow, and there are some obvious performance optimizations that could be made. ArrayCount() interprets low-level quoting and escaping, and tracks the dimensions at the same time. The state machine is pretty complicated. And when you've finally finished reading and grokking that function, you see that ReadArrayStr() repeats most of the same logic. Ugh. I spent some time today refactoring it for readability and speed. I introduced a separate helper function to tokenize the input. It deals with whitespace, escapes, and backslashes. Then I merged ArrayCount() and ReadArrayStr() into one function that parses the elements and determines the dimensions in one pass. That speeds up parsing large arrays. With the tokenizer function, the logic in ReadArrayStr() is still quite readable, even though it's now checking the dimensions at the same time. I also noticed that we used atoi() to parse the integers in the dimensions, which doesn't do much error checking. Some funny cases were accepted because of that, for example: postgres=# select '[1+-+-+-+-+-+:2]={foo,bar}'::text[]; text ----------- {foo,bar} (1 row) I tightened that up in the passing. Attached are your patches, rebased to fix the conflicts with ae6d06f09 like you intended. On top of that, my patches. My patches need more testing, benchmarking, and review, so if we want to sneak something into v16, better go with just your patches. If we're tightening up the accepted inputs, maybe fix that atoi() sloppiness, though. -- Heikki Linnakangas Neon (https://neon.tech)
Вложения
В списке pgsql-hackers по дате отправления: