Re: speed up verifying UTF-8
From | John Naylor |
---|---|
Subject | Re: speed up verifying UTF-8 |
Date | |
Msg-id | CAFBsxsHUgNeytyF6TyoUBgf8whqRxvStbWtok9qcDJzDZ78FLw@mail.gmail.com |
In response to | Re: speed up verifying UTF-8 (Vladimir Sitnikov <sitnikov.vladimir@gmail.com>) |
Responses |
Re: speed up verifying UTF-8
Re: speed up verifying UTF-8 |
List | pgsql-hackers |
I've decided I'm not quite comfortable with the additional complexity in the build system introduced by the SIMD portion of the previous patches. It would make more sense if the pure C portion were unchanged, but with the shift-based DFA plus the bitwise ASCII check, we have a portable implementation that's still a substantial improvement over the current validator. In v24, I've included only that much, and the diff is only about 1/3 as many lines. If future improvements to COPY FROM put additional pressure on this path, we can always add SIMD support later.
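For readers who have not followed the whole thread, here is a rough sketch of the two techniques named above. It is not the code from the v24 patch: the names `chunk_is_ascii`, `toy_row`, and `toy_verify` are made up for illustration, and the DFA is deliberately a toy that only understands ASCII and two-byte sequences, so it rejects valid three- and four-byte UTF-8. It only shows how a state can double as a shift amount into a packed transition row.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper (not the patch's code): true if the 8 bytes at s
 * are all nonzero ASCII, i.e. in the range 0x01..0x7F.
 */
static bool
chunk_is_ascii(const unsigned char *s)
{
	const uint64_t high = UINT64_C(0x8080808080808080);
	uint64_t	chunk;

	memcpy(&chunk, s, sizeof(chunk));	/* safe unaligned load */

	/* Any byte >= 0x80 has its high bit set. */
	if (chunk & high)
		return false;

	/*
	 * Every byte is now <= 0x7F, so adding 0x7F cannot carry into the
	 * neighboring byte. The high bit of each sum is set only if that
	 * byte was nonzero.
	 */
	return ((chunk + UINT64_C(0x7f7f7f7f7f7f7f7f)) & high) == high;
}

/* Toy DFA: each state's value is the bit offset of its slot in a row. */
enum { ST_OK = 0, ST_NEED1 = 6, ST_ERR = 12 };

/* Pack the next state for each current state into one 32-bit row. */
#define ROW(from_ok, from_need1, from_err) \
	(((uint32_t) (from_ok) << ST_OK) | \
	 ((uint32_t) (from_need1) << ST_NEED1) | \
	 ((uint32_t) (from_err) << ST_ERR))

/*
 * Transition row for one input byte. A real implementation would use a
 * precomputed 256-entry table here instead of branches.
 */
static uint32_t
toy_row(unsigned char b)
{
	if (b < 0x80)
		return ROW(ST_OK, ST_ERR, ST_ERR);		/* ASCII */
	if (b < 0xC0)
		return ROW(ST_ERR, ST_OK, ST_ERR);		/* continuation byte */
	if (b >= 0xC2 && b <= 0xDF)
		return ROW(ST_NEED1, ST_ERR, ST_ERR);	/* two-byte lead */
	return ROW(ST_ERR, ST_ERR, ST_ERR);			/* unsupported in this toy */
}

/* Run the toy DFA; the input is accepted if it ends in ST_OK. */
static bool
toy_verify(const unsigned char *s, size_t len)
{
	uint32_t	state = ST_OK;

	for (size_t i = 0; i < len; i++)
		state = (toy_row(s[i]) >> state) & 31;	/* shift, then mask */

	return state == ST_OK;
}
```

The step inside the loop is a load, a shift, and a mask, with no branch on the current state. That is what makes this style of DFA attractive for a portable C fallback.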
One thing not in this patch is a possible improvement to pg_utf8_verifychar() that Heikki and I worked on upthread as part of earlier attempts to rewrite pg_utf8_verifystr(). That's worth looking into separately.
On Thu, Aug 26, 2021 at 12:09 PM Vladimir Sitnikov <sitnikov.vladimir@gmail.com> wrote:
>
> >Attached is v23 incorporating the 32-bit transition table, with the necessary comment adjustments
>
> 32bit table is nice.
Thanks for taking a look!
> Would you please replace https://github.com/BobSteagall/utf_utils/blob/master/src/utf_utils.cpp URL with
> https://github.com/BobSteagall/utf_utils/blob/6b7a465265de2f5fa6133d653df0c9bdd73bbcf8/src/utf_utils.cpp
> in the header of src/port/pg_utf8_fallback.c?
>
> It would make the URL more stable in case the file gets renamed.
>
> Vladimir
>
Makes sense, so done that way.
--
John Naylor
EDB: http://www.enterprisedb.com