Re: [HACKERS] possible encoding issues with libxml2 functions
От | Pavel Stehule |
---|---|
Тема | Re: [HACKERS] possible encoding issues with libxml2 functions |
Дата | |
Msg-id | CAFj8pRCA0GWCLdwsPMFS2tgw+Fy3WQkSFzC7T-5xj+5hcyQs_w@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: [HACKERS] possible encoding issues with libxml2 functions (Pavel Stehule <pavel.stehule@gmail.com>) |
Список | pgsql-hackers |
2017-03-12 22:26 GMT+01:00 Pavel Stehule <pavel.stehule@gmail.com>:
2017-03-12 21:57 GMT+01:00 Noah Misch <noah@leadboat.com>:On Sun, Mar 12, 2017 at 08:36:58PM +0100, Pavel Stehule wrote:
> 2017-03-12 0:56 GMT+01:00 Noah Misch <noah@leadboat.com>:
> > On Mon, Feb 20, 2017 at 07:48:18PM +0100, Pavel Stehule wrote:
> > > There are possible two fixes
> > >
> > > a) clean decl on input - the encoding info can be removed from decl part
> > >
> > > b) use xml_out_internal everywhere before transformation to
> > > xmlChar. pg_xmlCharStrndup can be good candidate.
> >
> > I'd prefer (a) if the xml type were a new feature, because no good can
> > come of
> > storing an encoding in each xml field when we know the actual encoding is
> > the
> > database encoding. However, if you implemented (a), we'd still see
> > untreated
> > values brought over via pg_upgrade. Therefore, I would try (b) first. I
> > suspect the intent of xml_parse() was to implement (b); it will be
> > interesting
> > to see your test case that malfunctions.
> >
>
> I looked there again and I found so this issue is related to xpath function
> only
>
> Functions based on xml_parse are working without problems. xpath_internal
> uses own direct xmlCtxtReadMemory without correct encoding sanitation.
>
> so fix is pretty simple
Please add a test case.It needs a application - currently there is not possibility to import XML document via recv API :(I wrote a pgimportdoc utility, but it is not part of core
> --- a/src/backend/utils/adt/xml.c
> +++ b/src/backend/utils/adt/xml.c
> @@ -3874,9 +3874,11 @@ xpath_internal(text *xpath_expr_text, xmltype *data,
> ArrayType *namespaces,
> ns_count = 0;
> }
>
> - datastr = VARDATA(data);
> - len = VARSIZE(data) - VARHDRSZ;
> + datastr = xml_out_internal(data, 0);
Why not use xml_parse() instead of calling xmlCtxtReadMemory() directly? The
answer is probably in the archives, because someone understood the problem
enough to document "Some XML-related functions may not work at all on
non-ASCII data when the server encoding is not UTF-8. This is known to be an
issue for xpath() in particular."Probably there are two possible issues1. what I touched - recv function does encoding to database encoding - but document encoding is not updated.2. there are not possibility to encode from document encoding to database encoding.
> + len = strlen(datastr);
> +
> xpath_len = VARSIZE(xpath_expr_text) - VARHDRSZ;
> +
The two lines of functional change don't create a cause for more newlines, so
don't add these two newlines.ok
diff --git a/src/backend/utils/adt/xml.c b/src/backend/utils/adt/xml.cindex 1908b13db5..2786d5b1cb 100644--- a/src/backend/utils/adt/xml.c+++ b/src/backend/utils/adt/xml.c@@ -3874,8 +3874,8 @@ xpath_internal(text *xpath_expr_text, xmltype *data, ArrayType *namespaces,ns_count = 0;}- datastr = VARDATA(data);- len = VARSIZE(data) - VARHDRSZ;+ datastr = xml_out_internal(data, 0);+ len = strlen(datastr);xpath_len = VARSIZE_ANY_EXHDR(xpath_expr_text);if (xpath_len == 0)ereport(ERROR,
Вложения
В списке pgsql-hackers по дате отправления: