Обсуждение: Fwd: Question about xmloption and pg_restore

Поиск
Список
Период
Сортировка

Fwd: Question about xmloption and pg_restore

От
Stefan Fercot
Дата:
Hi all,


I've got some question about XML OPTION and pg_restore.

In a xml type column are stored documents.

When restoring a plain SQL dump, we got the message :

ERROR:  invalid XML content
DETAIL:  line 1: StartTag: invalid element name
<!DOCTYPE qgis PUBLIC 'http://mrcc.com/qgis.dtd' 'SYSTEM'>
 ^
CONTEXT:  COPY layer_styles, line 1, column styleqml: "<!DOCTYPE qgis PUBLIC 'http://mrcc.com/qgis.dtd' 'SYSTEM'>


Adding "SET XML OPTION DOCUMENT;" in top of the dump file allows to restore it.

Now, with the "custom" format, we have to use before pg_restore : export PGOPTIONS="-c xmloption=DOCUMENT".

Do you think of any other way to solve this issue ?

What if we got the 2 xml options used in the database?

Would it be possible to have something like : ALTER TABLE ... ALTER COLUMN ... SET XML OPTION ...; ?


Kind regards,

-- 
Stefan FERCOT
http://dalibo.com - http://dalibo.org
Вложения

Re: Question about xmloption and pg_restore

От
Robert Haas
Дата:
On Thu, May 17, 2018 at 9:37 AM, Stefan Fercot <stefan.fercot@dalibo.com> wrote:
> When restoring a plain SQL dump, we got the message :
>
> ERROR:  invalid XML content
> DETAIL:  line 1: StartTag: invalid element name
> <!DOCTYPE qgis PUBLIC 'http://mrcc.com/qgis.dtd' 'SYSTEM'>
>  ^
> CONTEXT:  COPY layer_styles, line 1, column styleqml: "<!DOCTYPE qgis PUBLIC
> 'http://mrcc.com/qgis.dtd' 'SYSTEM'>
>
> Adding "SET XML OPTION DOCUMENT;" in top of the dump file allows to restore
> it.

Hmm.  I thought that xmloption = 'content' was supposed to be strictly
more permissive than xmloption = 'document'.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Re: Question about xmloption and pg_restore

От
Chapman Flack
Дата:
On 05/18/18 15:50, Robert Haas wrote:
> On Thu, May 17, 2018 at 9:37 AM, Stefan Fercot <stefan.fercot@dalibo.com> wrote:
>> ERROR:  invalid XML content
>> DETAIL:  line 1: StartTag: invalid element name
>> <!DOCTYPE qgis PUBLIC 'http://mrcc.com/qgis.dtd' 'SYSTEM'>
>>  ^
>> CONTEXT:  COPY layer_styles, line 1, column styleqml: "<!DOCTYPE qgis PUBLIC
>> 'http://mrcc.com/qgis.dtd' 'SYSTEM'>
>>
>> Adding "SET XML OPTION DOCUMENT;" in top of the dump file allows to restore
>> it.
> 
> Hmm.  I thought that xmloption = 'content' was supposed to be strictly
> more permissive than xmloption = 'document'.

In the spirit of not leaving a good question hanging, this turns out to be
a difference between the 2003 SQL/XML standard (which PG implements) and
the later versions, which changed the data model so there really is a
containment relationship between 'content' and 'document'.

https://wiki.postgresql.org/wiki/PostgreSQL_vs_SQL/XML_Standards#XML_OPTION

-Chap


Re: Question about xmloption and pg_restore

От
Tom Lane
Дата:
Chapman Flack <chap@anastigmatix.net> writes:
> On 05/18/18 15:50, Robert Haas wrote:
>> Hmm.  I thought that xmloption = 'content' was supposed to be strictly
>> more permissive than xmloption = 'document'.

> In the spirit of not leaving a good question hanging, this turns out to be
> a difference between the 2003 SQL/XML standard (which PG implements) and
> the later versions, which changed the data model so there really is a
> containment relationship between 'content' and 'document'.
> https://wiki.postgresql.org/wiki/PostgreSQL_vs_SQL/XML_Standards#XML_OPTION

See also
https://www.postgresql.org/message-id/flat/153478795159.1302.9617586466368699403%40wrigleys.postgresql.org

It's odd that people are just reporting this now when it's been like that
for quite a few years, but anyway we've got a problem.  Sounds like maybe
adopting the later standards' definitions would fix it?  Although I have
no idea how complicated that'd be.

            regards, tom lane


Re: Question about xmloption and pg_restore

От
Chapman Flack
Дата:
On 10/25/18 05:02, Tom Lane wrote:
> Chapman Flack <chap@anastigmatix.net> writes:
>> a difference between the 2003 SQL/XML standard (which PG implements) and
>> the later versions, which changed the data model so there really is a
>> containment relationship between 'content' and 'document'.
>> https://wiki.postgresql.org/wiki/PostgreSQL_vs_SQL/XML_Standards#XML_OPTION
> 
> See also
> https://www.postgresql.org/message-id/flat/153478795159.1302.9617586466368699403%40wrigleys.postgresql.org
> 
> It's odd that people are just reporting this now when it's been like that
> for quite a few years, but anyway we've got a problem.  Sounds like maybe
> adopting the later standards' definitions would fix it?  Although I have
> no idea how complicated that'd be.

Supporting the later standards entirely would be a commendable thing,
but honest work:

https://wiki.postgresql.org/wiki/PostgreSQL_vs_SQL/XML_Standards#Possible_ways_forward

OTOH, making the current XML parsing not fail in this particular case
(which could be viewed as adopting the later standards' relationship
of CONTENT to DOCUMENT) might just be as simple as having the current
parsing code for CONTENT detect whether the string "starts with" a
<!DOCTYPE and fall back to the existing parsing code for DOCUMENT
if it does.

... where "starts with" actually means "possibly following some
whitespace, comments, or PIs, but you can stop looking if you see
a start-element", so essentially a port to C of:

https://github.com/tada/pljava/blob/V1_5_1/pljava/src/main/java/org/postgresql/pljava/jdbc/SQLXMLImpl.java#L409

which decides whether the input should be passed straight to the DOCUMENT-
style parser or somehow treated specially to parse as CONTENT. In Java
the special treatment involves a wrapping element, in xml.c it involves
calling a different libxml2 function, xmlParseBalancedChunkMemory, but
the choice of which method to apply is the same choice.

IIRC, XML comments don't nest, so it may be that "possibly following
some whitespace, comments, or PIs" could be shown to be a regular language,
and checked with a regex. I did it the more explicit way in Java for
clarity, and because the API was there, and so I wouldn't have to think
about it.

-Chap