Обсуждение: XMLDocument (SQL/XML X030)

Поиск

Список

Период

Сортировка

XMLDocument (SQL/XML X030)

От

Jim Jones

Дата:

04 декабря 2024 г., 19:18:54

Hi,

I'd like to propose the implementation of XMLDocument (SQL/XML X030).It 
basically returns an XML document from a given XML expression, e.g.

SELECT
   xmldocument(
     xmlelement(NAME foo,
       xmlattributes(42 AS att),
       xmlelement(NAME bar,
         xmlconcat('va', 'lue'))
     )
   );

              xmldocument
--------------------------------------
  <foo att="42"><bar>value</bar></foo>
(1 row)

XMLDocument doesn't do much. In fact, it might be reduced to a simple 
xmlparse() call as XMLOPTION_DOCUMENT...

xmlparse(data, XMLOPTION_DOCUMENT, true)

... to make sure that the given XML expression is a valid document - 
still need some more research there. One could argue that XMLDocument() 
is in most cases unnecessary, but I believe it would facilitate the 
migration of scripts from other database products.

Any thoughts?

Best, Jim

Re: XMLDocument (SQL/XML X030)

От

Andrew Dunstan

Дата:

10 декабря 2024 г., 16:59:11

On 2024-12-10 Tu 2:48 AM, Jim Jones wrote:
> On 04.12.24 17:18, Jim Jones wrote:
>> I'd like to propose the implementation of XMLDocument (SQL/XML X030).
>> It basically returns an XML document from a given XML expression, e.g.
>>
>> SELECT
>>    xmldocument(
>>      xmlelement(NAME foo,
>>        xmlattributes(42 AS att),
>>        xmlelement(NAME bar,
>>          xmlconcat('va', 'lue'))
>>      )
>>    );
>>
>>               xmldocument
>> --------------------------------------
>>   <foo att="42"><bar>value</bar></foo>
>> (1 row)
> v1 attached attempts to implement XMLDocument() as described above.
>
> Feedback welcome.
>

LGTM at a first glance.


Please add this to the next CommitFest if you haven't done already.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

Re: XMLDocument (SQL/XML X030)

От

Jim Jones

Дата:

10 декабря 2024 г., 17:14:41

Hi Andrew

On 10.12.24 14:59, Andrew Dunstan wrote:
> LGTM at a first glance.
>
>
> Please add this to the next CommitFest if you haven't done already. 

Thanks!

This is the CF entry: https://commitfest.postgresql.org/51/5431/

Best, Jim

Re: XMLDocument (SQL/XML X030)

От

Pavel Stehule

Дата:

14 января, 11:14:34

út 14. 1. 2025 v 8:11 odesílatel Jim Jones <jim.jones@uni-muenster.de> napsal:

Hi,

I'd like to propose the implementation of XMLDocument (SQL/XML X030).It
basically returns an XML document from a given XML expression, e.g.

SELECT
   xmldocument(
     xmlelement(NAME foo,
       xmlattributes(42 AS att),
       xmlelement(NAME bar,
         xmlconcat('va', 'lue'))
     )
   );

              xmldocument
--------------------------------------
  <foo att="42"><bar>value</bar></foo>
(1 row)

XMLDocument doesn't do much. In fact, it might be reduced to a simple
xmlparse() call as XMLOPTION_DOCUMENT...

xmlparse(data, XMLOPTION_DOCUMENT, true)

... to make sure that the given XML expression is a valid document -
still need some more research there. One could argue that XMLDocument()
is in most cases unnecessary, but I believe it would facilitate the
migration of scripts from other database products.

Any thoughts?

I did some research and the design of this document is different

1. Oracle doesn't support this

2. DB2 has different implementations for z/OS (variadic) and for unix (nonvariadic)

3. looks so db2 allows some concatenation of xml content when xmlexpr is not the document already (not tested)

4. Your implementation just raise an exception

I didn't find a free downloadable SQL/XML standard with description of XMLDOCUMENT so I read just the DB2 doc, but it isn't fully consistent and it is different from your implementation.

So the argument of better compatibility for this patch doesn't look too strong. But I found that the usage of XMLDOCUMENT is required for storing XML, so it can be a frequently used function. Unfortunately, I do not have any knowledge about db2. It is hard to understand the usage of this function, because the sense is probably different than in DB2, and the documentation doesn't explain well an usage and motivation for this function. If it does a check, then it is not described in doc.

Regards

Pavel

Best, Jim

Re: XMLDocument (SQL/XML X030)

От

Jim Jones

Дата:

16 января, 00:05:41

Hi Pavel

On 14.01.25 09:14, Pavel Stehule wrote:
> I did some research and the design of this document is different
>
> 1. Oracle doesn't support this
> 2. DB2 has different implementations for z/OS (variadic) and for unix
> (nonvariadic)
> 3. looks so db2 allows some concatenation of xml content when xmlexpr
> is not the document already (not tested)
> 4. Your implementation just raise an exception

I'm not entirely sure I follow. XMLDOCUMENT is designed to produce a well-formed XML document, and according to the XML
specification,a well-formed document must have precisely one root element.
 

SELECT
  xmlserialize(DOCUMENT
  xmldocument(
    xmlelement(NAME root,
      xmlattributes(42 AS att),
      xmlcomment('comment'),
      xmlelement(NAME foo,'<foo&bar>'),
      xmlelement(NAME bar, 
        xmlconcat('va', 'lue')),
      xmlpi(name pi),
      xmlelement(NAME txt, xmltext('<"&>'))
    )) AS text INDENT) ;
           xmlserialize           
----------------------------------
 <root att="42">                 +
   <!--comment-->                +
   <foo><foo&bar></foo>+
   <bar>value</bar>              +
   <?pi?>                        +
   <txt><"&></txt>     +
 </root>
(1 row)


Could you provide an example of this feature you're missing?

Malformed CONTENT xml strings will indeed raise an exception.

SELECT xmldocument('foo'::xml);
ERROR:  invalid XML document
DETAIL:  line 1: Start tag expected, '<' not found
foo
^

>
> I didn't find a free downloadable SQL/XML standard with description of
> XMLDOCUMENT so I read just the DB2 doc, but it isn't fully consistent
> and it is different from your implementation.


The main idea is to ensure that an xml string is a valid document (even
in CONTENT mode)

postgres=# SET xmloption TO DOCUMENT;
SET
postgres=# SELECT 'foo'::xml;
ERROR:  invalid XML document
LINE 1: SELECT 'foo'::xml;
               ^
DETAIL:  line 1: Start tag expected, '<' not found
foo
^
postgres=# SET xmloption TO CONTENT;
SET
postgres=# SELECT 'foo'::xml;
 xml
-----
 foo
(1 row)

postgres=# SELECT xmldocument('foo'::xml);
ERROR:  invalid XML document
DETAIL:  line 1: Start tag expected, '<' not found
foo
^

> So the argument of better compatibility for this patch doesn't look
> too strong. But I found that the usage of XMLDOCUMENT is required for
> storing XML, so it can  be a frequently used function. Unfortunately,
> I do not have any knowledge about db2. It is hard to understand the
> usage of this function, because the sense is probably different than
> in DB2, and the documentation doesn't explain well an usage and
> motivation for this function. If it does a check, then it is not
> described in doc.
>
Perhaps changing the documentation like this would make things clearer?

"The xmldocument function encapsulates the XML expression within a valid XML document structure. The expression passed
asthe argument must be a valid, single-rooted XML fragment. If the XML expression is NULL, the result will also be
NULL."

Many thanks for the review!

Best, Jim

Re: XMLDocument (SQL/XML X030)

От

Pavel Stehule

Дата:

16 января, 09:11:32

st 15. 1. 2025 v 22:05 odesílatel Jim Jones <jim.jones@uni-muenster.de> napsal:

Hi Pavel

On 14.01.25 09:14, Pavel Stehule wrote:
> I did some research and the design of this document is different
>
> 1. Oracle doesn't support this
> 2. DB2 has different implementations for z/OS (variadic) and for unix
> (nonvariadic)
> 3. looks so db2 allows some concatenation of xml content when xmlexpr
> is not the document already (not tested)
> 4. Your implementation just raise an exception

I'm not entirely sure I follow. XMLDOCUMENT is designed to produce a well-formed XML document, and according to the XML specification, a well-formed document must have precisely one root element.

SELECT
xmlserialize(DOCUMENT
xmldocument(
xmlelement(NAME root,
xmlattributes(42 AS att),
xmlcomment('comment'),
xmlelement(NAME foo,'<foo&bar>'),
xmlelement(NAME bar,
xmlconcat('va', 'lue')),
xmlpi(name pi),
xmlelement(NAME txt, xmltext('<"&>'))
)) AS text INDENT) ;
xmlserialize
----------------------------------
<root att="42"> +
 +
<foo><foo&bar></foo>+
<bar>value</bar> +
<?pi?> +
<txt><"&></txt> +
</root>
(1 row)

Could you provide an example of this feature you're missing?

Malformed CONTENT xml strings will indeed raise an exception.

SELECT xmldocument('foo'::xml);
ERROR: invalid XML document
DETAIL: line 1: Start tag expected, '<' not found
foo
^

>
> I didn't find a free downloadable SQL/XML standard with description of
> XMLDOCUMENT so I read just the DB2 doc, but it isn't fully consistent
> and it is different from your implementation.

The main idea is to ensure that an xml string is a valid document (even
in CONTENT mode)

postgres=# SET xmloption TO DOCUMENT;
SET
postgres=# SELECT 'foo'::xml;
ERROR: invalid XML document
LINE 1: SELECT 'foo'::xml;
^
DETAIL: line 1: Start tag expected, '<' not found
foo
^
postgres=# SET xmloption TO CONTENT;
SET
postgres=# SELECT 'foo'::xml;
xml
-----
foo
(1 row)

postgres=# SELECT xmldocument('foo'::xml);
ERROR: invalid XML document
DETAIL: line 1: Start tag expected, '<' not found
foo
^

> So the argument of better compatibility for this patch doesn't look
> too strong. But I found that the usage of XMLDOCUMENT is required for
> storing XML, so it can be a frequently used function. Unfortunately,
> I do not have any knowledge about db2. It is hard to understand the
> usage of this function, because the sense is probably different than
> in DB2, and the documentation doesn't explain well an usage and
> motivation for this function. If it does a check, then it is not
> described in doc.
>
Perhaps changing the documentation like this would make things clearer?

"The xmldocument function encapsulates the XML expression within a valid XML document structure. The expression passed as the argument must be a valid, single-rooted XML fragment. If the XML expression is NULL, the result will also be NULL."

It is better.

My note was related to a very different description of this functionality in DB2. So if you propose this function for better compatibility (and this function is implemented only by db2), it is surprising to see that this functionality is described (and probably implemented) very differently. Because I do not have db2 and I miss db2 knowledge, I don't know if differences in implementation and description are based on different technology (XML like graph or XML like string) or if it is something that is missing in this patch.

Regards

Pavel

Many thanks for the review!

Best, Jim

Re: XMLDocument (SQL/XML X030)

От

Jim Jones

Дата:

20 января, 14:02:36

On 16.01.25 08:21, Jim Jones wrote:
> On 16.01.25 07:11, Pavel Stehule wrote:
>> It is better.
> v2 attached updates the documentation.
>> My note was related to a very different description of this
>> functionality in DB2. So if you propose this function for better
>> compatibility (and this function is implemented only by db2), it is
>> surprising to see that this functionality is described (and probably
>> implemented) very differently. Because I do not have db2 and I miss
>> db2 knowledge, I don't know if differences in implementation and
>> description are based on different technology (XML like graph or XML
>> like string) or if it is something that is missing in this patch.
> I suppose it's mostly because PostgreSQL and DB2 have different
> structures for the XML data type; DB2 stores it in its native
> hierarchical format rather than as text.
>
> Thanks for the review.
>
> Best, Jim

The DB2 "Document node constructors" might provide some insights into
its behavior regarding well-formed XML documents [1]:

"No validation is performed on the constructed document node. The XQuery
document node constructor does not enforce the XML 1.0 rules that govern
the structure of an XML document. For example, a document node is not
required to have exactly one child that is an element node."

This suggests that DB2's design reflects a different approach to
handling XML, focusing less on enforcing XML 1.0 constraints. It appears
to be more of a design philosophy regarding how XML is integrated into
the database system as a whole, rather than just a difference in the
implementation of the XMLDocument function.

PostgreSQL does not support the RETURNING SEQUENCE or RETURNING CONTENT
clauses explicitly. Instead, it implicitly uses RETURNING CONTENT[2] in
functions that require it. Since RETURNING CONTENT implies that the
output is a well-formed XML document (e.g., single-rooted), I would
argue that the behavior of this patch is more intuitive and aligns best
with the expectations of XML document structure.

Any thoughts?

Best, Jim

1 - https://www.ibm.com/docs/en/db2/11.1?topic=constructors-document-node
2 - https://www.postgresql.org/docs/17/xml-limits-conformance.html

Re: XMLDocument (SQL/XML X030)

От

Chapman Flack

Дата:

20 января, 22:09:27

On 01/20/25 06:02, Jim Jones wrote:
> The DB2 "Document node constructors" might provide some insights into
> its behavior regarding well-formed XML documents [1]:
>
> "No validation is performed on the constructed document node. The XQuery
> document node constructor does not enforce the XML 1.0 rules that govern
> the structure of an XML document. For example, a document node is not
> required to have exactly one child that is an element node."
>
> This suggests that DB2's design reflects a different approach to
> handling XML, focusing less on enforcing XML 1.0 constraints. It appears
> to be more of a design philosophy regarding how XML is integrated into
> the database system as a whole, rather than just a difference in the
> implementation of the XMLDocument function.

Indeed. ISO SQL/XML changed significantly between the 2003 edition
(largely followed by PostgreSQL) and the 2006 and all later editions.
There's a rundown of those changes at [3].

> PostgreSQL does not support the RETURNING SEQUENCE or RETURNING CONTENT
> clauses explicitly. Instead, it implicitly uses RETURNING CONTENT[2] in
> functions that require it. Since RETURNING CONTENT implies that the
> output is a well-formed XML document (e.g., single-rooted),

In fact, you can't infer single-root-element-ness from RETURNING CONTENT,
according to the standard. Single-root-element-ness is checked by the
IS DOCUMENT predicate, and by XMLPARSE and XMLSERIALIZE when they specify
DOCUMENT. But it isn't checked or implied by the XMLDOCUMENT constructor.

That amounts to a bit of unfortunate punning on the word DOCUMENT,
but so help me that's what's in the standard.

It may help to think in terms of the hierarchy of XML types that the
2006 standard introduced (cribbed here from [3]):

SEQUENCE
|
(?sequence of length 1, a document node)
|
CONTENT(ANY)----------------.----------------(?every element
|                           |                 conforms to a
(?every element has         (?no extraneous   schema)
 xdt:untyped and !nilled,   nodes)            |
 every attribute has        |                 |
 xdt:untypedAtomic)         DOCUMENT(ANY)     CONTENT(XMLSCHEMA)
 |                                            |
 CONTENT(UNTYPED)                             (?whole thing is valid
 |                                            according to schema)
 (?no extraneous nodes)                       |
 |                                            DOCUMENT(XMLSCHEMA)
 DOCUMENT(UNTYPED)

where the condition (?no extraneous nodes) is shorthand for SQL/XML's
more precise "whose `children` property has exactly one XQuery element
node, zero or more XQuery comment nodes, and zero or more XQuery
processing instruction nodes".

So that (?no extraneous nodes) condition is required for any of
the XML(DOCUMENT...) types. When you relax that condition, you have
an XML(CONTENT...) type.

The XMLDOCUMENT constructor is so named because it constructs what
corresponds to an XQuery document node—which actually corresponds to
the XML(CONTENT...) SQL/XML types, and does not enforce having a
single root element:

  "This data model is more permissive: a Document Node may be empty,
  it may have more than one Element Node as a child, and it also
  permits Text Nodes as children."[4]

So in terms of the SQL/XML type hierarchy, what you get back from
XMLDOCUMENT ... RETURNING CONTENT will have one of the XML(CONTENT...)
types (whether it's CONTENT(ANY) or CONTENT(UNTYPED) is left to the
implementation).

If you then want to know if it is single-rooted, you can apply the
IS DOCUMENT predicate, or try to cast it to an XML(DOCUMENT...) type.

(And if you use XMLDOCUMENT ... RETURNING SEQUENCE, then you get a
value of type XML(SEQUENCE). The sequence has length 1, a document
node, making it safely castable to XML(CONTENT(ANY)), but whether
you can cast it to an XML(DOCUMENT...) type will depend on what
children that document node has.)

Long story short, an XMLDOCUMENT constructor that enforced having
a single root element would be nonconformant.

Regards,
-Chap

> 1 - https://www.ibm.com/docs/en/db2/11.1?topic=constructors-document-node
> 2 - https://www.postgresql.org/docs/17/xml-limits-conformance.html
3 -
https://wiki.postgresql.org/wiki/PostgreSQL_vs_SQL/XML_Standards#SQL.2FXML:2003_contrasted_with_SQL.2FXML_since_2006
4 - https://www.w3.org/TR/2010/REC-xpath-datamodel-20101214/#DocumentNode

Re: XMLDocument (SQL/XML X030)

От

Jim Jones

Дата:

20 января, 22:56:28

Hi Chap,

Thanks for the thorough explanation! 

On 20.01.25 20:09, Chapman Flack wrote:
>> PostgreSQL does not support the RETURNING SEQUENCE or RETURNING CONTENT
>> clauses explicitly. Instead, it implicitly uses RETURNING CONTENT[2] in
>> functions that require it. Since RETURNING CONTENT implies that the
>> output is a well-formed XML document (e.g., single-rooted),
> In fact, you can't infer single-root-element-ness from RETURNING CONTENT,
> according to the standard. Single-root-element-ness is checked by the
> IS DOCUMENT predicate, and by XMLPARSE and XMLSERIALIZE when they specify
> DOCUMENT. But it isn't checked or implied by the XMLDOCUMENT constructor.
>
> That amounts to a bit of unfortunate punning on the word DOCUMENT,
> but so help me that's what's in the standard.


Yeah, the term DOCUMENT seems a bit misleading in this context.


>
> It may help to think in terms of the hierarchy of XML types that the
> 2006 standard introduced (cribbed here from [3]):
>
> SEQUENCE
> |
> (?sequence of length 1, a document node)
> |
> CONTENT(ANY)----------------.----------------(?every element
> |                           |                 conforms to a
> (?every element has         (?no extraneous   schema)
>  xdt:untyped and !nilled,   nodes)            |
>  every attribute has        |                 |
>  xdt:untypedAtomic)         DOCUMENT(ANY)     CONTENT(XMLSCHEMA)
>  |                                            |
>  CONTENT(UNTYPED)                             (?whole thing is valid
>  |                                            according to schema)
>  (?no extraneous nodes)                       |
>  |                                            DOCUMENT(XMLSCHEMA)
>  DOCUMENT(UNTYPED)
>
> where the condition (?no extraneous nodes) is shorthand for SQL/XML's
> more precise "whose `children` property has exactly one XQuery element
> node, zero or more XQuery comment nodes, and zero or more XQuery
> processing instruction nodes".
>
> So that (?no extraneous nodes) condition is required for any of
> the XML(DOCUMENT...) types. When you relax that condition, you have
> an XML(CONTENT...) type.
>
> The XMLDOCUMENT constructor is so named because it constructs what
> corresponds to an XQuery document node—which actually corresponds to
> the XML(CONTENT...) SQL/XML types, and does not enforce having a
> single root element:
>
>   "This data model is more permissive: a Document Node may be empty,
>   it may have more than one Element Node as a child, and it also
>   permits Text Nodes as children."[4]


Thanks a lot for pointing that out! I guess it's clear now.


>
> So in terms of the SQL/XML type hierarchy, what you get back from
> XMLDOCUMENT ... RETURNING CONTENT will have one of the XML(CONTENT...)
> types (whether it's CONTENT(ANY) or CONTENT(UNTYPED) is left to the
> implementation).
>
> If you then want to know if it is single-rooted, you can apply the
> IS DOCUMENT predicate, or try to cast it to an XML(DOCUMENT...) type.
>
> (And if you use XMLDOCUMENT ... RETURNING SEQUENCE, then you get a
> value of type XML(SEQUENCE). The sequence has length 1, a document
> node, making it safely castable to XML(CONTENT(ANY)), but whether
> you can cast it to an XML(DOCUMENT...) type will depend on what
> children that document node has.)
>
> Long story short, an XMLDOCUMENT constructor that enforced having
> a single root element would be nonconformant.
>

If I understand correctly, the compliant approach would be to always
treat the input expression as CONTENT:

|PG_RETURN_XML_P(xmlparse((text *) data, XMLOPTION_DOCUMENT, true));|

Is that right?"


>
>> 1 - https://www.ibm.com/docs/en/db2/11.1?topic=constructors-document-node
>> 2 - https://www.postgresql.org/docs/17/xml-limits-conformance.html
> 3 -
> https://wiki.postgresql.org/wiki/PostgreSQL_vs_SQL/XML_Standards#SQL.2FXML:2003_contrasted_with_SQL.2FXML_since_2006
> 4 - https://www.w3.org/TR/2010/REC-xpath-datamodel-20101214/#DocumentNode
>

Best, Jim

Re: XMLDocument (SQL/XML X030)

От

Andrew Dunstan

Дата:

21 января, 00:33:40

On 2025-01-20 Mo 2:09 PM, Chapman Flack wrote:
>
> Long story short, an XMLDOCUMENT constructor that enforced having
> a single root element would be nonconformant.
>

What an (expletive deleted) mess. I'm glad someone is on top of it.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

Re: XMLDocument (SQL/XML X030)

От

Chapman Flack

Дата:

21 января, 01:21:21

On 01/20/25 14:56, Jim Jones wrote:
> If I understand correctly, the compliant approach would be to always
> treat the input expression as CONTENT:
> 
> |PG_RETURN_XML_P(xmlparse((text *) data, XMLOPTION_DOCUMENT, true));|
> 
> Is that right?"

Did you mean XMLOPTION_CONTENT?

I think that would be right ... which, to be honest, leaves me
in doubt that there is any substantive work at all for an
XMLDOCUMENT constructor in PostgreSQL to do. Possibly that's why
it wasn't provided originally.

We don't have the XML(SEQUENCE) type, and our single XML type
essentially corresponds to XML(CONTENT(ANY)). Any value that has
already been accepted by the input function should already have
the same serialized form you'd expect from XMLDOCUMENT.

We don't have BY REF and passing XML node trees around, so there's
really no way to notice that you've applied XMLDOCUMENT to wrap
stuff in a 'document node' that isn't visible in the serialized
form that is all we store.

Likewise, I'm not thinking of any way to combine our other XML
constructors / aggregates to produce a result that isn't already
XML(CONTENT(ANY)). (If I could think of a way, I think it would mean
those weren't well-specified.)

Therefore I'm thinking that, given the specifics of our XML support,
a fully conformant and efficient XMLDOCUMENT could be implemented
just by returning its XML argument.

That opens a question of whether it's worth the effort to supply
it at all. Maybe it could reduce the surprise for people coming from
another DBMS and finding it missing, and/or be a placeholder in case
we ever implement enough more of the newer SQL/XML standard for it
to have a real effect.

(In a DBMS implementing the whole XML type hierarchy, you could
have an XML(SEQUENCE) containing a single element node, which
you might think you could easily cast to XML(CONTENT(ANY)) or even
XML(DOCUMENT(ANY)), but the cast would raise 2200U or 2200L, respectively:
the sequence would first need to be transformed into a sequence of
a single document node wrapping the original sequence, which is exactly
what you'd use XMLDOCUMENT to do. And of course you could put RETURNING
CONTENT on that XMLDOCUMENT to accomplish the cast in the same step.)

Regards,
-Chap

Re: XMLDocument (SQL/XML X030)

От

Robert Treat

Дата:

22 января, 01:45:02

On Tue, Jan 21, 2025 at 5:58 AM Jim Jones <jim.jones@uni-muenster.de> wrote:
> On 20.01.25 23:21, Chapman Flack wrote:
> > Therefore I'm thinking that, given the specifics of our XML support,
> > a fully conformant and efficient XMLDOCUMENT could be implemented
> > just by returning its XML argument.
>
>
> After your explanation, I tend to agree.
>
> v3, attached, incorporates these changes and updates the regression
> tests accordingly.
>
>
> >
> > That opens a question of whether it's worth the effort to supply
> > it at all. Maybe it could reduce the surprise for people coming from
> > another DBMS and finding it missing, and/or be a placeholder in case
> > we ever implement enough more of the newer SQL/XML standard for it
> > to have a real effect.
>
> Although quite trivial, I believe this function could still be valuable
> in facilitating the migration of scripts from other database systems --
> improving SQL/XML conformance also isn't a bad thing :).
>

Is there some concrete use case you have seen that this would help
with? Not objecting to adding it, but you've mentioned this migration
idea twice but it seems to me this doesn't conform with existing
implementations, and I don't see much benefit in migration use cases
specifically, so I'm just curious if I am overlooking something?


Robert Treat
https://xzilla.net

Re: XMLDocument (SQL/XML X030)

От

Jim Jones

Дата:

22 января, 02:35:56

On 21.01.25 23:45, Robert Treat wrote:
> Is there some concrete use case you have seen that this would help
> with? Not objecting to adding it, but you've mentioned this migration
> idea twice but it seems to me this doesn't conform with existing
> implementations, and I don't see much benefit in migration use cases
> specifically, so I'm just curious if I am overlooking something?

I wouldn’t frame it as a "migration idea". My point is that this would
be one less function to modify when migrating a script from another
database system to PostgreSQL.

For example, in DB2:

SELECT xmldocument(xmlforest(10 as x, 20 as y)) FROM SYSIBM.SYSDUMMY1;

----------------------------------------------------------
<X>10</X><Y>20</Y>

  1 record(s) selected.

 ... This same query wouldn’t work in PostgreSQL, but the function makes
the query more compatible.

SELECT xmldocument(xmlforest(10 as x, 20 as y));

    xmldocument     
--------------------
 <x>10</x><y>20</y>
(1 row)

Which compatibility issues with existing implementations are you
referring to?

Thanks!

Jim

Re: XMLDocument (SQL/XML X030)

От

Robert Treat

Дата:

22 января, 21:41:10

On Tue, Jan 21, 2025 at 6:36 PM Jim Jones <jim.jones@uni-muenster.de> wrote:
> On 21.01.25 23:45, Robert Treat wrote:
> > Is there some concrete use case you have seen that this would help
> > with? Not objecting to adding it, but you've mentioned this migration
> > idea twice but it seems to me this doesn't conform with existing
> > implementations, and I don't see much benefit in migration use cases
> > specifically, so I'm just curious if I am overlooking something?
>
> I wouldn’t frame it as a "migration idea". My point is that this would
> be one less function to modify when migrating a script from another
> database system to PostgreSQL.
>
<snip>
>
> Which compatibility issues with existing implementations are you
> referring to?
>

I'm mostly referring back to Pavel's statements:
| 1. Oracle doesn't support this
| 2. DB2 has different implementations for z/OS (variadic) and for
unix (nonvariadic)

With an added #3 which is AFAIK sql server doesn't implement this
either (I believe they implement something similar with
sp_xml_preparedocument)

So even if we are following the spec (which I think technically we may
not be), if no other database implements it the way we are, then it
really isn't one less function people will have to modify. Which
again, I'm not trying to argue against having it, I'm just trying to
figure out if there is some specific group that would benefit from it.

Robert Treat
https://xzilla.net

Re: XMLDocument (SQL/XML X030)

От

Chapman Flack

Дата:

23 января, 00:35:35

On 01/22/25 13:41, Robert Treat wrote:
> So even if we are following the spec (which I think technically we may
> not be),

There are definite ways in which we're not following the SQL/XML spec,
which we document in an appendix[1]. The one that matters here is that
we just have a single XML type instead of the hierarchy of them in the
spec, and ours corresponds to what the spec calls XML(CONTENT(ANY)).

With that divergence from the spec understood, I don't see any new
divergence in providing an XMLDOCUMENT function that just returns
its argument. That's the correct result to return for anything that's
a valid value of our XML type to begin with.

> if no other database implements it the way we are

There may be other systems that don't implement it at all, for which
case I don't see any compatibility issue created because we have it
and they do not.

There may be systems that implement the SQL/XML type hierarchy more
completely than we do, so that it would be possible for their
XMLDOCUMENT to be called with an XML(SEQUENCE) argument, or with
RETURNING SEQUENCE, both of which are things that can't happen
in PostgreSQL. I don't see a problem in that either, as long as
theirs produces results matching ours in the RETURNING CONTENT,
passed an XML(CONTENT...) argument case.

If another system produces results that differ, in that restricted
domain corresponding to ours, I'd say something's nonconformant in
that implementation. In my opinion, that would only be a problem
for us if the system in question is an 800 lb gorilla and has many
users relying on the differing behavior.

Regarding the patch itself: I wonder if there is already, somewhere
in the code base, a generic fmgr identity function that the pg_proc
entry could point to. Or is there a strong project tradition in favor
of writing a dedicated new one to use here? I'm not sure it's critical
to have a version that tests USE_LIBXML or reports it's unsupported,
because without support I doubt there's any way to pass it a non-null
XML argument, and if declared STRICT it won't be called for a null one
anyway.

Regards,
-Chap

[1] https://www.postgresql.org/docs/17/xml-limits-conformance.html

Re: XMLDocument (SQL/XML X030)

От

Jim Jones

Дата:

23 января, 02:55:39

Hi Chapman & Robert

Many thanks for the input

On 22.01.25 22:35, Chapman Flack wrote:
> On 01/22/25 13:41, Robert Treat wrote:
>> So even if we are following the spec (which I think technically we may
>> not be),
> There are definite ways in which we're not following the SQL/XML spec,
> which we document in an appendix[1]. The one that matters here is that
> we just have a single XML type instead of the hierarchy of them in the
> spec, and ours corresponds to what the spec calls XML(CONTENT(ANY)).
>
> With that divergence from the spec understood, I don't see any new
> divergence in providing an XMLDOCUMENT function that just returns
> its argument. That's the correct result to return for anything that's
> a valid value of our XML type to begin with.
>
>> if no other database implements it the way we are
> There may be other systems that don't implement it at all, for which
> case I don't see any compatibility issue created because we have it
> and they do not.


Yeah, as long as we stick to the specification, I don’t see any issue
with including it.


>
> There may be systems that implement the SQL/XML type hierarchy more
> completely than we do, so that it would be possible for their
> XMLDOCUMENT to be called with an XML(SEQUENCE) argument, or with
> RETURNING SEQUENCE, both of which are things that can't happen
> in PostgreSQL. I don't see a problem in that either, as long as
> theirs produces results matching ours in the RETURNING CONTENT,
> passed an XML(CONTENT...) argument case.
>
> If another system produces results that differ, in that restricted
> domain corresponding to ours, I'd say something's nonconformant in
> that implementation. In my opinion, that would only be a problem
> for us if the system in question is an 800 lb gorilla and has many
> users relying on the differing behavior.
>
> Regarding the patch itself: I wonder if there is already, somewhere
> in the code base, a generic fmgr identity function that the pg_proc
> entry could point to. Or is there a strong project tradition in favor
> of writing a dedicated new one to use here? I'm not sure it's critical
> to have a version that tests USE_LIBXML or reports it's unsupported,
> because without support I doubt there's any way to pass it a non-null
> XML argument, and if declared STRICT it won't be called for a null one
> anyway.
>
> Regards,
> -Chap
>
> [1] https://www.postgresql.org/docs/17/xml-limits-conformance.html


Regarding compatibility, here is an example of how both implementations
handle single-rooted XML, non-single-rooted XML, and NULL values.


== DB2 [1] ==

WITH t(x) AS (
  VALUES
    (xmlparse(DOCUMENT '<root><foo>bar</foo></root>')),
    (xmlforest(42 AS foo, 73 AS bar)),
    (NULL)
)
SELECT xmldocument(x) FROM t;

----------------------------
<root><foo>bar</foo></root>   
<FOO>42</FOO><BAR>73</BAR>

-                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             
 
 3 record(s) selected.
 
 
 
== PostgreSQL ==
 
WITH t(x) AS (
  VALUES
    (xmlparse(DOCUMENT '<root><foo>bar</foo></root>')),
    (xmlforest(42 AS foo, 73 AS bar)),
    (NULL)
)
SELECT xmldocument(x) FROM t;

         xmldocument         
-----------------------------
 <root><foo>bar</foo></root>
 <foo>42</foo><bar>73</bar>
 
(3 rows)


To make it clear: ensuring this function is compatible with other
database products is IMHO beneficial (and is my primary motivation), but
it shouldn't come at the expense of violating the SQL/XML specification.
I mean, if in some edge case, another database system implemented
XMLDocument in a way that deviates from the standard, I'd argue it’s not
worth prioritizing compatibility -- assuming, of course, that we are
fully following the standard.


Best regards, Jim


1 - https://dbfiddle.uk/G9VoHKp7

Re: XMLDocument (SQL/XML X030)

От

Pavel Stehule

Дата:

23 января, 09:50:23

čt 23. 1. 2025 v 0:55 odesílatel Jim Jones <jim.jones@uni-muenster.de> napsal:

Hi Chapman & Robert

Many thanks for the input

On 22.01.25 22:35, Chapman Flack wrote:
> On 01/22/25 13:41, Robert Treat wrote:
>> So even if we are following the spec (which I think technically we may
>> not be),
> There are definite ways in which we're not following the SQL/XML spec,
> which we document in an appendix[1]. The one that matters here is that
> we just have a single XML type instead of the hierarchy of them in the
> spec, and ours corresponds to what the spec calls XML(CONTENT(ANY)).
>
> With that divergence from the spec understood, I don't see any new
> divergence in providing an XMLDOCUMENT function that just returns
> its argument. That's the correct result to return for anything that's
> a valid value of our XML type to begin with.
>
>> if no other database implements it the way we are
> There may be other systems that don't implement it at all, for which
> case I don't see any compatibility issue created because we have it
> and they do not.

Yeah, as long as we stick to the specification, I don’t see any issue
with including it.

>
> There may be systems that implement the SQL/XML type hierarchy more
> completely than we do, so that it would be possible for their
> XMLDOCUMENT to be called with an XML(SEQUENCE) argument, or with
> RETURNING SEQUENCE, both of which are things that can't happen
> in PostgreSQL. I don't see a problem in that either, as long as
> theirs produces results matching ours in the RETURNING CONTENT,
> passed an XML(CONTENT...) argument case.
>
> If another system produces results that differ, in that restricted
> domain corresponding to ours, I'd say something's nonconformant in
> that implementation. In my opinion, that would only be a problem
> for us if the system in question is an 800 lb gorilla and has many
> users relying on the differing behavior.
>
> Regarding the patch itself: I wonder if there is already, somewhere
> in the code base, a generic fmgr identity function that the pg_proc
> entry could point to. Or is there a strong project tradition in favor
> of writing a dedicated new one to use here? I'm not sure it's critical
> to have a version that tests USE_LIBXML or reports it's unsupported,
> because without support I doubt there's any way to pass it a non-null
> XML argument, and if declared STRICT it won't be called for a null one
> anyway.
>
> Regards,
> -Chap
>
> [1] https://www.postgresql.org/docs/17/xml-limits-conformance.html

Regarding compatibility, here is an example of how both implementations
handle single-rooted XML, non-single-rooted XML, and NULL values.

== DB2 [1] ==

WITH t(x) AS (
VALUES
    (xmlparse(DOCUMENT '<root><foo>bar</foo></root>')),
    (xmlforest(42 AS foo, 73 AS bar)),
    (NULL)
)
SELECT xmldocument(x) FROM t;

----------------------------
<root><foo>bar</foo></root>
<FOO>42</FOO><BAR>73</BAR>

I think so in this form (just forward input to output) I have no objection.

There is a benefit with a) possible zero work with migration from db2, b) nobody needs to repeat a work which is a correct implementation of XMLDOCUMENT function.

Maybe opened question can be implementation like classic scalar function or via XmlExpr

In this moment I prefer to use XmlExpr from consistency reasons

Regards

Pavel

-

3 record(s) selected.

== PostgreSQL ==

WITH t(x) AS (
VALUES
    (xmlparse(DOCUMENT '<root><foo>bar</foo></root>')),
    (xmlforest(42 AS foo, 73 AS bar)),
    (NULL)
)
SELECT xmldocument(x) FROM t;

         xmldocument
-----------------------------
<root><foo>bar</foo></root>
<foo>42</foo><bar>73</bar>

(3 rows)

To make it clear: ensuring this function is compatible with other
database products is IMHO beneficial (and is my primary motivation), but
it shouldn't come at the expense of violating the SQL/XML specification.
I mean, if in some edge case, another database system implemented
XMLDocument in a way that deviates from the standard, I'd argue it’s not
worth prioritizing compatibility -- assuming, of course, that we are
fully following the standard.

Best regards, Jim

1 - https://dbfiddle.uk/G9VoHKp7

Re: XMLDocument (SQL/XML X030)

От

Pavel Stehule

Дата:

23 января, 18:14:35

čt 23. 1. 2025 v 16:06 odesílatel Jim Jones <jim.jones@uni-muenster.de> napsal:

Hi

On 23.01.25 07:50, Pavel Stehule wrote:
> I think so in this form (just forward input to output) I have no
> objection.
>
> There is a benefit with a) possible zero work with migration from db2,
> b) nobody needs to repeat a work which is a correct implementation of
> XMLDOCUMENT function.
>
> Maybe opened question can be implementation like classic scalar
> function or via XmlExpr
>
> In this moment I prefer to use XmlExpr from consistency reasons

To keep it consistent with the existing code, I think this function is
in the right place. There are similar functions in xml.c, e.g.
xmltotext, texttoxml.

These functions are cast functions - they should be V1 functions only - casting cannot work with pseudo functions.

But it is true, so xmlcomment is not a pseudo function either. So minimally this function is precedent, so there is not strong dependency on XmlExp, which I expected.

I updated the function comment and commit message (v4 attached) to make
things clearer.

Since the status of this patch is waiting on author, is there anything
else I should take a look / improve?

Thanks!

Best regards,Jim

Re: XMLDocument (SQL/XML X030)

От

Pavel Stehule

Дата:

24 января, 09:28:30

čt 23. 1. 2025 v 16:06 odesílatel Jim Jones <jim.jones@uni-muenster.de> napsal:

Hi

On 23.01.25 07:50, Pavel Stehule wrote:
> I think so in this form (just forward input to output) I have no
> objection.
>
> There is a benefit with a) possible zero work with migration from db2,
> b) nobody needs to repeat a work which is a correct implementation of
> XMLDOCUMENT function.
>
> Maybe opened question can be implementation like classic scalar
> function or via XmlExpr
>
> In this moment I prefer to use XmlExpr from consistency reasons

To keep it consistent with the existing code, I think this function is
in the right place. There are similar functions in xml.c, e.g.
xmltotext, texttoxml.

I updated the function comment and commit message (v4 attached) to make
things clearer.

Since the status of this patch is waiting on author, is there anything
else I should take a look / improve?

I think documentation should be strongly enhanced. This is probably the hardest part of this patch - explain well what this function does and what it doesn't.

Regards

Pavel

Thanks!

Best regards,Jim

Re: XMLDocument (SQL/XML X030)

От

Jim Jones

Дата:

24 января, 11:11:59

Hi

On 24.01.25 07:28, Pavel Stehule wrote:
> I think documentation should be strongly enhanced. This is probably
> the hardest part of this patch - explain well what this function does
> and what it doesn't.

You mean something like this? Or perhaps something more technical?

The <function>xmldocument</function> function returns a document node,
representing an XML document, from the provided <type>xml</type>
expression. The input expression can represent any valid XML content,
including elements, text, or a sequence of nodes. If the
<type>xml</type> expression is NULL, the function returns NULL. This
function does not require the input to have a single root node or
encapsulate the result in a root element. The validation of the
<type>xml</type> expression depends on the current <xref
linkend="guc-xmloption"/> setting.

Example:

WITH xmldata (val) AS (
  VALUES
    (xmlparse(DOCUMENT '<root><foo>bar</foo></root>')),
    (xmltext('foo&bar')),
    (xmlelement(NAME el)),
    (xmlforest(42 AS foo, 73 AS bar))
)
SELECT xmldocument(val) FROM xmldata;

         xmldocument
-----------------------------
 <root><foo>bar</foo></root>
 foo&bar
 <el/>
 <foo>42</foo><bar>73</bar>
(4 rows)

Thanks!

Best, Jim

Re: XMLDocument (SQL/XML X030)

От

Pavel Stehule

Дата:

24 января, 14:31:58

pá 24. 1. 2025 v 9:12 odesílatel Jim Jones <jim.jones@uni-muenster.de> napsal:

Hi

On 24.01.25 07:28, Pavel Stehule wrote:
> I think documentation should be strongly enhanced. This is probably
> the hardest part of this patch - explain well what this function does
> and what it doesn't.

You mean something like this? Or perhaps something more technical?

I don't know what it means - `function returns a document node` in the context of Postgres implementation of XML.

I miss the information so it returns an input argument without changing anything, because in a system, where XML expression

holds a complete string (not graph like some other systems), it just does nothing.

If somebody looks the source code, then he should be in panic mode, because the doc talks about returning "document node",

and it is just an argument. So it should be explained more in PostgreSQL XML design, and less in SQL/XML description. Because

In this case, SQL/XML description (in postgresql context) is absolutely confusing.

Regards

Pavel

The <function>xmldocument</function> function returns a document node,
representing an XML document, from the provided <type>xml</type>
expression. The input expression can represent any valid XML content,
including elements, text, or a sequence of nodes. If the
<type>xml</type> expression is NULL, the function returns NULL. This
function does not require the input to have a single root node or
encapsulate the result in a root element. The validation of the
<type>xml</type> expression depends on the current <xref
linkend="guc-xmloption"/> setting.

Example:

WITH xmldata (val) AS (
VALUES
    (xmlparse(DOCUMENT '<root><foo>bar</foo></root>')),
    (xmltext('foo&bar')),
    (xmlelement(NAME el)),
    (xmlforest(42 AS foo, 73 AS bar))
)
SELECT xmldocument(val) FROM xmldata;

         xmldocument
-----------------------------
<root><foo>bar</foo></root>
foo&bar
<el/>
<foo>42</foo><bar>73</bar>
(4 rows)

Thanks!

Best, Jim

Re: XMLDocument (SQL/XML X030)

От

Jim Jones

Дата:

24 января, 15:21:53

On 24.01.25 12:31, Pavel Stehule wrote:
> I don't know what it means - `function returns a document node` in the
> context of Postgres implementation of XML.
> I miss the information so it returns an input argument without
> changing anything, because in a system, where XML expression
> holds a complete string (not graph like some other systems), it just
> does nothing.
>
> If somebody looks the source code, then he should be in panic mode,
> because the doc talks about returning "document node",
> and it is just an argument. So it should be explained more in
> PostgreSQL XML design, and less in SQL/XML description. Because
> In this case, SQL/XML description (in postgresql context) is
> absolutely confusing.

I see. I borrowed document node from the XML standard.

Perhaps it is more palatable like this?

The <function>xmldocument</function> function creates an XML value from
the given <type>xml</type> expression. The input can include any valid
XML content, such as elements, text, or a mixture of elements and text.
If the <type>xml</type> expression is NULL, the function returns NULL.
This function does not require the input to have a single root element
and does not wrap the result in one - instead, it returns an exact XML
representation of the input <type>xml</type> expression. The validation
of the <type>xml</type> expression depends on the current <xref
linkend="guc-xmloption"/> setting.

Best, Jim

Re: XMLDocument (SQL/XML X030)

От

Pavel Stehule

Дата:

24 января, 15:48:34

pá 24. 1. 2025 v 13:21 odesílatel Jim Jones <jim.jones@uni-muenster.de> napsal:

On 24.01.25 12:31, Pavel Stehule wrote:
> I don't know what it means - `function returns a document node` in the
> context of Postgres implementation of XML.
> I miss the information so it returns an input argument without
> changing anything, because in a system, where XML expression
> holds a complete string (not graph like some other systems), it just
> does nothing.
>
> If somebody looks the source code, then he should be in panic mode,
> because the doc talks about returning "document node",
> and it is just an argument. So it should be explained more in
> PostgreSQL XML design, and less in SQL/XML description. Because
> In this case, SQL/XML description (in postgresql context) is
> absolutely confusing.

I see. I borrowed document node from the XML standard.

Perhaps it is more palatable like this?

The <function>xmldocument</function> function creates an XML value from
the given <type>xml</type> expression. The input can include any valid
XML content, such as elements, text, or a mixture of elements and text.
If the <type>xml</type> expression is NULL, the function returns NULL.
This function does not require the input to have a single root element
and does not wrap the result in one - instead, it returns an exact XML
representation of the input <type>xml</type> expression. The validation
of the <type>xml</type> expression depends on the current <xref
linkend="guc-xmloption"/> setting.

It is still confusing (and I understand so it can be hard to describe a function that does nothing).

As a PostgreSQL user, and reader documentation, I have question

what is difference between XMLDOCUMENT( some xml expr) and just xml expr

Is there different result SELECT XMLDOCUMENT(XMLELEMENT(name a, 10)) and

SELECT XMLELEMENT(name a, 10) ?

Lot of users use XML functions without XMLDOCUMENT now. The doc should to

help with a reply to question where and when I need (I can) use this function. This should

be in context of Postgres where every XML expr returns XML value, so sentence

creates an XML value from XML expression is still very confusing

What some like

"SQL/XML introduces XMLDOCUMENT function to returns a document node from XML tree of some XML value.

This has not any sense (because XML in Postgres is always materialized to string) in PostgreSQL implementation of XML, and the function just returns

input value". Or some like that.

Best, Jim

Re: XMLDocument (SQL/XML X030)

От

Jim Jones

Дата:

24 января, 16:48:49

On 24.01.25 13:48, Pavel Stehule wrote:
> Lot of users use XML functions without XMLDOCUMENT now. The doc should to
> help with a reply to question where and when I need (I can) use this
> function. This should
> be in context of Postgres where every XML expr returns XML value, so
> sentence
> creates an XML value from XML expression is still very confusing
>
> What some like
>
> "SQL/XML introduces XMLDOCUMENT function to  returns a document node
> from XML tree of some XML value.
> This has not any sense (because XML in Postgres is always materialized
> to string) in PostgreSQL implementation of XML, and the function just
> returns
> input value". Or some like that.

I see. I tried to incorporate your suggestion in the text:

In the SQL/XML specification, the XMLDocument (X030) function is
designed to return a document node from a given XML value expression.
Since PostgreSQL handles XML values as materialized strings rather than
tree structures, this function does not produce a document node
structure but instead returns the validated input <type>xml</type> value
as-is. The input <type>xml</type> value can include any valid XML
content, such as elements, text, or a mixture of both. This function
does not require the input to have a single root element and does not
wrap the result in one. Validation of the <type>xml</type> input value
depends on the current <xref linkend="guc-xmloption"/> setting. If the
input <type>xml</type> value is NULL, the function returns NULL. This
function ensures compatibility with the SQL/XML specification, making it
particularly useful when migrating SQL queries from other database
systems that also implement this standard.

Let me know if it is still confusing ...

Thanks!

Best, Jim

Re: XMLDocument (SQL/XML X030)

От

Pavel Stehule

Дата:

24 января, 18:27:38

pá 24. 1. 2025 v 14:48 odesílatel Jim Jones <jim.jones@uni-muenster.de> napsal:

On 24.01.25 13:48, Pavel Stehule wrote:
> Lot of users use XML functions without XMLDOCUMENT now. The doc should to
> help with a reply to question where and when I need (I can) use this
> function. This should
> be in context of Postgres where every XML expr returns XML value, so
> sentence
> creates an XML value from XML expression is still very confusing
>
> What some like
>
> "SQL/XML introduces XMLDOCUMENT function to returns a document node
> from XML tree of some XML value.
> This has not any sense (because XML in Postgres is always materialized
> to string) in PostgreSQL implementation of XML, and the function just
> returns
> input value". Or some like that.

I see. I tried to incorporate your suggestion in the text:

In the SQL/XML specification, the XMLDocument (X030) function is
designed to return a document node from a given XML value expression.
Since PostgreSQL handles XML values as materialized strings rather than
tree structures, this function does not produce a document node
structure but instead returns the validated input <type>xml</type> value
as-is. The input <type>xml</type> value can include any valid XML
content, such as elements, text, or a mixture of both. This function
does not require the input to have a single root element and does not
wrap the result in one. Validation of the <type>xml</type> input value
depends on the current <xref linkend="guc-xmloption"/> setting. If the
input <type>xml</type> value is NULL, the function returns NULL. This
function ensures compatibility with the SQL/XML specification, making it
particularly useful when migrating SQL queries from other database
systems that also implement this standard.

Let me know if it is still confusing ...

it is ok for me now. Thank you

Pavel

Thanks!

Best, Jim

Re: XMLDocument (SQL/XML X030)

От

Chapman Flack

Дата:

24 января, 18:49:34

On 01/24/25 08:48, Jim Jones wrote:
> In the SQL/XML specification, the XMLDocument (X030) function is
> designed to return a document node from a given XML value expression.

Maybe we can take advantage of the way that specifications usually
don't mandate an implementation, but only results equivalent to
an implementation. So we could say something like:

The function `xmldocument` returns the input argument unchanged,
null if the argument is null, and is supplied for compatibility.

The SQL-standard `XMLDOCUMENT` function applied to an XML value
/expr/ has effects equivalent to the XML Query expression
`document { /expr/ }`, specified to replace any document nodes
in the input with their children and wrap the whole result in one
document node. An XML Query "document node" is a relaxed version
of XML document structure, which need not have exactly one child
element node, and also allows text nodes as children.

Some systems support a family of XML data types including
`XML(SEQUENCE)`, which can hold values that do not have such a
structure. In PostgreSQL, every valid non-null value of the single
`XML` type already has that structure, and so is returned by this
function unchanged.

Regards,
-Chap

Re: XMLDocument (SQL/XML X030)

От

Chapman Flack

Дата:

24 января, 19:18:51

On 01/24/25 10:49, Chapman Flack wrote:
> The SQL-standard `XMLDOCUMENT` function applied to an XML value
> /expr/ has effects equivalent to the XML Query expression
> `document { /expr/ }`, specified to replace any document nodes
> in the input with their children and wrap the whole result in one
> document node. An XML Query "document node" is a relaxed version
> of XML document structure, which need not have exactly one child
> element node, and also allows text nodes as children.

Or even: ... An XML Query "document node" is a relaxed version
of XML document structure that corresponds exactly to what
PostgreSQL's one XML type is already allowed to contain, so
any non-null PostgreSQL XML value can be returned unchanged.
More-permissive XML types some systems offer may hold values
that are not so structured.

Regards,
-Chap

Re: XMLDocument (SQL/XML X030)

От

Chapman Flack

Дата:

25 января, 00:01:09

On 01/24/25 14:59, Jim Jones wrote:
> In the XML Query standard, a "document node" represents a relaxed
> version of an XML document structure. This corresponds to what
> PostgreSQL's single XML type allows, meaning that any valid non-null
> PostgreSQL XML value can be returned unchanged. Other systems may
> support more permissive XML data types, such as
> <literal>XML(SEQUENCE)</literal>, which allow values that do not conform
> to this structure. In PostgreSQL, every valid non-null value of the XML
> type already has that structure, making any additional processing by
> this function unnecessary.

My one question here would be, does the reader of that paragraph
sit back and wonder "why have we suddenly begun talking about
the XML Query standard?".

It seems to me the key connection there is that the ISO SQL standard
defines XMLDOCUMENT by equivalence to what `document { $expr }` means
in the W3 XML Query standard.

Once that connection is made, it is all smooth sailing.

Regards,
-Chap

Re: XMLDocument (SQL/XML X030)

От

Pavel Stehule

Дата:

25 января, 10:16:34

pá 24. 1. 2025 v 23:11 odesílatel Jim Jones <jim.jones@uni-muenster.de> napsal:

On 24.01.25 22:01, Chapman Flack wrote:
> It seems to me the key connection there is that the ISO SQL standard
> defines XMLDOCUMENT by equivalence to what `document { $expr }` means
> in the W3 XML Query standard.

It seems I missed one sentence. My bad.

Next try... :)

The <function>xmldocument</function> function returns the input argument
unchanged, or <literal>NULL</literal> if the argument is
<literal>NULL</literal>, and is provided for compatibility.

The SQL-standard <replaceable>XMLDocument</replaceable> function applied
to an XML value <literal>$EXPR</literal>, has effects equivalent to the
XML Query expression <replaceable>document { $EXPR }</replaceable>. It
replaces any document nodes in the input with their children and wraps
the whole result in a single <replaceable>document node</replaceable>.

In the XML Query standard, a <replaceable>document node</replaceable>
represents a relaxed version of an XML document structure. This
corresponds to what PostgreSQL's single XML type allows, meaning that
any valid non-null PostgreSQL XML value can be returned unchanged. Other
systems may support more permissive XML data types, such as
<literal>XML(SEQUENCE)</literal>, which allow values that do not conform
to this structure. In PostgreSQL, every valid non-null value of the XML
type already has that structure, making additional processing by this
function unnecessary.

v6 attached.

I think so doc is ok now

because the function does nothing, then it is useless to convert input to XML and force detosting

Maybe the body of the function should be just

{

#ifdef USE_LIBXML

PG_RETURN_DATUM(PG_GETARG_DATUM(0));

#else

NO_XML_SUPPORT();

return 0;

#endif

}

Regards

Pavel

Thanks.

Best regards, Jim

Re: XMLDocument (SQL/XML X030)

От

Pavel Stehule

Дата:

25 января, 13:50:52

so 25. 1. 2025 v 9:10 odesílatel Jim Jones <jim.jones@uni-muenster.de> napsal:

On 25.01.25 08:16, Pavel Stehule wrote:
> because the function does nothing, then it is useless to convert input
> to XML and force detosting

Right. Fixed in v7 attached.

I don't see

+Datum
+xmldocument(PG_FUNCTION_ARGS)
+{
+#ifdef USE_LIBXML
+ PG_RETURN_XML_P(PG_GETARG_XML_P(0));
+#else
+ NO_XML_SUPPORT();
+ return 0;
+#endif /* not USE_LIBXML */
+}
+

you still forces detoasting (inside PG_GETARG_XML_P)

Thanks

Best, Jim

Re: XMLDocument (SQL/XML X030)

От

Pavel Stehule

Дата:

25 января, 16:23:00

so 25. 1. 2025 v 12:45 odesílatel Jim Jones <jim.jones@uni-muenster.de> napsal:

On 25.01.25 11:50, Pavel Stehule wrote:
> you still forces detoasting (inside PG_GETARG_XML_P)

Ah, ok .. I overlooked it. You meant _DATUM instead of _XML_P

Fixed in v8.

now it is ok.

I'll mark this patch as ready for committer

Regards

Pavel

Thanks!

Best, Jim

Re: XMLDocument (SQL/XML X030)

От

Chapman Flack

Дата:

25 января, 17:10:06

On 01/25/25 02:16, Pavel Stehule wrote:
> because the function does nothing, then it is useless to convert input to
> XML and force detosting
> 
> Maybe the body of the function should be just
> ...
>   PG_RETURN_DATUM(PG_GETARG_DATUM(0));

That sort of motivated my question upthread about whether there is
already a function somewhere in the codebase that does exactly that
and could be named in the pg_proc entry for xmldocument, as an
alternative to supplying a brand-new one.

Maybe this is the only instance where it turns out that 'identity'
is the right behavior for a function. But if it could conceivably
happen again, a single C function (maybe even named identity) could
reduce code duplication and make quite clear what the behavior is
with a \sf.

A generic 'identity' function would be lacking the #ifdef USE_LIBXML
and the error message, but I'm not convinced those matter here
anyway. Without XML support, you'll already have raised that error
in any attempt to construct a non-null XML argument to pass, and if
you're passing NULL and the function is strict, you'll never see
the error message from here anyway.

Regards,
-Chap

Re: XMLDocument (SQL/XML X030)

От

Pavel Stehule

Дата:

25 января, 18:05:10

so 25. 1. 2025 v 15:10 odesílatel Chapman Flack <jcflack@acm.org> napsal:

On 01/25/25 02:16, Pavel Stehule wrote:
> because the function does nothing, then it is useless to convert input to
> XML and force detosting
>
> Maybe the body of the function should be just
> ...
> PG_RETURN_DATUM(PG_GETARG_DATUM(0));

That sort of motivated my question upthread about whether there is
already a function somewhere in the codebase that does exactly that
and could be named in the pg_proc entry for xmldocument, as an
alternative to supplying a brand-new one.

this should not be a problem, because the already created function XMLDOCUMENT surely will not be in the pg_catalog schema.

Maybe this is the only instance where it turns out that 'identity'
is the right behavior for a function. But if it could conceivably
happen again, a single C function (maybe even named identity) could
reduce code duplication and make quite clear what the behavior is
with a \sf.

I didn't find any function like this.

Regards

Pavel

A generic 'identity' function would be lacking the #ifdef USE_LIBXML
and the error message, but I'm not convinced those matter here
anyway. Without XML support, you'll already have raised that error
in any attempt to construct a non-null XML argument to pass, and if
you're passing NULL and the function is strict, you'll never see
the error message from here anyway.

Regards,
-Chap

Re: XMLDocument (SQL/XML X030)

От

Jim Jones

Дата:

26 января, 16:48:03

Hi Chapman & Pavel

On 25.01.25 16:05, Pavel Stehule wrote:
>
>
> so 25. 1. 2025 v 15:10 odesílatel Chapman Flack <jcflack@acm.org> napsal:
>
>     On 01/25/25 02:16, Pavel Stehule wrote:
>     > because the function does nothing, then it is useless to convert
>     input to
>     > XML and force detosting
>     >
>     > Maybe the body of the function should be just
>     > ...
>     >   PG_RETURN_DATUM(PG_GETARG_DATUM(0));
>
>     That sort of motivated my question upthread about whether there is
>     already a function somewhere in the codebase that does exactly that
>     and could be named in the pg_proc entry for xmldocument, as an
>     alternative to supplying a brand-new one.
>
>
> this should not be a problem, because the already created function
> XMLDOCUMENT surely will not be in the pg_catalog schema.
>  
>
>
>     Maybe this is the only instance where it turns out that 'identity'
>     is the right behavior for a function. But if it could conceivably
>     happen again, a single C function (maybe even named identity) could
>     reduce code duplication and make quite clear what the behavior is
>     with a \sf.
>
>
> I didn't find any function like this.


I also couldn't find any similar function.


>
> Regards
>
> Pavel
>
>
>     A generic 'identity' function would be lacking the #ifdef USE_LIBXML
>     and the error message, but I'm not convinced those matter here
>     anyway. Without XML support, you'll already have raised that error
>     in any attempt to construct a non-null XML argument to pass, and if
>     you're passing NULL and the function is strict, you'll never see
>     the error message from here anyway.
>
>

Do you envision something like this?

Datum
identity(PG_FUNCTION_ARGS)
{
PG_RETURN_DATUM(PG_GETARG_DATUM(0));
}

If so, where would be the right place to put it? Certainly not in xml.c

Generally speaking, reducing redundancy by reusing existing functions is
always a good thing. However, if we decide to create a generic function
for this purpose, it's important to ensure its existence is clearly
communicated to prevent others from writing their own -- something that,
given the simplicity of this function, seems like a likely scenario. :)
My point is: this function is so small that I’m not entirely sure it’s
worth the effort to make it generic. But I'd be willing to give it a try
if we agree on it.

Thanks!

Best regards, Jim

Re: XMLDocument (SQL/XML X030)

От

Robert Treat

Дата:

28 января, 07:54:20

On Sat, Jan 25, 2025 at 6:45 AM Jim Jones <jim.jones@uni-muenster.de> wrote:
> On 25.01.25 11:50, Pavel Stehule wrote:
> > you still forces detoasting (inside PG_GETARG_XML_P)
>
>
> Ah, ok .. I overlooked it. You meant _DATUM instead of _XML_P
>
> Fixed in v8.
>

Was playing around with the patch and was thinking about this wording:
"The xmldocument function returns the input argument
 unchanged... and is provided for compatibility."

When I run an example similar to the db2 example you gave earlier:

pagila=# SELECT xmldocument(xmlforest(10 as X, 20 as Y));
    xmldocument
--------------------
 <x>10</x><y>20</y>

In the db2 case, this is preserved as UPPER (which is to say, db2 case
folds UPPER, and the input happens to match that), but we case fold
lower, because we do; presumably you'd get the opposite effect in db2
running the input with unquoted lower field names(?). In any case (no
pun intended), SQL folks probably don't care much about that
discrepancy, but given xml is case sensitive, maybe xml people do?

Robert Treat
https://xzilla.net

Re: XMLDocument (SQL/XML X030)

От

Jim Jones

Дата:

28 января, 11:14:48

Hi Robert

On 28.01.25 05:54, Robert Treat wrote:
> Was playing around with the patch and was thinking about this wording:
> "The xmldocument function returns the input argument
>  unchanged... and is provided for compatibility."
>
> When I run an example similar to the db2 example you gave earlier:
>
> pagila=# SELECT xmldocument(xmlforest(10 as X, 20 as Y));
>     xmldocument
> --------------------
>  <x>10</x><y>20</y>
>
> In the db2 case, this is preserved as UPPER (which is to say, db2 case
> folds UPPER, and the input happens to match that), but we case fold
> lower, because we do; presumably you'd get the opposite effect in db2
> running the input with unquoted lower field names(?). 

Yes.

SELECT 42 AS foo FROM SYSIBM.SYSDUMMY1

FOO        
-----------
         42

  1 record(s) selected.

> In any case (no
> pun intended), SQL folks probably don't care much about that
> discrepancy, but given xml is case sensitive, maybe xml people do?

That's a good point. DB2 converts unquoted identifiers to uppercase by
default, which, if memory serves, aligns with the SQL standard. In the
case of this xmlforest example, my guess is that DB2 treats the elements
as identifiers and normalizes them to uppercase as well, as DB2 does not
handle XML documents as text like we do. To preserve case, you'd need to
explicitly quote the identifiers:

SELECT xmlforest(10 AS "x", 20 AS "y") FROM SYSIBM.SYSDUMMY1

--------------------
<x>10</x><y>20</y>

  1 record(s) selected.

Things look different when constructing the xml document directly from a
string:

SELECT xmlparse(DOCUMENT '<root><foo>bar</foo></root>') FROM
SYSIBM.SYSDUMMY1

----------------------------
<root><foo>bar</foo></root>

  1 record(s) selected.

I'd say the difference is due to how the two systems handle the XML data
type and unquoted identifiers in general, rather than a difference in
the behaviour of the function itself. Sticking to quoted identifiers in
both systems helps:

SELECT xmlforest(42 AS "foo", 73 AS "bar");
         xmlforest          
----------------------------

 <foo>42</foo><bar>73</bar>

Probably that's why most DB2 examples in their documentation use quoted
identifiers :)

Best regards, Jim

Re: XMLDocument (SQL/XML X030)

От

Chapman Flack

Дата:

28 января, 17:54:51

On 01/28/25 03:14, Jim Jones wrote:
> I'd say the difference is due to how the two systems handle the XML data
> type and unquoted identifiers in general, rather than a difference in
> the behaviour of the function itself. 

I'd go further and say it's entirely down to how the two systems
handle unquoted identifiers. In neither case was there ever any
XML value created with XML names in one case and then changed to
the other. The SQL names were already in their (DB2- or PostgreSQL-
specific) folded form by the first moment any XML library code ever
saw them. The XML code handled them faithfully ever after, whether
in serialized or in node-tree form.

Presumably both DB2 and PostgreSQL users soon know in their sleep
what their respective systems do to unquoted identifiers, and know
that quoting is the way to control that when it matters.

Regards,
-Chap

Re: XMLDocument (SQL/XML X030)

От

Jim Jones

Дата:

07 августа, 11:41:24

rebased due to recent changes on doc/src/sgml/func.sgml

-- 
Jim

Вложения

v10-0001-Add-XMLDocument-function-SQL-XML-X030.patch

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: XMLDocument (SQL/XML X030)

Вложения