Обсуждение: XMLDocument (SQL/XML X030)
Hi, I'd like to propose the implementation of XMLDocument (SQL/XML X030).It basically returns an XML document from a given XML expression, e.g. SELECT xmldocument( xmlelement(NAME foo, xmlattributes(42 AS att), xmlelement(NAME bar, xmlconcat('va', 'lue')) ) ); xmldocument -------------------------------------- <foo att="42"><bar>value</bar></foo> (1 row) XMLDocument doesn't do much. In fact, it might be reduced to a simple xmlparse() call as XMLOPTION_DOCUMENT... xmlparse(data, XMLOPTION_DOCUMENT, true) ... to make sure that the given XML expression is a valid document - still need some more research there. One could argue that XMLDocument() is in most cases unnecessary, but I believe it would facilitate the migration of scripts from other database products. Any thoughts? Best, Jim
On 2024-12-10 Tu 2:48 AM, Jim Jones wrote: > On 04.12.24 17:18, Jim Jones wrote: >> I'd like to propose the implementation of XMLDocument (SQL/XML X030). >> It basically returns an XML document from a given XML expression, e.g. >> >> SELECT >> xmldocument( >> xmlelement(NAME foo, >> xmlattributes(42 AS att), >> xmlelement(NAME bar, >> xmlconcat('va', 'lue')) >> ) >> ); >> >> xmldocument >> -------------------------------------- >> <foo att="42"><bar>value</bar></foo> >> (1 row) > v1 attached attempts to implement XMLDocument() as described above. > > Feedback welcome. > LGTM at a first glance. Please add this to the next CommitFest if you haven't done already. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Hi Andrew On 10.12.24 14:59, Andrew Dunstan wrote: > LGTM at a first glance. > > > Please add this to the next CommitFest if you haven't done already. Thanks! This is the CF entry: https://commitfest.postgresql.org/51/5431/ Best, Jim
Hi
út 14. 1. 2025 v 8:11 odesílatel Jim Jones <jim.jones@uni-muenster.de> napsal:
Hi,
I'd like to propose the implementation of XMLDocument (SQL/XML X030).It
basically returns an XML document from a given XML expression, e.g.
SELECT
xmldocument(
xmlelement(NAME foo,
xmlattributes(42 AS att),
xmlelement(NAME bar,
xmlconcat('va', 'lue'))
)
);
xmldocument
--------------------------------------
<foo att="42"><bar>value</bar></foo>
(1 row)
XMLDocument doesn't do much. In fact, it might be reduced to a simple
xmlparse() call as XMLOPTION_DOCUMENT...
xmlparse(data, XMLOPTION_DOCUMENT, true)
... to make sure that the given XML expression is a valid document -
still need some more research there. One could argue that XMLDocument()
is in most cases unnecessary, but I believe it would facilitate the
migration of scripts from other database products.
Any thoughts?
1. Oracle doesn't support this
2. DB2 has different implementations for z/OS (variadic) and for unix (nonvariadic)
3. looks so db2 allows some concatenation of xml content when xmlexpr is not the document already (not tested)
4. Your implementation just raise an exception
I didn't find a free downloadable SQL/XML standard with description of XMLDOCUMENT so I read just the DB2 doc, but it isn't fully consistent and it is different from your implementation.
So the argument of better compatibility for this patch doesn't look too strong. But I found that the usage of XMLDOCUMENT is required for storing XML, so it can be a frequently used function. Unfortunately, I do not have any knowledge about db2. It is hard to understand the usage of this function, because the sense is probably different than in DB2, and the documentation doesn't explain well an usage and motivation for this function. If it does a check, then it is not described in doc.
Regards
Pavel
Best, Jim
Hi Pavel On 14.01.25 09:14, Pavel Stehule wrote: > I did some research and the design of this document is different > > 1. Oracle doesn't support this > 2. DB2 has different implementations for z/OS (variadic) and for unix > (nonvariadic) > 3. looks so db2 allows some concatenation of xml content when xmlexpr > is not the document already (not tested) > 4. Your implementation just raise an exception I'm not entirely sure I follow. XMLDOCUMENT is designed to produce a well-formed XML document, and according to the XML specification,a well-formed document must have precisely one root element. SELECT xmlserialize(DOCUMENT xmldocument( xmlelement(NAME root, xmlattributes(42 AS att), xmlcomment('comment'), xmlelement(NAME foo,'<foo&bar>'), xmlelement(NAME bar, xmlconcat('va', 'lue')), xmlpi(name pi), xmlelement(NAME txt, xmltext('<"&>')) )) AS text INDENT) ; xmlserialize ---------------------------------- <root att="42"> + <!--comment--> + <foo><foo&bar></foo>+ <bar>value</bar> + <?pi?> + <txt><"&></txt> + </root> (1 row) Could you provide an example of this feature you're missing? Malformed CONTENT xml strings will indeed raise an exception. SELECT xmldocument('foo'::xml); ERROR: invalid XML document DETAIL: line 1: Start tag expected, '<' not found foo ^ > > I didn't find a free downloadable SQL/XML standard with description of > XMLDOCUMENT so I read just the DB2 doc, but it isn't fully consistent > and it is different from your implementation. The main idea is to ensure that an xml string is a valid document (even in CONTENT mode) postgres=# SET xmloption TO DOCUMENT; SET postgres=# SELECT 'foo'::xml; ERROR: invalid XML document LINE 1: SELECT 'foo'::xml; ^ DETAIL: line 1: Start tag expected, '<' not found foo ^ postgres=# SET xmloption TO CONTENT; SET postgres=# SELECT 'foo'::xml; xml ----- foo (1 row) postgres=# SELECT xmldocument('foo'::xml); ERROR: invalid XML document DETAIL: line 1: Start tag expected, '<' not found foo ^ > So the argument of better compatibility for this patch doesn't look > too strong. But I found that the usage of XMLDOCUMENT is required for > storing XML, so it can be a frequently used function. Unfortunately, > I do not have any knowledge about db2. It is hard to understand the > usage of this function, because the sense is probably different than > in DB2, and the documentation doesn't explain well an usage and > motivation for this function. If it does a check, then it is not > described in doc. > Perhaps changing the documentation like this would make things clearer? "The xmldocument function encapsulates the XML expression within a valid XML document structure. The expression passed asthe argument must be a valid, single-rooted XML fragment. If the XML expression is NULL, the result will also be NULL." Many thanks for the review! Best, Jim
st 15. 1. 2025 v 22:05 odesílatel Jim Jones <jim.jones@uni-muenster.de> napsal:
Hi Pavel
On 14.01.25 09:14, Pavel Stehule wrote:
> I did some research and the design of this document is different
>
> 1. Oracle doesn't support this
> 2. DB2 has different implementations for z/OS (variadic) and for unix
> (nonvariadic)
> 3. looks so db2 allows some concatenation of xml content when xmlexpr
> is not the document already (not tested)
> 4. Your implementation just raise an exception
I'm not entirely sure I follow. XMLDOCUMENT is designed to produce a well-formed XML document, and according to the XML specification, a well-formed document must have precisely one root element.
SELECT
xmlserialize(DOCUMENT
xmldocument(
xmlelement(NAME root,
xmlattributes(42 AS att),
xmlcomment('comment'),
xmlelement(NAME foo,'<foo&bar>'),
xmlelement(NAME bar,
xmlconcat('va', 'lue')),
xmlpi(name pi),
xmlelement(NAME txt, xmltext('<"&>'))
)) AS text INDENT) ;
xmlserialize
----------------------------------
<root att="42"> +
<!--comment--> +
<foo><foo&bar></foo>+
<bar>value</bar> +
<?pi?> +
<txt><"&></txt> +
</root>
(1 row)
Could you provide an example of this feature you're missing?
Malformed CONTENT xml strings will indeed raise an exception.
SELECT xmldocument('foo'::xml);
ERROR: invalid XML document
DETAIL: line 1: Start tag expected, '<' not found
foo
^
>
> I didn't find a free downloadable SQL/XML standard with description of
> XMLDOCUMENT so I read just the DB2 doc, but it isn't fully consistent
> and it is different from your implementation.
The main idea is to ensure that an xml string is a valid document (even
in CONTENT mode)
postgres=# SET xmloption TO DOCUMENT;
SET
postgres=# SELECT 'foo'::xml;
ERROR: invalid XML document
LINE 1: SELECT 'foo'::xml;
^
DETAIL: line 1: Start tag expected, '<' not found
foo
^
postgres=# SET xmloption TO CONTENT;
SET
postgres=# SELECT 'foo'::xml;
xml
-----
foo
(1 row)
postgres=# SELECT xmldocument('foo'::xml);
ERROR: invalid XML document
DETAIL: line 1: Start tag expected, '<' not found
foo
^
> So the argument of better compatibility for this patch doesn't look
> too strong. But I found that the usage of XMLDOCUMENT is required for
> storing XML, so it can be a frequently used function. Unfortunately,
> I do not have any knowledge about db2. It is hard to understand the
> usage of this function, because the sense is probably different than
> in DB2, and the documentation doesn't explain well an usage and
> motivation for this function. If it does a check, then it is not
> described in doc.
>
Perhaps changing the documentation like this would make things clearer?
"The xmldocument function encapsulates the XML expression within a valid XML document structure. The expression passed as the argument must be a valid, single-rooted XML fragment. If the XML expression is NULL, the result will also be NULL."
It is better.
My note was related to a very different description of this functionality in DB2. So if you propose this function for better compatibility (and this function is implemented only by db2), it is surprising to see that this functionality is described (and probably implemented) very differently. Because I do not have db2 and I miss db2 knowledge, I don't know if differences in implementation and description are based on different technology (XML like graph or XML like string) or if it is something that is missing in this patch.
Regards
Pavel
Many thanks for the review!
Best, Jim
On 16.01.25 08:21, Jim Jones wrote: > On 16.01.25 07:11, Pavel Stehule wrote: >> It is better. > v2 attached updates the documentation. >> My note was related to a very different description of this >> functionality in DB2. So if you propose this function for better >> compatibility (and this function is implemented only by db2), it is >> surprising to see that this functionality is described (and probably >> implemented) very differently. Because I do not have db2 and I miss >> db2 knowledge, I don't know if differences in implementation and >> description are based on different technology (XML like graph or XML >> like string) or if it is something that is missing in this patch. > I suppose it's mostly because PostgreSQL and DB2 have different > structures for the XML data type; DB2 stores it in its native > hierarchical format rather than as text. > > Thanks for the review. > > Best, Jim The DB2 "Document node constructors" might provide some insights into its behavior regarding well-formed XML documents [1]: "No validation is performed on the constructed document node. The XQuery document node constructor does not enforce the XML 1.0 rules that govern the structure of an XML document. For example, a document node is not required to have exactly one child that is an element node." This suggests that DB2's design reflects a different approach to handling XML, focusing less on enforcing XML 1.0 constraints. It appears to be more of a design philosophy regarding how XML is integrated into the database system as a whole, rather than just a difference in the implementation of the XMLDocument function. PostgreSQL does not support the RETURNING SEQUENCE or RETURNING CONTENT clauses explicitly. Instead, it implicitly uses RETURNING CONTENT[2] in functions that require it. Since RETURNING CONTENT implies that the output is a well-formed XML document (e.g., single-rooted), I would argue that the behavior of this patch is more intuitive and aligns best with the expectations of XML document structure. Any thoughts? Best, Jim 1 - https://www.ibm.com/docs/en/db2/11.1?topic=constructors-document-node 2 - https://www.postgresql.org/docs/17/xml-limits-conformance.html
On 01/20/25 06:02, Jim Jones wrote: > The DB2 "Document node constructors" might provide some insights into > its behavior regarding well-formed XML documents [1]: > > "No validation is performed on the constructed document node. The XQuery > document node constructor does not enforce the XML 1.0 rules that govern > the structure of an XML document. For example, a document node is not > required to have exactly one child that is an element node." > > This suggests that DB2's design reflects a different approach to > handling XML, focusing less on enforcing XML 1.0 constraints. It appears > to be more of a design philosophy regarding how XML is integrated into > the database system as a whole, rather than just a difference in the > implementation of the XMLDocument function. Indeed. ISO SQL/XML changed significantly between the 2003 edition (largely followed by PostgreSQL) and the 2006 and all later editions. There's a rundown of those changes at [3]. > PostgreSQL does not support the RETURNING SEQUENCE or RETURNING CONTENT > clauses explicitly. Instead, it implicitly uses RETURNING CONTENT[2] in > functions that require it. Since RETURNING CONTENT implies that the > output is a well-formed XML document (e.g., single-rooted), In fact, you can't infer single-root-element-ness from RETURNING CONTENT, according to the standard. Single-root-element-ness is checked by the IS DOCUMENT predicate, and by XMLPARSE and XMLSERIALIZE when they specify DOCUMENT. But it isn't checked or implied by the XMLDOCUMENT constructor. That amounts to a bit of unfortunate punning on the word DOCUMENT, but so help me that's what's in the standard. It may help to think in terms of the hierarchy of XML types that the 2006 standard introduced (cribbed here from [3]): SEQUENCE | (?sequence of length 1, a document node) | CONTENT(ANY)----------------.----------------(?every element | | conforms to a (?every element has (?no extraneous schema) xdt:untyped and !nilled, nodes) | every attribute has | | xdt:untypedAtomic) DOCUMENT(ANY) CONTENT(XMLSCHEMA) | | CONTENT(UNTYPED) (?whole thing is valid | according to schema) (?no extraneous nodes) | | DOCUMENT(XMLSCHEMA) DOCUMENT(UNTYPED) where the condition (?no extraneous nodes) is shorthand for SQL/XML's more precise "whose `children` property has exactly one XQuery element node, zero or more XQuery comment nodes, and zero or more XQuery processing instruction nodes". So that (?no extraneous nodes) condition is required for any of the XML(DOCUMENT...) types. When you relax that condition, you have an XML(CONTENT...) type. The XMLDOCUMENT constructor is so named because it constructs what corresponds to an XQuery document node—which actually corresponds to the XML(CONTENT...) SQL/XML types, and does not enforce having a single root element: "This data model is more permissive: a Document Node may be empty, it may have more than one Element Node as a child, and it also permits Text Nodes as children."[4] So in terms of the SQL/XML type hierarchy, what you get back from XMLDOCUMENT ... RETURNING CONTENT will have one of the XML(CONTENT...) types (whether it's CONTENT(ANY) or CONTENT(UNTYPED) is left to the implementation). If you then want to know if it is single-rooted, you can apply the IS DOCUMENT predicate, or try to cast it to an XML(DOCUMENT...) type. (And if you use XMLDOCUMENT ... RETURNING SEQUENCE, then you get a value of type XML(SEQUENCE). The sequence has length 1, a document node, making it safely castable to XML(CONTENT(ANY)), but whether you can cast it to an XML(DOCUMENT...) type will depend on what children that document node has.) Long story short, an XMLDOCUMENT constructor that enforced having a single root element would be nonconformant. Regards, -Chap > 1 - https://www.ibm.com/docs/en/db2/11.1?topic=constructors-document-node > 2 - https://www.postgresql.org/docs/17/xml-limits-conformance.html 3 - https://wiki.postgresql.org/wiki/PostgreSQL_vs_SQL/XML_Standards#SQL.2FXML:2003_contrasted_with_SQL.2FXML_since_2006 4 - https://www.w3.org/TR/2010/REC-xpath-datamodel-20101214/#DocumentNode
Hi Chap, Thanks for the thorough explanation! On 20.01.25 20:09, Chapman Flack wrote: >> PostgreSQL does not support the RETURNING SEQUENCE or RETURNING CONTENT >> clauses explicitly. Instead, it implicitly uses RETURNING CONTENT[2] in >> functions that require it. Since RETURNING CONTENT implies that the >> output is a well-formed XML document (e.g., single-rooted), > In fact, you can't infer single-root-element-ness from RETURNING CONTENT, > according to the standard. Single-root-element-ness is checked by the > IS DOCUMENT predicate, and by XMLPARSE and XMLSERIALIZE when they specify > DOCUMENT. But it isn't checked or implied by the XMLDOCUMENT constructor. > > That amounts to a bit of unfortunate punning on the word DOCUMENT, > but so help me that's what's in the standard. Yeah, the term DOCUMENT seems a bit misleading in this context. > > It may help to think in terms of the hierarchy of XML types that the > 2006 standard introduced (cribbed here from [3]): > > SEQUENCE > | > (?sequence of length 1, a document node) > | > CONTENT(ANY)----------------.----------------(?every element > | | conforms to a > (?every element has (?no extraneous schema) > xdt:untyped and !nilled, nodes) | > every attribute has | | > xdt:untypedAtomic) DOCUMENT(ANY) CONTENT(XMLSCHEMA) > | | > CONTENT(UNTYPED) (?whole thing is valid > | according to schema) > (?no extraneous nodes) | > | DOCUMENT(XMLSCHEMA) > DOCUMENT(UNTYPED) > > where the condition (?no extraneous nodes) is shorthand for SQL/XML's > more precise "whose `children` property has exactly one XQuery element > node, zero or more XQuery comment nodes, and zero or more XQuery > processing instruction nodes". > > So that (?no extraneous nodes) condition is required for any of > the XML(DOCUMENT...) types. When you relax that condition, you have > an XML(CONTENT...) type. > > The XMLDOCUMENT constructor is so named because it constructs what > corresponds to an XQuery document node—which actually corresponds to > the XML(CONTENT...) SQL/XML types, and does not enforce having a > single root element: > > "This data model is more permissive: a Document Node may be empty, > it may have more than one Element Node as a child, and it also > permits Text Nodes as children."[4] Thanks a lot for pointing that out! I guess it's clear now. > > So in terms of the SQL/XML type hierarchy, what you get back from > XMLDOCUMENT ... RETURNING CONTENT will have one of the XML(CONTENT...) > types (whether it's CONTENT(ANY) or CONTENT(UNTYPED) is left to the > implementation). > > If you then want to know if it is single-rooted, you can apply the > IS DOCUMENT predicate, or try to cast it to an XML(DOCUMENT...) type. > > (And if you use XMLDOCUMENT ... RETURNING SEQUENCE, then you get a > value of type XML(SEQUENCE). The sequence has length 1, a document > node, making it safely castable to XML(CONTENT(ANY)), but whether > you can cast it to an XML(DOCUMENT...) type will depend on what > children that document node has.) > > Long story short, an XMLDOCUMENT constructor that enforced having > a single root element would be nonconformant. > If I understand correctly, the compliant approach would be to always treat the input expression as CONTENT: |PG_RETURN_XML_P(xmlparse((text *) data, XMLOPTION_DOCUMENT, true));| Is that right?" > >> 1 - https://www.ibm.com/docs/en/db2/11.1?topic=constructors-document-node >> 2 - https://www.postgresql.org/docs/17/xml-limits-conformance.html > 3 - > https://wiki.postgresql.org/wiki/PostgreSQL_vs_SQL/XML_Standards#SQL.2FXML:2003_contrasted_with_SQL.2FXML_since_2006 > 4 - https://www.w3.org/TR/2010/REC-xpath-datamodel-20101214/#DocumentNode > Best, Jim
On 2025-01-20 Mo 2:09 PM, Chapman Flack wrote: > > Long story short, an XMLDOCUMENT constructor that enforced having > a single root element would be nonconformant. > What an (expletive deleted) mess. I'm glad someone is on top of it. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
On 01/20/25 14:56, Jim Jones wrote: > If I understand correctly, the compliant approach would be to always > treat the input expression as CONTENT: > > |PG_RETURN_XML_P(xmlparse((text *) data, XMLOPTION_DOCUMENT, true));| > > Is that right?" Did you mean XMLOPTION_CONTENT? I think that would be right ... which, to be honest, leaves me in doubt that there is any substantive work at all for an XMLDOCUMENT constructor in PostgreSQL to do. Possibly that's why it wasn't provided originally. We don't have the XML(SEQUENCE) type, and our single XML type essentially corresponds to XML(CONTENT(ANY)). Any value that has already been accepted by the input function should already have the same serialized form you'd expect from XMLDOCUMENT. We don't have BY REF and passing XML node trees around, so there's really no way to notice that you've applied XMLDOCUMENT to wrap stuff in a 'document node' that isn't visible in the serialized form that is all we store. Likewise, I'm not thinking of any way to combine our other XML constructors / aggregates to produce a result that isn't already XML(CONTENT(ANY)). (If I could think of a way, I think it would mean those weren't well-specified.) Therefore I'm thinking that, given the specifics of our XML support, a fully conformant and efficient XMLDOCUMENT could be implemented just by returning its XML argument. That opens a question of whether it's worth the effort to supply it at all. Maybe it could reduce the surprise for people coming from another DBMS and finding it missing, and/or be a placeholder in case we ever implement enough more of the newer SQL/XML standard for it to have a real effect. (In a DBMS implementing the whole XML type hierarchy, you could have an XML(SEQUENCE) containing a single element node, which you might think you could easily cast to XML(CONTENT(ANY)) or even XML(DOCUMENT(ANY)), but the cast would raise 2200U or 2200L, respectively: the sequence would first need to be transformed into a sequence of a single document node wrapping the original sequence, which is exactly what you'd use XMLDOCUMENT to do. And of course you could put RETURNING CONTENT on that XMLDOCUMENT to accomplish the cast in the same step.) Regards, -Chap
On Tue, Jan 21, 2025 at 5:58 AM Jim Jones <jim.jones@uni-muenster.de> wrote: > On 20.01.25 23:21, Chapman Flack wrote: > > Therefore I'm thinking that, given the specifics of our XML support, > > a fully conformant and efficient XMLDOCUMENT could be implemented > > just by returning its XML argument. > > > After your explanation, I tend to agree. > > v3, attached, incorporates these changes and updates the regression > tests accordingly. > > > > > > That opens a question of whether it's worth the effort to supply > > it at all. Maybe it could reduce the surprise for people coming from > > another DBMS and finding it missing, and/or be a placeholder in case > > we ever implement enough more of the newer SQL/XML standard for it > > to have a real effect. > > Although quite trivial, I believe this function could still be valuable > in facilitating the migration of scripts from other database systems -- > improving SQL/XML conformance also isn't a bad thing :). > Is there some concrete use case you have seen that this would help with? Not objecting to adding it, but you've mentioned this migration idea twice but it seems to me this doesn't conform with existing implementations, and I don't see much benefit in migration use cases specifically, so I'm just curious if I am overlooking something? Robert Treat https://xzilla.net
On 21.01.25 23:45, Robert Treat wrote: > Is there some concrete use case you have seen that this would help > with? Not objecting to adding it, but you've mentioned this migration > idea twice but it seems to me this doesn't conform with existing > implementations, and I don't see much benefit in migration use cases > specifically, so I'm just curious if I am overlooking something? I wouldn’t frame it as a "migration idea". My point is that this would be one less function to modify when migrating a script from another database system to PostgreSQL. For example, in DB2: SELECT xmldocument(xmlforest(10 as x, 20 as y)) FROM SYSIBM.SYSDUMMY1; ---------------------------------------------------------- <X>10</X><Y>20</Y> 1 record(s) selected. ... This same query wouldn’t work in PostgreSQL, but the function makes the query more compatible. SELECT xmldocument(xmlforest(10 as x, 20 as y)); xmldocument -------------------- <x>10</x><y>20</y> (1 row) Which compatibility issues with existing implementations are you referring to? Thanks! Jim
On Tue, Jan 21, 2025 at 6:36 PM Jim Jones <jim.jones@uni-muenster.de> wrote: > On 21.01.25 23:45, Robert Treat wrote: > > Is there some concrete use case you have seen that this would help > > with? Not objecting to adding it, but you've mentioned this migration > > idea twice but it seems to me this doesn't conform with existing > > implementations, and I don't see much benefit in migration use cases > > specifically, so I'm just curious if I am overlooking something? > > I wouldn’t frame it as a "migration idea". My point is that this would > be one less function to modify when migrating a script from another > database system to PostgreSQL. > <snip> > > Which compatibility issues with existing implementations are you > referring to? > I'm mostly referring back to Pavel's statements: | 1. Oracle doesn't support this | 2. DB2 has different implementations for z/OS (variadic) and for unix (nonvariadic) With an added #3 which is AFAIK sql server doesn't implement this either (I believe they implement something similar with sp_xml_preparedocument) So even if we are following the spec (which I think technically we may not be), if no other database implements it the way we are, then it really isn't one less function people will have to modify. Which again, I'm not trying to argue against having it, I'm just trying to figure out if there is some specific group that would benefit from it. Robert Treat https://xzilla.net
On 01/22/25 13:41, Robert Treat wrote: > So even if we are following the spec (which I think technically we may > not be), There are definite ways in which we're not following the SQL/XML spec, which we document in an appendix[1]. The one that matters here is that we just have a single XML type instead of the hierarchy of them in the spec, and ours corresponds to what the spec calls XML(CONTENT(ANY)). With that divergence from the spec understood, I don't see any new divergence in providing an XMLDOCUMENT function that just returns its argument. That's the correct result to return for anything that's a valid value of our XML type to begin with. > if no other database implements it the way we are There may be other systems that don't implement it at all, for which case I don't see any compatibility issue created because we have it and they do not. There may be systems that implement the SQL/XML type hierarchy more completely than we do, so that it would be possible for their XMLDOCUMENT to be called with an XML(SEQUENCE) argument, or with RETURNING SEQUENCE, both of which are things that can't happen in PostgreSQL. I don't see a problem in that either, as long as theirs produces results matching ours in the RETURNING CONTENT, passed an XML(CONTENT...) argument case. If another system produces results that differ, in that restricted domain corresponding to ours, I'd say something's nonconformant in that implementation. In my opinion, that would only be a problem for us if the system in question is an 800 lb gorilla and has many users relying on the differing behavior. Regarding the patch itself: I wonder if there is already, somewhere in the code base, a generic fmgr identity function that the pg_proc entry could point to. Or is there a strong project tradition in favor of writing a dedicated new one to use here? I'm not sure it's critical to have a version that tests USE_LIBXML or reports it's unsupported, because without support I doubt there's any way to pass it a non-null XML argument, and if declared STRICT it won't be called for a null one anyway. Regards, -Chap [1] https://www.postgresql.org/docs/17/xml-limits-conformance.html
Hi Chapman & Robert Many thanks for the input On 22.01.25 22:35, Chapman Flack wrote: > On 01/22/25 13:41, Robert Treat wrote: >> So even if we are following the spec (which I think technically we may >> not be), > There are definite ways in which we're not following the SQL/XML spec, > which we document in an appendix[1]. The one that matters here is that > we just have a single XML type instead of the hierarchy of them in the > spec, and ours corresponds to what the spec calls XML(CONTENT(ANY)). > > With that divergence from the spec understood, I don't see any new > divergence in providing an XMLDOCUMENT function that just returns > its argument. That's the correct result to return for anything that's > a valid value of our XML type to begin with. > >> if no other database implements it the way we are > There may be other systems that don't implement it at all, for which > case I don't see any compatibility issue created because we have it > and they do not. Yeah, as long as we stick to the specification, I don’t see any issue with including it. > > There may be systems that implement the SQL/XML type hierarchy more > completely than we do, so that it would be possible for their > XMLDOCUMENT to be called with an XML(SEQUENCE) argument, or with > RETURNING SEQUENCE, both of which are things that can't happen > in PostgreSQL. I don't see a problem in that either, as long as > theirs produces results matching ours in the RETURNING CONTENT, > passed an XML(CONTENT...) argument case. > > If another system produces results that differ, in that restricted > domain corresponding to ours, I'd say something's nonconformant in > that implementation. In my opinion, that would only be a problem > for us if the system in question is an 800 lb gorilla and has many > users relying on the differing behavior. > > Regarding the patch itself: I wonder if there is already, somewhere > in the code base, a generic fmgr identity function that the pg_proc > entry could point to. Or is there a strong project tradition in favor > of writing a dedicated new one to use here? I'm not sure it's critical > to have a version that tests USE_LIBXML or reports it's unsupported, > because without support I doubt there's any way to pass it a non-null > XML argument, and if declared STRICT it won't be called for a null one > anyway. > > Regards, > -Chap > > [1] https://www.postgresql.org/docs/17/xml-limits-conformance.html Regarding compatibility, here is an example of how both implementations handle single-rooted XML, non-single-rooted XML, and NULL values. == DB2 [1] == WITH t(x) AS ( VALUES (xmlparse(DOCUMENT '<root><foo>bar</foo></root>')), (xmlforest(42 AS foo, 73 AS bar)), (NULL) ) SELECT xmldocument(x) FROM t; ---------------------------- <root><foo>bar</foo></root> <FOO>42</FOO><BAR>73</BAR> - 3 record(s) selected. == PostgreSQL == WITH t(x) AS ( VALUES (xmlparse(DOCUMENT '<root><foo>bar</foo></root>')), (xmlforest(42 AS foo, 73 AS bar)), (NULL) ) SELECT xmldocument(x) FROM t; xmldocument ----------------------------- <root><foo>bar</foo></root> <foo>42</foo><bar>73</bar> (3 rows) To make it clear: ensuring this function is compatible with other database products is IMHO beneficial (and is my primary motivation), but it shouldn't come at the expense of violating the SQL/XML specification. I mean, if in some edge case, another database system implemented XMLDocument in a way that deviates from the standard, I'd argue it’s not worth prioritizing compatibility -- assuming, of course, that we are fully following the standard. Best regards, Jim 1 - https://dbfiddle.uk/G9VoHKp7
čt 23. 1. 2025 v 0:55 odesílatel Jim Jones <jim.jones@uni-muenster.de> napsal:
Hi Chapman & Robert
Many thanks for the input
On 22.01.25 22:35, Chapman Flack wrote:
> On 01/22/25 13:41, Robert Treat wrote:
>> So even if we are following the spec (which I think technically we may
>> not be),
> There are definite ways in which we're not following the SQL/XML spec,
> which we document in an appendix[1]. The one that matters here is that
> we just have a single XML type instead of the hierarchy of them in the
> spec, and ours corresponds to what the spec calls XML(CONTENT(ANY)).
>
> With that divergence from the spec understood, I don't see any new
> divergence in providing an XMLDOCUMENT function that just returns
> its argument. That's the correct result to return for anything that's
> a valid value of our XML type to begin with.
>
>> if no other database implements it the way we are
> There may be other systems that don't implement it at all, for which
> case I don't see any compatibility issue created because we have it
> and they do not.
Yeah, as long as we stick to the specification, I don’t see any issue
with including it.
>
> There may be systems that implement the SQL/XML type hierarchy more
> completely than we do, so that it would be possible for their
> XMLDOCUMENT to be called with an XML(SEQUENCE) argument, or with
> RETURNING SEQUENCE, both of which are things that can't happen
> in PostgreSQL. I don't see a problem in that either, as long as
> theirs produces results matching ours in the RETURNING CONTENT,
> passed an XML(CONTENT...) argument case.
>
> If another system produces results that differ, in that restricted
> domain corresponding to ours, I'd say something's nonconformant in
> that implementation. In my opinion, that would only be a problem
> for us if the system in question is an 800 lb gorilla and has many
> users relying on the differing behavior.
>
> Regarding the patch itself: I wonder if there is already, somewhere
> in the code base, a generic fmgr identity function that the pg_proc
> entry could point to. Or is there a strong project tradition in favor
> of writing a dedicated new one to use here? I'm not sure it's critical
> to have a version that tests USE_LIBXML or reports it's unsupported,
> because without support I doubt there's any way to pass it a non-null
> XML argument, and if declared STRICT it won't be called for a null one
> anyway.
>
> Regards,
> -Chap
>
> [1] https://www.postgresql.org/docs/17/xml-limits-conformance.html
Regarding compatibility, here is an example of how both implementations
handle single-rooted XML, non-single-rooted XML, and NULL values.
== DB2 [1] ==
WITH t(x) AS (
VALUES
(xmlparse(DOCUMENT '<root><foo>bar</foo></root>')),
(xmlforest(42 AS foo, 73 AS bar)),
(NULL)
)
SELECT xmldocument(x) FROM t;
----------------------------
<root><foo>bar</foo></root>
<FOO>42</FOO><BAR>73</BAR>
I think so in this form (just forward input to output) I have no objection.
There is a benefit with a) possible zero work with migration from db2, b) nobody needs to repeat a work which is a correct implementation of XMLDOCUMENT function.
Maybe opened question can be implementation like classic scalar function or via XmlExpr
In this moment I prefer to use XmlExpr from consistency reasons
Regards
Pavel
-
3 record(s) selected.
== PostgreSQL ==
WITH t(x) AS (
VALUES
(xmlparse(DOCUMENT '<root><foo>bar</foo></root>')),
(xmlforest(42 AS foo, 73 AS bar)),
(NULL)
)
SELECT xmldocument(x) FROM t;
xmldocument
-----------------------------
<root><foo>bar</foo></root>
<foo>42</foo><bar>73</bar>
(3 rows)
To make it clear: ensuring this function is compatible with other
database products is IMHO beneficial (and is my primary motivation), but
it shouldn't come at the expense of violating the SQL/XML specification.
I mean, if in some edge case, another database system implemented
XMLDocument in a way that deviates from the standard, I'd argue it’s not
worth prioritizing compatibility -- assuming, of course, that we are
fully following the standard.
Best regards, Jim
1 - https://dbfiddle.uk/G9VoHKp7
Hi
čt 23. 1. 2025 v 16:06 odesílatel Jim Jones <jim.jones@uni-muenster.de> napsal:
Hi
On 23.01.25 07:50, Pavel Stehule wrote:
> I think so in this form (just forward input to output) I have no
> objection.
>
> There is a benefit with a) possible zero work with migration from db2,
> b) nobody needs to repeat a work which is a correct implementation of
> XMLDOCUMENT function.
>
> Maybe opened question can be implementation like classic scalar
> function or via XmlExpr
>
> In this moment I prefer to use XmlExpr from consistency reasons
To keep it consistent with the existing code, I think this function is
in the right place. There are similar functions in xml.c, e.g.
xmltotext, texttoxml.
These functions are cast functions - they should be V1 functions only - casting cannot work with pseudo functions.
But it is true, so xmlcomment is not a pseudo function either. So minimally this function is precedent, so there is not strong dependency on XmlExp, which I expected.
I updated the function comment and commit message (v4 attached) to make
things clearer.
Since the status of this patch is waiting on author, is there anything
else I should take a look / improve?
Thanks!
Best regards,Jim
čt 23. 1. 2025 v 16:06 odesílatel Jim Jones <jim.jones@uni-muenster.de> napsal:
Hi
On 23.01.25 07:50, Pavel Stehule wrote:
> I think so in this form (just forward input to output) I have no
> objection.
>
> There is a benefit with a) possible zero work with migration from db2,
> b) nobody needs to repeat a work which is a correct implementation of
> XMLDOCUMENT function.
>
> Maybe opened question can be implementation like classic scalar
> function or via XmlExpr
>
> In this moment I prefer to use XmlExpr from consistency reasons
To keep it consistent with the existing code, I think this function is
in the right place. There are similar functions in xml.c, e.g.
xmltotext, texttoxml.
I updated the function comment and commit message (v4 attached) to make
things clearer.
Since the status of this patch is waiting on author, is there anything
else I should take a look / improve?
I think documentation should be strongly enhanced. This is probably the hardest part of this patch - explain well what this function does and what it doesn't.
Regards
Pavel
Thanks!
Best regards,Jim
Hi On 24.01.25 07:28, Pavel Stehule wrote: > I think documentation should be strongly enhanced. This is probably > the hardest part of this patch - explain well what this function does > and what it doesn't. You mean something like this? Or perhaps something more technical? The <function>xmldocument</function> function returns a document node, representing an XML document, from the provided <type>xml</type> expression. The input expression can represent any valid XML content, including elements, text, or a sequence of nodes. If the <type>xml</type> expression is NULL, the function returns NULL. This function does not require the input to have a single root node or encapsulate the result in a root element. The validation of the <type>xml</type> expression depends on the current <xref linkend="guc-xmloption"/> setting. Example: WITH xmldata (val) AS ( VALUES (xmlparse(DOCUMENT '<root><foo>bar</foo></root>')), (xmltext('foo&bar')), (xmlelement(NAME el)), (xmlforest(42 AS foo, 73 AS bar)) ) SELECT xmldocument(val) FROM xmldata; xmldocument ----------------------------- <root><foo>bar</foo></root> foo&bar <el/> <foo>42</foo><bar>73</bar> (4 rows) Thanks! Best, Jim
pá 24. 1. 2025 v 9:12 odesílatel Jim Jones <jim.jones@uni-muenster.de> napsal:
Hi
On 24.01.25 07:28, Pavel Stehule wrote:
> I think documentation should be strongly enhanced. This is probably
> the hardest part of this patch - explain well what this function does
> and what it doesn't.
You mean something like this? Or perhaps something more technical?
I don't know what it means - `function returns a document node` in the context of Postgres implementation of XML.
I miss the information so it returns an input argument without changing anything, because in a system, where XML expression
holds a complete string (not graph like some other systems), it just does nothing.
If somebody looks the source code, then he should be in panic mode, because the doc talks about returning "document node",
and it is just an argument. So it should be explained more in PostgreSQL XML design, and less in SQL/XML description. Because
In this case, SQL/XML description (in postgresql context) is absolutely confusing.
Regards
Pavel
The <function>xmldocument</function> function returns a document node,
representing an XML document, from the provided <type>xml</type>
expression. The input expression can represent any valid XML content,
including elements, text, or a sequence of nodes. If the
<type>xml</type> expression is NULL, the function returns NULL. This
function does not require the input to have a single root node or
encapsulate the result in a root element. The validation of the
<type>xml</type> expression depends on the current <xref
linkend="guc-xmloption"/> setting.
Example:
WITH xmldata (val) AS (
VALUES
(xmlparse(DOCUMENT '<root><foo>bar</foo></root>')),
(xmltext('foo&bar')),
(xmlelement(NAME el)),
(xmlforest(42 AS foo, 73 AS bar))
)
SELECT xmldocument(val) FROM xmldata;
xmldocument
-----------------------------
<root><foo>bar</foo></root>
foo&bar
<el/>
<foo>42</foo><bar>73</bar>
(4 rows)
Thanks!
Best, Jim
On 24.01.25 12:31, Pavel Stehule wrote: > I don't know what it means - `function returns a document node` in the > context of Postgres implementation of XML. > I miss the information so it returns an input argument without > changing anything, because in a system, where XML expression > holds a complete string (not graph like some other systems), it just > does nothing. > > If somebody looks the source code, then he should be in panic mode, > because the doc talks about returning "document node", > and it is just an argument. So it should be explained more in > PostgreSQL XML design, and less in SQL/XML description. Because > In this case, SQL/XML description (in postgresql context) is > absolutely confusing. I see. I borrowed document node from the XML standard. Perhaps it is more palatable like this? The <function>xmldocument</function> function creates an XML value from the given <type>xml</type> expression. The input can include any valid XML content, such as elements, text, or a mixture of elements and text. If the <type>xml</type> expression is NULL, the function returns NULL. This function does not require the input to have a single root element and does not wrap the result in one - instead, it returns an exact XML representation of the input <type>xml</type> expression. The validation of the <type>xml</type> expression depends on the current <xref linkend="guc-xmloption"/> setting. Best, Jim
pá 24. 1. 2025 v 13:21 odesílatel Jim Jones <jim.jones@uni-muenster.de> napsal:
On 24.01.25 12:31, Pavel Stehule wrote:
> I don't know what it means - `function returns a document node` in the
> context of Postgres implementation of XML.
> I miss the information so it returns an input argument without
> changing anything, because in a system, where XML expression
> holds a complete string (not graph like some other systems), it just
> does nothing.
>
> If somebody looks the source code, then he should be in panic mode,
> because the doc talks about returning "document node",
> and it is just an argument. So it should be explained more in
> PostgreSQL XML design, and less in SQL/XML description. Because
> In this case, SQL/XML description (in postgresql context) is
> absolutely confusing.
I see. I borrowed document node from the XML standard.
Perhaps it is more palatable like this?
The <function>xmldocument</function> function creates an XML value from
the given <type>xml</type> expression. The input can include any valid
XML content, such as elements, text, or a mixture of elements and text.
If the <type>xml</type> expression is NULL, the function returns NULL.
This function does not require the input to have a single root element
and does not wrap the result in one - instead, it returns an exact XML
representation of the input <type>xml</type> expression. The validation
of the <type>xml</type> expression depends on the current <xref
linkend="guc-xmloption"/> setting.
It is still confusing (and I understand so it can be hard to describe a function that does nothing).
As a PostgreSQL user, and reader documentation, I have question
what is difference between XMLDOCUMENT( some xml expr) and just xml expr
Is there different result SELECT XMLDOCUMENT(XMLELEMENT(name a, 10)) and
SELECT XMLELEMENT(name a, 10) ?
Lot of users use XML functions without XMLDOCUMENT now. The doc should to
help with a reply to question where and when I need (I can) use this function. This should
be in context of Postgres where every XML expr returns XML value, so sentence
creates an XML value from XML expression is still very confusing
What some like
"SQL/XML introduces XMLDOCUMENT function to returns a document node from XML tree of some XML value.
This has not any sense (because XML in Postgres is always materialized to string) in PostgreSQL implementation of XML, and the function just returns
input value". Or some like that.
Best, Jim
On 24.01.25 13:48, Pavel Stehule wrote: > Lot of users use XML functions without XMLDOCUMENT now. The doc should to > help with a reply to question where and when I need (I can) use this > function. This should > be in context of Postgres where every XML expr returns XML value, so > sentence > creates an XML value from XML expression is still very confusing > > What some like > > "SQL/XML introduces XMLDOCUMENT function to returns a document node > from XML tree of some XML value. > This has not any sense (because XML in Postgres is always materialized > to string) in PostgreSQL implementation of XML, and the function just > returns > input value". Or some like that. I see. I tried to incorporate your suggestion in the text: In the SQL/XML specification, the XMLDocument (X030) function is designed to return a document node from a given XML value expression. Since PostgreSQL handles XML values as materialized strings rather than tree structures, this function does not produce a document node structure but instead returns the validated input <type>xml</type> value as-is. The input <type>xml</type> value can include any valid XML content, such as elements, text, or a mixture of both. This function does not require the input to have a single root element and does not wrap the result in one. Validation of the <type>xml</type> input value depends on the current <xref linkend="guc-xmloption"/> setting. If the input <type>xml</type> value is NULL, the function returns NULL. This function ensures compatibility with the SQL/XML specification, making it particularly useful when migrating SQL queries from other database systems that also implement this standard. Let me know if it is still confusing ... Thanks! Best, Jim
pá 24. 1. 2025 v 14:48 odesílatel Jim Jones <jim.jones@uni-muenster.de> napsal:
On 24.01.25 13:48, Pavel Stehule wrote:
> Lot of users use XML functions without XMLDOCUMENT now. The doc should to
> help with a reply to question where and when I need (I can) use this
> function. This should
> be in context of Postgres where every XML expr returns XML value, so
> sentence
> creates an XML value from XML expression is still very confusing
>
> What some like
>
> "SQL/XML introduces XMLDOCUMENT function to returns a document node
> from XML tree of some XML value.
> This has not any sense (because XML in Postgres is always materialized
> to string) in PostgreSQL implementation of XML, and the function just
> returns
> input value". Or some like that.
I see. I tried to incorporate your suggestion in the text:
In the SQL/XML specification, the XMLDocument (X030) function is
designed to return a document node from a given XML value expression.
Since PostgreSQL handles XML values as materialized strings rather than
tree structures, this function does not produce a document node
structure but instead returns the validated input <type>xml</type> value
as-is. The input <type>xml</type> value can include any valid XML
content, such as elements, text, or a mixture of both. This function
does not require the input to have a single root element and does not
wrap the result in one. Validation of the <type>xml</type> input value
depends on the current <xref linkend="guc-xmloption"/> setting. If the
input <type>xml</type> value is NULL, the function returns NULL. This
function ensures compatibility with the SQL/XML specification, making it
particularly useful when migrating SQL queries from other database
systems that also implement this standard.
Let me know if it is still confusing ...
it is ok for me now. Thank you
Pavel
Thanks!
Best, Jim
On 01/24/25 08:48, Jim Jones wrote: > In the SQL/XML specification, the XMLDocument (X030) function is > designed to return a document node from a given XML value expression. Maybe we can take advantage of the way that specifications usually don't mandate an implementation, but only results equivalent to an implementation. So we could say something like: The function `xmldocument` returns the input argument unchanged, null if the argument is null, and is supplied for compatibility. The SQL-standard `XMLDOCUMENT` function applied to an XML value /expr/ has effects equivalent to the XML Query expression `document { /expr/ }`, specified to replace any document nodes in the input with their children and wrap the whole result in one document node. An XML Query "document node" is a relaxed version of XML document structure, which need not have exactly one child element node, and also allows text nodes as children. Some systems support a family of XML data types including `XML(SEQUENCE)`, which can hold values that do not have such a structure. In PostgreSQL, every valid non-null value of the single `XML` type already has that structure, and so is returned by this function unchanged. Regards, -Chap
On 01/24/25 10:49, Chapman Flack wrote: > The SQL-standard `XMLDOCUMENT` function applied to an XML value > /expr/ has effects equivalent to the XML Query expression > `document { /expr/ }`, specified to replace any document nodes > in the input with their children and wrap the whole result in one > document node. An XML Query "document node" is a relaxed version > of XML document structure, which need not have exactly one child > element node, and also allows text nodes as children. Or even: ... An XML Query "document node" is a relaxed version of XML document structure that corresponds exactly to what PostgreSQL's one XML type is already allowed to contain, so any non-null PostgreSQL XML value can be returned unchanged. More-permissive XML types some systems offer may hold values that are not so structured. Regards, -Chap
On 01/24/25 14:59, Jim Jones wrote: > In the XML Query standard, a "document node" represents a relaxed > version of an XML document structure. This corresponds to what > PostgreSQL's single XML type allows, meaning that any valid non-null > PostgreSQL XML value can be returned unchanged. Other systems may > support more permissive XML data types, such as > <literal>XML(SEQUENCE)</literal>, which allow values that do not conform > to this structure. In PostgreSQL, every valid non-null value of the XML > type already has that structure, making any additional processing by > this function unnecessary. My one question here would be, does the reader of that paragraph sit back and wonder "why have we suddenly begun talking about the XML Query standard?". It seems to me the key connection there is that the ISO SQL standard defines XMLDOCUMENT by equivalence to what `document { $expr }` means in the W3 XML Query standard. Once that connection is made, it is all smooth sailing. Regards, -Chap
pá 24. 1. 2025 v 23:11 odesílatel Jim Jones <jim.jones@uni-muenster.de> napsal:
On 24.01.25 22:01, Chapman Flack wrote:
> It seems to me the key connection there is that the ISO SQL standard
> defines XMLDOCUMENT by equivalence to what `document { $expr }` means
> in the W3 XML Query standard.
It seems I missed one sentence. My bad.
Next try... :)
The <function>xmldocument</function> function returns the input argument
unchanged, or <literal>NULL</literal> if the argument is
<literal>NULL</literal>, and is provided for compatibility.
The SQL-standard <replaceable>XMLDocument</replaceable> function applied
to an XML value <literal>$EXPR</literal>, has effects equivalent to the
XML Query expression <replaceable>document { $EXPR }</replaceable>. It
replaces any document nodes in the input with their children and wraps
the whole result in a single <replaceable>document node</replaceable>.
In the XML Query standard, a <replaceable>document node</replaceable>
represents a relaxed version of an XML document structure. This
corresponds to what PostgreSQL's single XML type allows, meaning that
any valid non-null PostgreSQL XML value can be returned unchanged. Other
systems may support more permissive XML data types, such as
<literal>XML(SEQUENCE)</literal>, which allow values that do not conform
to this structure. In PostgreSQL, every valid non-null value of the XML
type already has that structure, making additional processing by this
function unnecessary.
v6 attached.
I think so doc is ok now
because the function does nothing, then it is useless to convert input to XML and force detosting
Maybe the body of the function should be just
{
#ifdef USE_LIBXML
PG_RETURN_DATUM(PG_GETARG_DATUM(0));
#else
NO_XML_SUPPORT();
return 0;
#endif
}
Regards
Pavel
Thanks.
Best regards, Jim
so 25. 1. 2025 v 9:10 odesílatel Jim Jones <jim.jones@uni-muenster.de> napsal:
On 25.01.25 08:16, Pavel Stehule wrote:
> because the function does nothing, then it is useless to convert input
> to XML and force detosting
Right. Fixed in v7 attached.
I don't see
+Datum
+xmldocument(PG_FUNCTION_ARGS)
+{
+#ifdef USE_LIBXML
+ PG_RETURN_XML_P(PG_GETARG_XML_P(0));
+#else
+ NO_XML_SUPPORT();
+ return 0;
+#endif /* not USE_LIBXML */
+}
+
+xmldocument(PG_FUNCTION_ARGS)
+{
+#ifdef USE_LIBXML
+ PG_RETURN_XML_P(PG_GETARG_XML_P(0));
+#else
+ NO_XML_SUPPORT();
+ return 0;
+#endif /* not USE_LIBXML */
+}
+
you still forces detoasting (inside PG_GETARG_XML_P)
Thanks
Best, Jim
so 25. 1. 2025 v 12:45 odesílatel Jim Jones <jim.jones@uni-muenster.de> napsal:
On 25.01.25 11:50, Pavel Stehule wrote:
> you still forces detoasting (inside PG_GETARG_XML_P)
Ah, ok .. I overlooked it. You meant _DATUM instead of _XML_P
Fixed in v8.
now it is ok.
I'll mark this patch as ready for committer
Regards
Pavel
Thanks!
Best, Jim
On 01/25/25 02:16, Pavel Stehule wrote: > because the function does nothing, then it is useless to convert input to > XML and force detosting > > Maybe the body of the function should be just > ... > PG_RETURN_DATUM(PG_GETARG_DATUM(0)); That sort of motivated my question upthread about whether there is already a function somewhere in the codebase that does exactly that and could be named in the pg_proc entry for xmldocument, as an alternative to supplying a brand-new one. Maybe this is the only instance where it turns out that 'identity' is the right behavior for a function. But if it could conceivably happen again, a single C function (maybe even named identity) could reduce code duplication and make quite clear what the behavior is with a \sf. A generic 'identity' function would be lacking the #ifdef USE_LIBXML and the error message, but I'm not convinced those matter here anyway. Without XML support, you'll already have raised that error in any attempt to construct a non-null XML argument to pass, and if you're passing NULL and the function is strict, you'll never see the error message from here anyway. Regards, -Chap
so 25. 1. 2025 v 15:10 odesílatel Chapman Flack <jcflack@acm.org> napsal:
On 01/25/25 02:16, Pavel Stehule wrote:
> because the function does nothing, then it is useless to convert input to
> XML and force detosting
>
> Maybe the body of the function should be just
> ...
> PG_RETURN_DATUM(PG_GETARG_DATUM(0));
That sort of motivated my question upthread about whether there is
already a function somewhere in the codebase that does exactly that
and could be named in the pg_proc entry for xmldocument, as an
alternative to supplying a brand-new one.
this should not be a problem, because the already created function XMLDOCUMENT surely will not be in the pg_catalog schema.
Maybe this is the only instance where it turns out that 'identity'
is the right behavior for a function. But if it could conceivably
happen again, a single C function (maybe even named identity) could
reduce code duplication and make quite clear what the behavior is
with a \sf.
I didn't find any function like this.
Regards
Pavel
A generic 'identity' function would be lacking the #ifdef USE_LIBXML
and the error message, but I'm not convinced those matter here
anyway. Without XML support, you'll already have raised that error
in any attempt to construct a non-null XML argument to pass, and if
you're passing NULL and the function is strict, you'll never see
the error message from here anyway.
Regards,
-Chap
Hi Chapman & Pavel On 25.01.25 16:05, Pavel Stehule wrote: > > > so 25. 1. 2025 v 15:10 odesílatel Chapman Flack <jcflack@acm.org> napsal: > > On 01/25/25 02:16, Pavel Stehule wrote: > > because the function does nothing, then it is useless to convert > input to > > XML and force detosting > > > > Maybe the body of the function should be just > > ... > > PG_RETURN_DATUM(PG_GETARG_DATUM(0)); > > That sort of motivated my question upthread about whether there is > already a function somewhere in the codebase that does exactly that > and could be named in the pg_proc entry for xmldocument, as an > alternative to supplying a brand-new one. > > > this should not be a problem, because the already created function > XMLDOCUMENT surely will not be in the pg_catalog schema. > > > > Maybe this is the only instance where it turns out that 'identity' > is the right behavior for a function. But if it could conceivably > happen again, a single C function (maybe even named identity) could > reduce code duplication and make quite clear what the behavior is > with a \sf. > > > I didn't find any function like this. I also couldn't find any similar function. > > Regards > > Pavel > > > A generic 'identity' function would be lacking the #ifdef USE_LIBXML > and the error message, but I'm not convinced those matter here > anyway. Without XML support, you'll already have raised that error > in any attempt to construct a non-null XML argument to pass, and if > you're passing NULL and the function is strict, you'll never see > the error message from here anyway. > > Do you envision something like this? Datum identity(PG_FUNCTION_ARGS) { PG_RETURN_DATUM(PG_GETARG_DATUM(0)); } If so, where would be the right place to put it? Certainly not in xml.c Generally speaking, reducing redundancy by reusing existing functions is always a good thing. However, if we decide to create a generic function for this purpose, it's important to ensure its existence is clearly communicated to prevent others from writing their own -- something that, given the simplicity of this function, seems like a likely scenario. :) My point is: this function is so small that I’m not entirely sure it’s worth the effort to make it generic. But I'd be willing to give it a try if we agree on it. Thanks! Best regards, Jim
On Sat, Jan 25, 2025 at 6:45 AM Jim Jones <jim.jones@uni-muenster.de> wrote: > On 25.01.25 11:50, Pavel Stehule wrote: > > you still forces detoasting (inside PG_GETARG_XML_P) > > > Ah, ok .. I overlooked it. You meant _DATUM instead of _XML_P > > Fixed in v8. > Was playing around with the patch and was thinking about this wording: "The xmldocument function returns the input argument unchanged... and is provided for compatibility." When I run an example similar to the db2 example you gave earlier: pagila=# SELECT xmldocument(xmlforest(10 as X, 20 as Y)); xmldocument -------------------- <x>10</x><y>20</y> In the db2 case, this is preserved as UPPER (which is to say, db2 case folds UPPER, and the input happens to match that), but we case fold lower, because we do; presumably you'd get the opposite effect in db2 running the input with unquoted lower field names(?). In any case (no pun intended), SQL folks probably don't care much about that discrepancy, but given xml is case sensitive, maybe xml people do? Robert Treat https://xzilla.net
Hi Robert On 28.01.25 05:54, Robert Treat wrote: > Was playing around with the patch and was thinking about this wording: > "The xmldocument function returns the input argument > unchanged... and is provided for compatibility." > > When I run an example similar to the db2 example you gave earlier: > > pagila=# SELECT xmldocument(xmlforest(10 as X, 20 as Y)); > xmldocument > -------------------- > <x>10</x><y>20</y> > > In the db2 case, this is preserved as UPPER (which is to say, db2 case > folds UPPER, and the input happens to match that), but we case fold > lower, because we do; presumably you'd get the opposite effect in db2 > running the input with unquoted lower field names(?). Yes. SELECT 42 AS foo FROM SYSIBM.SYSDUMMY1 FOO ----------- 42 1 record(s) selected. > In any case (no > pun intended), SQL folks probably don't care much about that > discrepancy, but given xml is case sensitive, maybe xml people do? That's a good point. DB2 converts unquoted identifiers to uppercase by default, which, if memory serves, aligns with the SQL standard. In the case of this xmlforest example, my guess is that DB2 treats the elements as identifiers and normalizes them to uppercase as well, as DB2 does not handle XML documents as text like we do. To preserve case, you'd need to explicitly quote the identifiers: SELECT xmlforest(10 AS "x", 20 AS "y") FROM SYSIBM.SYSDUMMY1 -------------------- <x>10</x><y>20</y> 1 record(s) selected. Things look different when constructing the xml document directly from a string: SELECT xmlparse(DOCUMENT '<root><foo>bar</foo></root>') FROM SYSIBM.SYSDUMMY1 ---------------------------- <root><foo>bar</foo></root> 1 record(s) selected. I'd say the difference is due to how the two systems handle the XML data type and unquoted identifiers in general, rather than a difference in the behaviour of the function itself. Sticking to quoted identifiers in both systems helps: SELECT xmlforest(42 AS "foo", 73 AS "bar"); xmlforest ---------------------------- <foo>42</foo><bar>73</bar> Probably that's why most DB2 examples in their documentation use quoted identifiers :) Best regards, Jim
On 01/28/25 03:14, Jim Jones wrote: > I'd say the difference is due to how the two systems handle the XML data > type and unquoted identifiers in general, rather than a difference in > the behaviour of the function itself. I'd go further and say it's entirely down to how the two systems handle unquoted identifiers. In neither case was there ever any XML value created with XML names in one case and then changed to the other. The SQL names were already in their (DB2- or PostgreSQL- specific) folded form by the first moment any XML library code ever saw them. The XML code handled them faithfully ever after, whether in serialized or in node-tree form. Presumably both DB2 and PostgreSQL users soon know in their sleep what their respective systems do to unquoted identifiers, and know that quoting is the way to control that when it matters. Regards, -Chap
rebased due to recent changes on doc/src/sgml/func.sgml -- Jim