[PoC] Add CANONICAL option to xmlserialize
От | Jim Jones |
---|---|
Тема | [PoC] Add CANONICAL option to xmlserialize |
Дата | |
Msg-id | 67fa8560-8d61-5d06-8178-fc9c7684db90@uni-muenster.de обсуждение исходный текст |
Ответы |
[PATCH] Add CANONICAL option to xmlserialize
|
Список | pgsql-hackers |
Hi, In order to compare pairs of XML documents for equivalence it is necessary to convert them first to their canonical form, as described at W3C Canonical XML 1.1.[1] This spec basically defines a standard physical representation of xml documents that have more then one possible representation, so that it is possible to compare them, e.g. forcing UTF-8 encoding, entity reference replacement, attributes normalization, etc. Although it is not part of the XML/SQL standard, it would be nice to have the option CANONICAL in xmlserialize. Additionally, we could also add the attribute WITH [NO] COMMENTS to keep or remove xml comments from the documents. Something like this: WITH t(col) AS ( VALUES ('<?xml version="1.0" encoding="ISO-8859-1"?> <!DOCTYPE doc SYSTEM "doc.dtd" [ <!ENTITY val "42"> <!ATTLIST xyz attr CDATA "default"> ]> <!-- ordering of attributes --> <foo ns:c = "3" ns:b = "2" ns:a = "1" xmlns:ns="http://postgresql.org"> <!-- Normalization of whitespace in start and end tags --> <!-- Elimination of superfluous namespace declarations, as already declared in <foo> --> <bar xmlns:ns="http://postgresql.org" >&val;</bar > <!-- Empty element conversion to start-end tag pair --> <empty/> <!-- Effect of transcoding from a sample encoding to UTF-8 --> <iso8859>©</iso8859> <!-- Addition of default attribute --> <!-- Whitespace inside tag preserved --> <xyz> 321 </xyz> </foo> <!-- comment outside doc -->'::xml) ) SELECT xmlserialize(DOCUMENT col AS text CANONICAL) FROM t; xmlserialize -------------------------------------------------------------------------------------------------------------------------------------------------------- <foo xmlns:ns="http://postgresql.org" ns:a="1" ns:b="2" ns:c="3"><bar>42</bar><empty></empty><iso8859>©</iso8859><xyz attr="default"> 321 </xyz></foo> (1 row) -- using WITH COMMENTS WITH t(col) AS ( VALUES (' <foo ns:c = "3" ns:b = "2" ns:a = "1" xmlns:ns="http://postgresql.org"> <!-- very important comment --> <xyz> 321 </xyz> </foo>'::xml) ) SELECT xmlserialize(DOCUMENT col AS text CANONICAL WITH COMMENTS) FROM t; xmlserialize ------------------------------------------------------------------------------------------------------------------------ <foo xmlns:ns="http://postgresql.org" ns:a="1" ns:b="2" ns:c="3"><!-- very important comment --><xyz> 321 </xyz></foo> (1 row) Another option would be to simply create a new function, e.g. xmlcanonical(doc xml, keep_comments boolean), but I'm not sure if this would be the right approach. Attached a very short draft. What do you think? Best, Jim 1- https://www.w3.org/TR/xml-c14n11/
Вложения
В списке pgsql-hackers по дате отправления: