[PATCH] Add CANONICAL option to xmlserialize
От | Jim Jones |
---|---|
Тема | [PATCH] Add CANONICAL option to xmlserialize |
Дата | |
Msg-id | 2f04079e-b237-1a2f-0682-9f60ca59805e@uni-muenster.de обсуждение исходный текст |
Ответ на | [PoC] Add CANONICAL option to xmlserialize (Jim Jones <jim.jones@uni-muenster.de>) |
Ответы |
Re: [PATCH] Add CANONICAL option to xmlserialize
|
Список | pgsql-hackers |
On 27.02.23 14:16, I wrote: > Hi, > > In order to compare pairs of XML documents for equivalence it is > necessary to convert them first to their canonical form, as described > at W3C Canonical XML 1.1.[1] This spec basically defines a standard > physical representation of xml documents that have more then one > possible representation, so that it is possible to compare them, e.g. > forcing UTF-8 encoding, entity reference replacement, attributes > normalization, etc. > > Although it is not part of the XML/SQL standard, it would be nice to > have the option CANONICAL in xmlserialize. Additionally, we could also > add the attribute WITH [NO] COMMENTS to keep or remove xml comments > from the documents. > > Something like this: > > WITH t(col) AS ( > VALUES > ('<?xml version="1.0" encoding="ISO-8859-1"?> > <!DOCTYPE doc SYSTEM "doc.dtd" [ > <!ENTITY val "42"> > <!ATTLIST xyz attr CDATA "default"> > ]> > > <!-- ordering of attributes --> > <foo ns:c = "3" ns:b = "2" ns:a = "1" > xmlns:ns="http://postgresql.org"> > > <!-- Normalization of whitespace in start and end tags --> > <!-- Elimination of superfluous namespace declarations, > as already declared in <foo> --> > <bar xmlns:ns="http://postgresql.org" >&val;</bar > > > <!-- Empty element conversion to start-end tag pair --> > <empty/> > > <!-- Effect of transcoding from a sample encoding to UTF-8 --> > <iso8859>©</iso8859> > > <!-- Addition of default attribute --> > <!-- Whitespace inside tag preserved --> > <xyz> 321 </xyz> > </foo> > <!-- comment outside doc -->'::xml) > ) > SELECT xmlserialize(DOCUMENT col AS text CANONICAL) FROM t; > xmlserialize > -------------------------------------------------------------------------------------------------------------------------------------------------------- > > <foo xmlns:ns="http://postgresql.org" ns:a="1" ns:b="2" > ns:c="3"><bar>42</bar><empty></empty><iso8859>©</iso8859><xyz > attr="default"> 321 </xyz></foo> > (1 row) > > -- using WITH COMMENTS > > WITH t(col) AS ( > VALUES > (' <foo ns:c = "3" ns:b = "2" ns:a = "1" > xmlns:ns="http://postgresql.org"> > <!-- very important comment --> > <xyz> 321 </xyz> > </foo>'::xml) > ) > SELECT xmlserialize(DOCUMENT col AS text CANONICAL WITH COMMENTS) FROM t; > xmlserialize > ------------------------------------------------------------------------------------------------------------------------ > > <foo xmlns:ns="http://postgresql.org" ns:a="1" ns:b="2" ns:c="3"><!-- > very important comment --><xyz> 321 </xyz></foo> > (1 row) > > > Another option would be to simply create a new function, e.g. > xmlcanonical(doc xml, keep_comments boolean), but I'm not sure if this > would be the right approach. > > Attached a very short draft. What do you think? > > Best, Jim > > 1- https://www.w3.org/TR/xml-c14n11/ The attached version includes documentation and tests to the patch. I hope things are clearer now :) Best, Jim
Вложения
В списке pgsql-hackers по дате отправления: