Discussion: XML element with special characters can be created, serialized, but not deserialized


From: Sergiu Ignat

Hello,

I am using PostgreSQL 13.8 and I think that I found an issue with XML serialization and deserialization.

A text that has special characters cannot be converted to XML even if it was created by serializing an XML element.

In our case, the string contains a control character with ASCII code 19 (0x13), placed between the letters i and p.
The simple statement that serializes an XML element works:
select xmlelement(name "street", E'i\x13p')::text

When the same text is converted back to XML, it fails with an error:

select xmlelement(name "street", E'i\x13p')::text::xml


The error message is

SQL Error [2200N]: ERROR: invalid XML content
  Detail: line 1: PCDATA invalid Char value 19
<street>i p</street>
         ^
line 1: chunk is not well balanced
<street>i p</street>
                    ^
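
For readers without a PostgreSQL instance at hand, the same rejection can be sketched outside the database. This is only an illustration: it uses Python's expat-based xml.etree.ElementTree rather than the libxml2 parser PostgreSQL relies on, and the `serialized` string and the `parses` helper are hypothetical stand-ins for the value produced by the first statement above.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the serialized value:
# the control character 0x13 (decimal 19) sits between "i" and "p".
serialized = "<street>i\x13p</street>"


def parses(text: str) -> bool:
    """Return True if the expat-based parser accepts text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# The raw control character makes the document ill-formed for this parser,
# while the same element with an ordinary space is accepted.
print(parses(serialized))
print(parses("<street>i p</street>"))
```

The point of the sketch is only that a raw control character in element content is rejected at parse time, which matches the "PCDATA invalid Char value 19" detail in the error above.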

The expected behaviour would be to successfully parse an XML element that was created and serialized by the same engine. 

Best regards,
--
Serghei Ignat

Sergiu Ignat <sergiu@bitsoftware.ro> writes:
> I am using PostgreSQL 13.8 and I think that I found an issue with XML
> serialization and deserialization.

Hmm.  The root cause here seems to be that escape_xml() thinks it
doesn't need to escape ASCII control characters, other than CR (\r).
Which is a bit backwards, because after some googling I conclude that
XML 1.1 requires all C0 and C1 control characters to be represented as
numeric escapes *except* CR, LF, and TAB [1].

What we probably ought to do is escape all except LF and TAB.
However, I'm a bit hesitant to back-patch such a behavioral change.
Maybe change this in HEAD (v16) only?
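
The escaping rule proposed above (numeric escapes for all ASCII control characters except LF and TAB) can be sketched as follows. This is a minimal Python illustration, not PostgreSQL's escape_xml(); the function name `escape_xml_sketch` is hypothetical, and the handling of &, <, > and " is an assumption about which markup characters get escaped alongside the control characters. The sketch shows the escaping rule only and does not check whether a given parser accepts the resulting references.

```python
def escape_xml_sketch(text: str) -> str:
    """Escape markup characters and C0 control characters except LF and TAB."""
    # Assumed set of markup characters that receive named entities.
    named = {"&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;"}
    out = []
    for ch in text:
        if ch in named:
            out.append(named[ch])
        elif ch in "\n\t":
            # LF and TAB pass through unchanged.
            out.append(ch)
        elif ord(ch) < 0x20:
            # Every other C0 control character, including CR,
            # becomes a hexadecimal numeric character reference.
            out.append(f"&#x{ord(ch):X};")
        else:
            out.append(ch)
    return "".join(out)


print(escape_xml_sketch("i\x13p"))
```

Under this rule, the string from the original report would be serialized with a numeric reference in place of the raw character 19, and CR would still be escaped as it is today.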

            regards, tom lane

[1] https://www.w3.org/International/questions/qa-controls