Обсуждение: Add GUC to enable libxml2's XML_PARSE_HUGE

Поиск
Список
Период
Сортировка

Add GUC to enable libxml2's XML_PARSE_HUGE

От
Jim Jones
Дата:
Hi,

In commit 71c0921 we re-introduced use of xmlParseBalancedChunkMemory in
order to allow parsing of large XML documents with certain libxml2
versions [1]. While that solved a regression issue, it still leaves the
handling of very large or deeply nested XML documents tied to libxml2’s
internal limits and behaviuor.

To address this, Erik and I would like to propose a new GUC,
xml_parse_huge, which controls libxml2’s XML_PARSE_HUGE option. This
makes the handling of large XML documents explicit and independent of
libxml2 version quirks. The new predefined role pg_xml_parse_huge allows
superusers to grant session-level use of this option without granting
full superuser rights, so DBAs can flexibly delegate the capability in a
controlled manner.

Examples:

$ /usr/local/postgres-dev/bin/psql postgres
psql (19devel)
Type "help" for help.

postgres=# CREATE USER u1;
CREATE ROLE
postgres=# CREATE DATABASE db OWNER u1;
CREATE DATABASE
postgres=# \q



# By default a user cannot set this parameter and the default value is 'off'

$ /usr/local/postgres-dev/bin/psql -d db -U u1
psql (19devel)
Type "help" for help.

db=> SHOW xml_parse_huge;
 xml_parse_huge
----------------
 off
(1 row)

db=> SET xml_parse_huge TO on;
ERROR:  permission denied to set parameter "xml_parse_huge"
HINT:  You must be a superuser or a member of the "pg_xml_parse_huge"
role to set this option.

db=> ALTER SYSTEM SET xml_parse_huge TO on;
ERROR:  permission denied to set parameter "xml_parse_huge"


# This leads libxml2 to raise an error for text nodes exceeding
XML_MAX_TEXT_LENGTH

db=> CREATE TABLE t1 AS SELECT ('<root>' || repeat('X',10000001) ||
'</root>')::xml;
ERROR:  invalid XML content
DETAIL:  line 1: Resource limit exceeded: Text node too long, try
XML_PARSE_HUGE
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX


# The role pg_xml_parse_huge allows the user to set the new parameter

$ /usr/local/postgres-dev/bin/psql postgres
psql (19devel)
Type "help" for help.

postgres=# GRANT pg_xml_parse_huge TO u1;
GRANT ROLE
postgres=# \q

$ /usr/local/postgres-dev/bin/psql -d db -U u1
psql (19devel)
Type "help" for help.

db=> SET xml_parse_huge TO on;
SET
db=> CREATE TABLE t1 AS SELECT ('<root>' || repeat('X',10000001) ||
'</root>')::xml;
SELECT 1


# It is also possible to enable this feature by default for a user

$ /usr/local/postgres-dev/bin/psql postgres
psql (19devel)
Type "help" for help.

postgres=# CREATE USER u2;
CREATE ROLE
postgres=# GRANT pg_xml_parse_huge TO u2;
GRANT ROLE
postgres=# ALTER USER u2 SET xml_parse_huge TO on;
ALTER ROLE
postgres=# \q

$ /usr/local/postgres-dev/bin/psql -d db -U u2
psql (19devel)
Type "help" for help.

db=> SHOW xml_parse_huge ;
 xml_parse_huge
----------------
 on
(1 row)


# A superuser can enable this feature for a whole database (or the whole
cluster via postgresql.conf):

$ /usr/local/postgres-dev/bin/psql postgres
psql (19devel)
Type "help" for help.

postgres=# CREATE DATABASE db2;
CREATE DATABASE
postgres=# ALTER DATABASE db2 SET xml_parse_huge TO on;
ALTER DATABASE
postgres=# SHOW xml_parse_huge ;
 xml_parse_huge
----------------
 off
(1 row)

postgres=# \c db2
You are now connected to database "db2" as user "jim".
db2=# SHOW xml_parse_huge ;
 xml_parse_huge
----------------
 on
(1 row)


Attached is a first draft.

* I'm CC'ing Tom and Michael since they were involved in the earlier
discussion.

Initially we considered creating a second GUC instead of a role, but
decided that would be confusing and less manageable than having a single
GUC with role-based delegation.

Any thoughts or comments?

[1]

https://www.postgresql.org/message-id/flat/a8771e75-60ee-4c99-ae10-ca4832e1ec8d%40uni-muenster.de#1cfece11b1d62fbd43ed644e1f9710e2

Best regards, Jim
Вложения

Re: Add GUC to enable libxml2's XML_PARSE_HUGE

От
Tom Lane
Дата:
Jim Jones <jim.jones@uni-muenster.de> writes:
> To address this, Erik and I would like to propose a new GUC,
> xml_parse_huge, which controls libxml2’s XML_PARSE_HUGE option.

Given the spotty security history of libxml2, I can't really see
how this wouldn't be enormously unsafe.  Even as a superuser-only
option, it seems like a bad idea.

Independently of that, we have learned the hard way that GUCs
that change application-visible query semantics are a bad idea.
You cannot really argue that this wouldn't be one.

            regards, tom lane



Re: Add GUC to enable libxml2's XML_PARSE_HUGE

От
Jim Jones
Дата:
Hi Tom

Thanks for replying so quickly!

On 20.08.25 17:46, Tom Lane wrote:
> Given the spotty security history of libxml2, I can't really see
> how this wouldn't be enormously unsafe.  Even as a superuser-only
> option, it seems like a bad idea.

I was under the impression that the status quo already exposes
PostgreSQL to these risks silently, depending on the libxml2
(xmlParseBalancedChunkMemory) version in use. Our rationale was that
making the feature explicit would at least render it visible, auditable,
and delegable by superusers. Of course, the documentation should clearly
warn about the risks.

> Independently of that, we have learned the hard way that GUCs
> that change application-visible query semantics are a bad idea.
> You cannot really argue that this wouldn't be one.

I share these concerns about changing query semantics through GUCs, but
I thought this case might not be so different from xmloption:

postgres=# SET xmloption TO document;
SET
postgres=# SELECT 'hello'::xml;
ERROR:  invalid XML document
LINE 1: SELECT 'hello'::xml;
               ^
DETAIL:  line 1: Start tag expected, '<' not found
hello
^
postgres=# SET xmloption TO content;
SET
postgres=# SELECT 'hello'::xml;
  xml  
-------
 hello
(1 row)

Would you see any other way, other than a GUC, to provide this feature?

Thanks!

Best, Jim



Re: Add GUC to enable libxml2's XML_PARSE_HUGE

От
Michael Paquier
Дата:
On Wed, Aug 20, 2025 at 11:46:11AM -0400, Tom Lane wrote:
> Independently of that, we have learned the hard way that GUCs
> that change application-visible query semantics are a bad idea.
> You cannot really argue that this wouldn't be one.

For reference: a famous example of that is autocommit as far as I
recall, with commit f85f43dfb5b9.
--
Michael

Вложения

Re: Add GUC to enable libxml2's XML_PARSE_HUGE

От
Erik Wienhold
Дата:
On 2025-08-20 21:42 +0200, Jim Jones wrote:
> On 20.08.25 17:46, Tom Lane wrote:
> > Independently of that, we have learned the hard way that GUCs that
> > change application-visible query semantics are a bad idea.  You
> > cannot really argue that this wouldn't be one.

I totally forgot about that stance.  Is this only about adding new GUCs?
Because there are existing GUCs that change semantics, e.g. xmlbinary,
check_function_bodies.  I guess there's a trade-off between usefulness
and being error-prone/surprising to the user (POLA).

> I share these concerns about changing query semantics through GUCs,
> but I thought this case might not be so different from xmloption:
> 
> postgres=# SET xmloption TO document;
> SET
> postgres=# SELECT 'hello'::xml;
> ERROR:  invalid XML document
> LINE 1: SELECT 'hello'::xml;
>                ^
> DETAIL:  line 1: Start tag expected, '<' not found
> hello
> ^
> postgres=# SET xmloption TO content;
> SET
> postgres=# SELECT 'hello'::xml;
>   xml  
> -------
>  hello
> (1 row)

I guess the excuse for xmloption is that the SQL standard defines SET
XML OPTION.

> Would you see any other way, other than a GUC, to provide this
> feature?

Forking off the core XML code into an extension to provide a custom xml
data type with the desired parsing behavior? :(

-- 
Erik Wienhold



Re: Add GUC to enable libxml2's XML_PARSE_HUGE

От
Jim Jones
Дата:

On 21.08.25 04:26, Erik Wienhold wrote:
> I guess the excuse for xmloption is that the SQL standard defines SET
> XML OPTION.

I guess you're right... in this case the standard dictates what should
be done. It just doesn't change the fact that depending on the parameter
value the same query succeeds or fails, which is pretty much what would
happen with xml_parse_huge.

>> Would you see any other way, other than a GUC, to provide this
>> feature?
> Forking off the core XML code into an extension to provide a custom xml
> data type with the desired parsing behavior? :(

That could work. I'm just afraid it would be too much trouble just to
enable text nodes larger than 10MB.

Regards, Jim