Обсуждение: Add GUC to enable libxml2's XML_PARSE_HUGE
Hi, In commit 71c0921 we re-introduced use of xmlParseBalancedChunkMemory in order to allow parsing of large XML documents with certain libxml2 versions [1]. While that solved a regression issue, it still leaves the handling of very large or deeply nested XML documents tied to libxml2’s internal limits and behaviuor. To address this, Erik and I would like to propose a new GUC, xml_parse_huge, which controls libxml2’s XML_PARSE_HUGE option. This makes the handling of large XML documents explicit and independent of libxml2 version quirks. The new predefined role pg_xml_parse_huge allows superusers to grant session-level use of this option without granting full superuser rights, so DBAs can flexibly delegate the capability in a controlled manner. Examples: $ /usr/local/postgres-dev/bin/psql postgres psql (19devel) Type "help" for help. postgres=# CREATE USER u1; CREATE ROLE postgres=# CREATE DATABASE db OWNER u1; CREATE DATABASE postgres=# \q # By default a user cannot set this parameter and the default value is 'off' $ /usr/local/postgres-dev/bin/psql -d db -U u1 psql (19devel) Type "help" for help. db=> SHOW xml_parse_huge; xml_parse_huge ---------------- off (1 row) db=> SET xml_parse_huge TO on; ERROR: permission denied to set parameter "xml_parse_huge" HINT: You must be a superuser or a member of the "pg_xml_parse_huge" role to set this option. db=> ALTER SYSTEM SET xml_parse_huge TO on; ERROR: permission denied to set parameter "xml_parse_huge" # This leads libxml2 to raise an error for text nodes exceeding XML_MAX_TEXT_LENGTH db=> CREATE TABLE t1 AS SELECT ('<root>' || repeat('X',10000001) || '</root>')::xml; ERROR: invalid XML content DETAIL: line 1: Resource limit exceeded: Text node too long, try XML_PARSE_HUGE XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX # The role pg_xml_parse_huge allows the user to set the new parameter $ /usr/local/postgres-dev/bin/psql postgres psql (19devel) Type "help" for help. postgres=# GRANT pg_xml_parse_huge TO u1; GRANT ROLE postgres=# \q $ /usr/local/postgres-dev/bin/psql -d db -U u1 psql (19devel) Type "help" for help. db=> SET xml_parse_huge TO on; SET db=> CREATE TABLE t1 AS SELECT ('<root>' || repeat('X',10000001) || '</root>')::xml; SELECT 1 # It is also possible to enable this feature by default for a user $ /usr/local/postgres-dev/bin/psql postgres psql (19devel) Type "help" for help. postgres=# CREATE USER u2; CREATE ROLE postgres=# GRANT pg_xml_parse_huge TO u2; GRANT ROLE postgres=# ALTER USER u2 SET xml_parse_huge TO on; ALTER ROLE postgres=# \q $ /usr/local/postgres-dev/bin/psql -d db -U u2 psql (19devel) Type "help" for help. db=> SHOW xml_parse_huge ; xml_parse_huge ---------------- on (1 row) # A superuser can enable this feature for a whole database (or the whole cluster via postgresql.conf): $ /usr/local/postgres-dev/bin/psql postgres psql (19devel) Type "help" for help. postgres=# CREATE DATABASE db2; CREATE DATABASE postgres=# ALTER DATABASE db2 SET xml_parse_huge TO on; ALTER DATABASE postgres=# SHOW xml_parse_huge ; xml_parse_huge ---------------- off (1 row) postgres=# \c db2 You are now connected to database "db2" as user "jim". db2=# SHOW xml_parse_huge ; xml_parse_huge ---------------- on (1 row) Attached is a first draft. * I'm CC'ing Tom and Michael since they were involved in the earlier discussion. Initially we considered creating a second GUC instead of a role, but decided that would be confusing and less manageable than having a single GUC with role-based delegation. Any thoughts or comments? [1] https://www.postgresql.org/message-id/flat/a8771e75-60ee-4c99-ae10-ca4832e1ec8d%40uni-muenster.de#1cfece11b1d62fbd43ed644e1f9710e2 Best regards, Jim
Вложения
Jim Jones <jim.jones@uni-muenster.de> writes: > To address this, Erik and I would like to propose a new GUC, > xml_parse_huge, which controls libxml2’s XML_PARSE_HUGE option. Given the spotty security history of libxml2, I can't really see how this wouldn't be enormously unsafe. Even as a superuser-only option, it seems like a bad idea. Independently of that, we have learned the hard way that GUCs that change application-visible query semantics are a bad idea. You cannot really argue that this wouldn't be one. regards, tom lane
Hi Tom Thanks for replying so quickly! On 20.08.25 17:46, Tom Lane wrote: > Given the spotty security history of libxml2, I can't really see > how this wouldn't be enormously unsafe. Even as a superuser-only > option, it seems like a bad idea. I was under the impression that the status quo already exposes PostgreSQL to these risks silently, depending on the libxml2 (xmlParseBalancedChunkMemory) version in use. Our rationale was that making the feature explicit would at least render it visible, auditable, and delegable by superusers. Of course, the documentation should clearly warn about the risks. > Independently of that, we have learned the hard way that GUCs > that change application-visible query semantics are a bad idea. > You cannot really argue that this wouldn't be one. I share these concerns about changing query semantics through GUCs, but I thought this case might not be so different from xmloption: postgres=# SET xmloption TO document; SET postgres=# SELECT 'hello'::xml; ERROR: invalid XML document LINE 1: SELECT 'hello'::xml; ^ DETAIL: line 1: Start tag expected, '<' not found hello ^ postgres=# SET xmloption TO content; SET postgres=# SELECT 'hello'::xml; xml ------- hello (1 row) Would you see any other way, other than a GUC, to provide this feature? Thanks! Best, Jim
On Wed, Aug 20, 2025 at 11:46:11AM -0400, Tom Lane wrote: > Independently of that, we have learned the hard way that GUCs > that change application-visible query semantics are a bad idea. > You cannot really argue that this wouldn't be one. For reference: a famous example of that is autocommit as far as I recall, with commit f85f43dfb5b9. -- Michael
Вложения
On 2025-08-20 21:42 +0200, Jim Jones wrote: > On 20.08.25 17:46, Tom Lane wrote: > > Independently of that, we have learned the hard way that GUCs that > > change application-visible query semantics are a bad idea. You > > cannot really argue that this wouldn't be one. I totally forgot about that stance. Is this only about adding new GUCs? Because there are existing GUCs that change semantics, e.g. xmlbinary, check_function_bodies. I guess there's a trade-off between usefulness and being error-prone/surprising to the user (POLA). > I share these concerns about changing query semantics through GUCs, > but I thought this case might not be so different from xmloption: > > postgres=# SET xmloption TO document; > SET > postgres=# SELECT 'hello'::xml; > ERROR: invalid XML document > LINE 1: SELECT 'hello'::xml; > ^ > DETAIL: line 1: Start tag expected, '<' not found > hello > ^ > postgres=# SET xmloption TO content; > SET > postgres=# SELECT 'hello'::xml; > xml > ------- > hello > (1 row) I guess the excuse for xmloption is that the SQL standard defines SET XML OPTION. > Would you see any other way, other than a GUC, to provide this > feature? Forking off the core XML code into an extension to provide a custom xml data type with the desired parsing behavior? :( -- Erik Wienhold
On 21.08.25 04:26, Erik Wienhold wrote: > I guess the excuse for xmloption is that the SQL standard defines SET > XML OPTION. I guess you're right... in this case the standard dictates what should be done. It just doesn't change the fact that depending on the parameter value the same query succeeds or fails, which is pretty much what would happen with xml_parse_huge. >> Would you see any other way, other than a GUC, to provide this >> feature? > Forking off the core XML code into an extension to provide a custom xml > data type with the desired parsing behavior? :( That could work. I'm just afraid it would be too much trouble just to enable text nodes larger than 10MB. Regards, Jim