Discussion: [PATCH] Introduce unified support for composite GUC options
Hello hackers,
This patch adds a unified mechanism for declaring and using composite configuration options in GUC, eliminating the need to write a custom parser for each new complex data type. The new syntax for end users is JSON-like.
Currently, adding a new composite configuration option imposes costs on both users and developers:
- For DBAs: Learning a new syntax for each composite option.
- For developers: Implementing a new parser from scratch for each composite type in GUC.
This patch solves these problems by providing a declarative system for defining composite types and their structure.
Major changes:
- guc_tables.h: Added new type config_composite for all composite configuration options.
- guc_composite.c: This file contains all functions related to composite options: calculating alignments, defining field types, managing memory, and serialization. These functions are used in guc.c and guc_composite_gram.y.
- guc.c: New code in this file implements the behavior of the system for the PGC_COMPOSITE case.
- guc_composite_scan.l, guc_composite_gram.y: the lexer and parser for values of composite data types.
Usage features:
The mapping between the user-visible representation and the internal variables relies on the signature that the programmer declares for the composite type. For core options, the declaration data is in the UserDefinedConfigureTypes array. For extensions, composite types are declared using the DefineCustomCompositeType function.
All declarations must be arranged in topological order. That is, if type A contains a field of type B, then type B must be declared first, and only after that type A.
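A minimal sketch of the ordering rule above, not the patch's implementation: a registry that accepts a type only if every field type named in its signature is already known. The names register_type and type_is_registered, the fixed-size table, and the use of an empty signature to stand in for built-in scalar types are all assumptions made for illustration.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

#define MAX_TYPES 32
#define MAX_NAME  64

/* Hypothetical registry of known type names, in declaration order. */
static char registered[MAX_TYPES][MAX_NAME];
static int  nregistered = 0;

bool
type_is_registered(const char *name)
{
    for (int i = 0; i < nregistered; i++)
        if (strcmp(registered[i], name) == 0)
            return true;
    return false;
}

/*
 * Register "name" if every field type in "signature" is already known.
 * The signature follows the "field_type field_name; ..." form; an array
 * suffix such as [] or [4] after the type name is ignored here.
 */
bool
register_type(const char *name, const char *signature)
{
    char  buf[256];
    char *field;

    if (nregistered >= MAX_TYPES || strlen(name) >= MAX_NAME ||
        strlen(signature) >= sizeof(buf))
        return false;
    strcpy(buf, signature);

    for (field = strtok(buf, ";"); field != NULL; field = strtok(NULL, ";"))
    {
        char   type[MAX_NAME];
        size_t n = 0;

        while (isspace((unsigned char) *field))
            field++;
        /* The type name ends at whitespace or at '[' (array suffix). */
        while ((isalnum((unsigned char) *field) || *field == '_') &&
               n < sizeof(type) - 1)
            type[n++] = *field++;
        type[n] = '\0';

        /* Reject the declaration if a field type was not declared earlier. */
        if (n == 0 || !type_is_registered(type))
            return false;
    }

    strcpy(registered[nregistered++], name);
    return true;
}
```

With this sketch, declaring a hypothetical "cluster" type before the "node" type it contains is rejected, while the reverse order succeeds.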
The main fields in the type definition are the type name and its signature.
The type signature has the following syntax:
"field_type field_name; field_type field_name; ...; field_type field_name"
where field_type is the already registered type, field_name is the field name.
There are also data types that do not need to be declared - these are arrays. So, if there is a registered type A, then the following data types automatically become available: A[n] is a static array of length n and A[] is a dynamic array.
Note that the declared type signature must exactly match the signature of the structure in the C code, since it will then be used to calculate the alignment of fields according to the rules of the C language.
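To illustrate why the signature must match the C structure, here is a minimal sketch of computing field offsets from a list of field sizes and alignments using the usual C layout rules. It is not taken from the patch: FieldLayout, compute_layout, the NodeConfig structure and the scalar type names in its signature comment are hypothetical, and the result is expected to agree with offsetof and sizeof only on common ABIs that insert minimal padding.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical descriptor: size and alignment of one field's C type. */
typedef struct FieldLayout
{
    size_t size;
    size_t align;               /* assumed to be a power of two */
} FieldLayout;

/* Round "off" up to the next multiple of "align". */
size_t
align_up(size_t off, size_t align)
{
    return (off + align - 1) & ~(align - 1);
}

/*
 * Fill offsets[] with each field's offset and return the total size,
 * padded to the strictest field alignment, as a C compiler typically does.
 */
size_t
compute_layout(const FieldLayout *fields, int nfields, size_t *offsets)
{
    size_t off = 0;
    size_t max_align = 1;

    for (int i = 0; i < nfields; i++)
    {
        off = align_up(off, fields[i].align);
        offsets[i] = off;
        off += fields[i].size;
        if (fields[i].align > max_align)
            max_align = fields[i].align;
    }
    return align_up(off, max_align);
}

/* Illustrative signature: "bool enabled; int port; string host" */
typedef struct NodeConfig
{
    bool  enabled;
    int   port;
    char *host;
} NodeConfig;

/* Field descriptors in the same order as the signature above. */
const FieldLayout node_fields[] = {
    {sizeof(bool), _Alignof(bool)},
    {sizeof(int), _Alignof(int)},
    {sizeof(char *), _Alignof(char *)},
};
```

If the signature listed the fields in a different order than the structure does, the computed offsets would point at the wrong members, which is why the two must match exactly.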
Dynamic arrays are always mapped into a structure like:
struct DynArr {
    void *data; /* pointer to the array elements */
    int size;   /* length of the array */
};
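As a small illustration of how C code might read elements through this generic structure, the sketch below adds a bounds-checked accessor. The helper dynarr_at and its elem_size parameter are hypothetical and are not part of the patch; the caller is assumed to know the element type.

```c
#include <stddef.h>

/* Same shape as the structure described above. */
typedef struct DynArr
{
    void *data;                 /* pointer to the array elements */
    int   size;                 /* length of the array */
} DynArr;

/*
 * Return a pointer to element i of an array whose elements are
 * elem_size bytes each, or NULL if i is out of range.
 */
void *
dynarr_at(const DynArr *arr, size_t elem_size, int i)
{
    if (arr == NULL || arr->data == NULL || i < 0 || i >= arr->size)
        return NULL;
    return (char *) arr->data + (size_t) i * elem_size;
}
```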
Once the type is defined, you can declare a configuration option of that composite type. Core options are declared in the guc_parameters.dat file. They must specify type => 'composite' and give, in the type_name field, the name of a composite type declared earlier. The boot_val field holds a pointer to the global variable that will store the value. Options from extensions are declared using DefineCustomCompositeVariable.
Now you can use the following syntax to work with the new options both in the configuration file and in psql:
Access to a field of a structure: option_name->field_name
Access to an array element: option_name[index]
You can combine these access methods.
Dynamic arrays always have implicit fields data and size. data is the data of the array, size is its length.
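The access syntax above can be sketched as a small path parser that splits a name such as cluster->nodes[2]->port into an option name followed by field and index steps. This is only an illustration under stated assumptions: PathStep, parse_path and the example names are hypothetical, the patch itself uses a flex/bison scanner and grammar, and dotted extension prefixes in option names are not handled here.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

typedef enum { STEP_FIELD, STEP_INDEX } StepKind;

typedef struct PathStep
{
    StepKind kind;
    char     name[64];          /* field name, for STEP_FIELD */
    long     index;             /* element index, for STEP_INDEX */
} PathStep;

/* Copy one identifier into out; return the position after it, or NULL. */
static const char *
read_ident(const char *s, char *out, size_t outlen)
{
    size_t n = 0;

    if (!(isalpha((unsigned char) *s) || *s == '_'))
        return NULL;
    while (isalnum((unsigned char) *s) || *s == '_')
    {
        if (n + 1 >= outlen)
            return NULL;
        out[n++] = *s++;
    }
    out[n] = '\0';
    return s;
}

/* Return the number of steps parsed, or -1 on a syntax error. */
int
parse_path(const char *s, char *opt_name, size_t opt_len,
           PathStep *steps, int max_steps)
{
    int n = 0;

    s = read_ident(s, opt_name, opt_len);
    if (s == NULL)
        return -1;

    while (*s != '\0')
    {
        if (n >= max_steps)
            return -1;
        if (s[0] == '-' && s[1] == '>')
        {
            /* "->field": descend into a structure field */
            steps[n].kind = STEP_FIELD;
            steps[n].index = -1;
            s = read_ident(s + 2, steps[n].name, sizeof(steps[n].name));
            if (s == NULL)
                return -1;
        }
        else if (s[0] == '[')
        {
            /* "[index]": select an array element */
            char *end;

            if (!isdigit((unsigned char) s[1]))
                return -1;
            steps[n].kind = STEP_INDEX;
            steps[n].name[0] = '\0';
            steps[n].index = strtol(s + 1, &end, 10);
            if (*end != ']')
                return -1;
            s = end + 1;
        }
        else
            return -1;
        n++;
    }
    return n;
}
```

In this sketch the implicit size field of a dynamic array is addressed like any other field, for example nodes->size.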
Values of composite types have the following syntax:
Structures: {field: value, ..., field: value}
Static arrays: [index: value, index: value]
As mentioned earlier, dynamic arrays have implicit fields, so there are two syntaxes for setting their values: compact (the same as for static arrays) and extended:
{data: [index: value, ..., index: value], size: value}
It is not necessary to write indexes in array values. If indexes are omitted, indexing is assumed to start at 0 and increase by 1. Within a single array, either all elements must have indexes or none may.
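A minimal sketch of the indexing rule just described, assuming the parser has already produced one record per array element. ArrayElem and resolve_indexes are hypothetical names that do not come from the patch.

```c
#include <stdbool.h>

/* One parsed array element: was an index written, and if so which one. */
typedef struct ArrayElem
{
    bool has_index;
    int  index;
} ArrayElem;

/*
 * Fill resolved[] with the target index of each element. Elements
 * written without an index get 0, 1, 2, ... in order. Returns false
 * if indexed and unindexed elements are mixed within the same array.
 */
bool
resolve_indexes(const ArrayElem *elems, int n, int *resolved)
{
    for (int i = 0; i < n; i++)
    {
        if (elems[i].has_index != elems[0].has_index)
            return false;
        resolved[i] = elems[i].has_index ? elems[i].index : i;
    }
    return true;
}
```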
When using the SHOW command, the display of a dynamic array depends on the extended_guc_arrays option. If this flag is true, the extended form is used; otherwise the compact form is used.
String values within composite types also support escape sequences.
All the functionality available to scalar options is also supported, such as: …
The system uses incremental semantics: when a value is set in a .conf file or with the SET command, only the specified fields of the structure are changed, and the remaining fields are left untouched. The same semantics apply to ALTER SYSTEM. With ALTER SYSTEM, the value written to the .auto.conf file is the current value with the fields named in the command changed; the current value of the option itself does not change.
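The incremental semantics can be sketched as applying a partial update on top of a copy of the current value. The structure PoolSettings, the bitmask of specified fields and apply_patch are hypothetical illustrations rather than the patch's data structures.

```c
#include <stdbool.h>

/* Hypothetical composite option value. */
typedef struct PoolSettings
{
    bool enabled;
    int  port;
    int  weight;
} PoolSettings;

/* Bits marking which fields the user actually wrote. */
enum
{
    SET_ENABLED = 1 << 0,
    SET_PORT    = 1 << 1,
    SET_WEIGHT  = 1 << 2
};

typedef struct PoolSettingsPatch
{
    unsigned     mask;          /* which fields of "values" are meaningful */
    PoolSettings values;
} PoolSettingsPatch;

/*
 * Return the current value with only the specified fields replaced.
 * The caller's current value is passed by value and is not modified,
 * which mirrors how ALTER SYSTEM is described: the merged value is
 * written out while the live setting stays as it was.
 */
PoolSettings
apply_patch(PoolSettings current, const PoolSettingsPatch *patch)
{
    PoolSettings result = current;

    if (patch->mask & SET_ENABLED)
        result.enabled = patch->values.enabled;
    if (patch->mask & SET_PORT)
        result.port = patch->values.port;
    if (patch->mask & SET_WEIGHT)
        result.weight = patch->values.weight;
    return result;
}
```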
The patch applies cleanly to the master (454c046094ab3431c2ce0c540c46e623bc05bd1a).
In the additional patch (guc_composite_types_tests.patch), I added several composite options so that the new functionality can be exercised. Regression and TAP tests for them are included in the same patch.
I would appreciate any feedback and review.
Best regards,
Anton Chumak
Attachments
Hello hackers,
The new version of the patch adds support for writing composite type values across multiple lines in the postgresql.conf file. Hidden fields have also been added. Such fields may be needed to protect the private part of a composite option's state from external users. To make a field hidden, the composite type signature must give only the field type, without a field name.
Please note that all resources allocated for hidden fields must be allocated with guc_malloc only. This is necessary so that they can be released automatically.
The patch applies cleanly to the master (9fc7f6ab7226d7c9dbe4ff333130c82f92749f69)
Best regards,
Anton Chumak
Attachments
Чумак Антон <a.chumak@postgrespro.ru> writes:
> This patch adds a unified mechanism for declaring and using composite configuration options in GUC, eliminating the need to write a custom parser for each new complex data type. New syntax for end user is json-like.
TBH, I think this is a bad idea altogether. GUCs that would need
this are probably poorly designed in the first place; we should not
encourage inventing more. I also don't love adding thousands of
lines of code without any use-case at hand.
regards, tom lane
Sorry, I replied to the email without the hackers tag, so some of our correspondence was not saved on hackers. Therefore, I will quote my answer and Pavel's questions and remarks below.
>>Thank you for your question!
>>Composite parameters in a configuration system are needed to describe complex objects that have many interrelated parameters. Such examples already exist in PostgreSQL: synchronous_standby_names or primary_conninfo. And with these parameters, there are some difficulties for both developers and DBMS administrators.
>Do we really need this?
>synchronous_standby_names is a simple list and primary_conninfo is just a string - consistent with any other postgresql connection string.
synchronous_standby_names is somewhat more complicated than a regular list. Its first field is the mode, the second is the number of required replicas, and only then is the list. Note its check hook. A parser is called there, whose code length exceeds the rest of the logic associated with this parameter. This is exactly the kind of problem the patch solves.
>If you need to store more complex values, why don't you use the integrated json parser?
>
>I don't like that you introduce a new independent language just for GUC, and it is not really short (and it is partially redundant to json). Currently working with GUC is simple, because the supported operations and formats are simple.
I looked at the json value parsing function with the ability to use custom semantic actions, and it might be a really great idea to use it instead of a self-written parser. Then the composite values will have the standard json syntax, and the patch will probably decrease in size.
>>For administrators:
>> 1. The value of such parameters can only be written in full as a string and there is no way to access individual fields or substructure.
>> 2. Each such parameter has its own syntax (compare the syntax description of synchronous_standby_names and primary_conninfo)
>>For developers:
>>1. For each composite parameter, you need to write your own parser that will parse the string value, instead of just describing the logic.
>>Personally, I needed to describe the cluster configuration. A cluster consists of nodes interconnected by some logic. And it turns out that in the current system, I need to write 1 more parser for this parameter, and the user will have to learn 1 more syntax.
>>This patch creates a unified approach to creating composite options, provides a unified syntax for values of composite types, adds the ability to work with fields and substructures, and eliminates the need for developers to write their own parsers for each composite parameter
>looks like overengineering to me - when you have a complex configuration, isn't it better to use a table? Or a json value - if you need to store it all in one GUC.
Tables are not suitable for storing configuration, because we need GUC capabilities such as analyzing the source of a new value, working at the time of postmaster startup, SET LOCAL support, etc.
>Another issue is using symbols -> for dereferencing directly from the scanner. It can break applications that use the same symbols as a custom operator.
I made the dereference operator look like -> because the dot is already used to separate the class of names from options. It is possible to use a dot, but then we need to agree that composite parameters and extensions must not have the same names in order to avoid collisions.
Best regards
Anton Chumak
On Monday, September 22, 2025, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Чумак Антон <a.chumak@postgrespro.ru> writes:
> > This patch adds a unified mechanism for declaring and using composite configuration options in GUC, eliminating the need to write a custom parser for each new complex data type. New syntax for end user is json-like.
> TBH, I think this is a bad idea altogether. GUCs that would need
> this are probably poorly designed in the first place; we should not
> encourage inventing more. I also don't love adding thousands of
> lines of code without any use-case at hand.
Yeah, there is a decent height bar for me too. The main functional benefit we'd get is that since both (multiple) settings are being given values simultaneously the check option code can enforce that only valid combinations are ever specified instead of generally needing runtime checks.
Beyond that, just use separate options with a naming scheme.
I can maybe see this for session variables masquerading as GUCs since we lack the former. Something like wanting to store a JWT as-is in a GUC then referencing its components.
David J.
Pavel Stehule <pavel.stehule@gmail.com> writes:
> Using GUC as session variables is a workaround because there is nothing
> better. But it is not good solution
Agreed, but we don't yet have a better one ...
> The basic question is if variables should be typed or typeless - like
> plpgsql or psql variables.
I think it is absolutely critical that GUCs *not* depend on the
SQL type system in any way. That would be a fundamental layering
violation, because we need to be able to read postgresql.conf
before we can read catalogs --- not to mention that relevant type
definitions might be different in different databases.
I'm not sure that this point means much to the feature proposed in
this thread, since IIUC it's proposing "use JSON no matter what".
But it is a big problem for trying to use GUCs as session variables
with non-built-in types.
regards, tom lane
>when you use json, then what is the benefit from your patch?
json is just a syntax. This is only part of the patch. The main feature is that we can directly, in a standard way, without the efforts of developers, translate composite values from user interfaces like psql or postgresql.conf into structures in C code. With this patch, the configuration system gains the ability to correctly manage the state of composite objects. This is important when you need to change 2 out of 5 fields at the same time so that the structure remains consistent. In addition, the new configuration module takes over the management of resources within the framework, which can be important for strings and dynamic arrays. There are other auxiliary features like hidden fields.
>It is not too big difference if I set value by SET command or by SELECT set_config()
Working with parameters is not limited to working within a session; otherwise the PGC_INTERNAL, PGC_POSTMASTER, and PGC_SIGHUP contexts would not be needed. My patch provides unified support for composite types within such contexts as well. Example: you have a composite boot value, and in the postgresql.conf file you need to change only 2 fields, and you need to change them at the same time to keep the structure consistent. Today you would have to describe all the fields in one big line; with the patch you can describe only the changed fields.
Best regards
Anton Chumak
"David G. Johnston" <david.g.johnston@gmail.com> writes:
> As you note - moving runtime checks to "SET" time has value and this patch
> brings that value. But it is not evident there is enough value to take on
> the added complexity. There are few to no requests asking for this ability.
If anything, I'd say we have decades of experience showing that early
checking of GUC values creates more problems than it solves. There are
too many cases where necessary context is not available at the time
of setting the value. Particularly, CREATE FUNCTION ... SET and
ALTER DATABASE/USER ... SET are problematic for this.
regards, tom lane