Обсуждение: Generate GUC tables from .dat file
My idea is to store the information in guc_tables.c in a .dat file similar to the catalog data in src/include/catalog/, and generated guc_tables.c from that. I want to make it easier to edit that information, and I want to be able to make changes to the downstream data structures more easily. (Essentially, those are the same reasons as for the original adoption of the .dat format.) An important project is to adapt the GUC system to a multithreaded server. The leading idea is to convert most global variables to thread-local storage. But this doesn't work with the current global structs, because they can't contain a pointer to a thread-local variable. Some workarounds and changes exist in WIP threading branches, but they all require at least some mechanical changes to the global tables. If these tables could be generated automatically, it would be easier to try out different things. More generally, maybe the current format of a global struct that points to many global variables is not the right one anymore. Maybe the global variables should be packaged into a struct, or several structs. Or maybe the whole thing could just be a hash table and you retrieve values when you need them. Who knows. But this would make it easier to experiment and make changes. Another possible benefit would be that we could generate the postgresql.conf.sample file. This is also very tedious to edit and maintain, and sometimes people want to make larger changes, and this is very difficult. And then we might not need tests like 003_check_guc.pl that check the consistency of the sample file. So here is an initial POC patch. I have written a script to convert a new src/backend/utils/misc/guc_parameters.dat to what would be guc_tables.c, but in the patch it's guc_tables_new.c. The guc_parameters.dat in the patch is populated only with a few entries that cover most of the different types and variants and possible settings, so we can see what it would look like. Eventually, this would require a big conversion. My next goal would be to make this work so that most of guc_tables.c is generated, and nothing else changes beyond that. But before I do all the remaining tedious work, I wanted to check what people think.
Вложения
On Aug 11, 2025, at 02:04, Peter Eisentraut <peter@eisentraut.org> wrote: > My next goal would be to make this work so that most of guc_tables.c is generated, and nothing else changes beyond that. But before I do all the remaining tedious work, I wanted to check what people think. Honestly it seems like a no-brainer to me. Looking askance at the Perl style, though 😂 Best, David
Вложения
On 8/11/25 12:46, David E. Wheeler wrote: > On Aug 11, 2025, at 02:04, Peter Eisentraut <peter@eisentraut.org> wrote: >> My next goal would be to make this work so that most of >> guc_tables.c is generated, and nothing else changes beyond that. >> But before I do all the remaining tedious work, I wanted to check >> what people think. > Honestly it seems like a no-brainer to me. +1 Without having looked closely at the details, conceptually this sounds like a great idea to me. -- Joe Conway PostgreSQL Contributors Team Amazon Web Services: https://aws.amazon.com
On 11.08.25 08:04, Peter Eisentraut wrote: > So here is an initial POC patch. I have written a script to convert a > new src/backend/utils/misc/guc_parameters.dat to what would be > guc_tables.c, but in the patch it's guc_tables_new.c. The > guc_parameters.dat in the patch is populated only with a few entries > that cover most of the different types and variants and possible > settings, so we can see what it would look like. Eventually, this would > require a big conversion. > > My next goal would be to make this work so that most of guc_tables.c is > generated, and nothing else changes beyond that. Ok, I did the big conversion, and tidied everything up so that it now generates the big tables in guc_tables.c from the .dat file. This basically works now. Some notes: - I included a preparatory patch to clean up some formatting in guc_tables.c so that it's easier to visually compare the generated code. - Similarly, for this time, I left the order of the entries in guc_parameters.dat to be the same as it was in guc_tables.c, so that it's easier to compare. Eventually I would like to re-order those, probably alphabetically, but that can be a separate follow-up patch. - In this patch, I left the code to be replaced in guc_tables.c, so that it's easier to keep this patch rebased. But you should imagine that the parts under NOT_USED will get deleted. - One thing I didn't reproduce in the generated code is the line breaks in the description strings. But that should be ok either way. - In a few places there were complicated #ifdef's to determine the default value. I extracted that out into separate symbols. - Eventually it would be interesting to try to generate the other parts of guc_tables.c, like the enums and the categories, but I'm leaving that for another time. - I moved the C code comments into the .dat file as Perl comments. I figure that makes sense as that's where people would be editing now. I feel that if people like this format, this is functional and useful as a first step.
Вложения
On Wed, Aug 20, 2025 at 3:18 AM Peter Eisentraut <peter@eisentraut.org> wrote: > Ok, I did the big conversion, and tidied everything up so that it now > generates the big tables in guc_tables.c from the .dat file. This > basically works now. > - One thing I didn't reproduce in the generated code is the line breaks > in the description strings. But that should be ok either way. It's certainly not worse than, say, some proargnames in pg_proc.dat. > - In a few places there were complicated #ifdef's to determine the > default value. I extracted that out into separate symbols. Seems fine this way. > - I moved the C code comments into the .dat file as Perl comments. I > figure that makes sense as that's where people would be editing now. That makes sense. I noticed some comments were not carried over. In a mechanical change like this, I'd expect all comments to be preserved in some form, or at least removed in the preparatory step so that the change is more visible to others. It'd probably take a bit of work to make format_dat_file.pl work with non-catalog data. Maybe mechanical formatting is not so important here because there is a smaller "schema", and less frequent change. Also, I imagine we'd have more freedom to place perl comments if we know the surroundings can't be shifted around. -- John Naylor Amazon Web Services
On 20.08.25 09:57, John Naylor wrote: > That makes sense. I noticed some comments were not carried over. In a > mechanical change like this, I'd expect all comments to be preserved > in some form, or at least removed in the preparatory step so that the > change is more visible to others. Here is an updated patch with the remaining comments carried over. I'm not sure how I lost these. I also added some more comments to the Perl script and have it print the usual boilerplate into the header. And I added some .gitignore entries. This seems pretty complete to me now.
Вложения
On Mon, Aug 25, 2025 at 4:36 PM Peter Eisentraut <peter@eisentraut.org> wrote: > > Here is an updated patch with the remaining comments carried over. I'm > not sure how I lost these. I also added some more comments to the Perl > script and have it print the usual boilerplate into the header. And I > added some .gitignore entries. This seems pretty complete to me now. Looks good overall. Some style suggestions: + print $ofh qq[\t\t] + . ($entry->{check_hook} || 'NULL') . qq[, ] + . ($entry->{assign_hook} || 'NULL') . qq[, ] + . ($entry->{show_hook} || 'NULL') . qq[\n]; The string construction in this script is rather verbose. I'd do something like: printf $ofh "\t\t%s, %s, %s\n", $entry->{check_hook} || 'NULL', $entry->{assign_hook} || 'NULL', $entry->{show_hook} || 'NULL'; + print $ofh "#ifdef " . $entry->{ifdef} . "\n" if $entry->{ifdef}; Likewise: print $ofh "#ifdef $entry->{ifdef}\n" if $entry->{ifdef}; + print $ofh qq[\t\t\tgettext_noop("] + . escape($entry->{long_desc}) . qq[")]; If the "escape" function was a "quote" function that also did its own escaping, there'd be less need for these literal quotes, and so maybe no need for the "qq[]"'s here. + boot_val => '""', + boot_val => '"ISO, MDY"', A "quote" function could also insert these for config_string GUCs. -- John Naylor Amazon Web Services
On 27.08.25 01:38, John Naylor wrote: > Looks good overall. Some style suggestions: > > + print $ofh qq[\t\t] > + . ($entry->{check_hook} || 'NULL') . qq[, ] > + . ($entry->{assign_hook} || 'NULL') . qq[, ] > + . ($entry->{show_hook} || 'NULL') . qq[\n]; > > The string construction in this script is rather verbose. I'd do something like: > > printf $ofh "\t\t%s, %s, %s\n", > $entry->{check_hook} || 'NULL', > $entry->{assign_hook} || 'NULL', > $entry->{show_hook} || 'NULL'; > > + print $ofh "#ifdef " . $entry->{ifdef} . "\n" if $entry->{ifdef}; > > Likewise: > > print $ofh "#ifdef $entry->{ifdef}\n" if $entry->{ifdef}; > > + print $ofh qq[\t\t\tgettext_noop("] > + . escape($entry->{long_desc}) . qq[")]; > > If the "escape" function was a "quote" function that also did its own > escaping, there'd be less need for these literal quotes, and so maybe > no need for the "qq[]"'s here. Ok, good suggestions. I addressed all those, and did another cleanup pass over the script. (The formatting is from pgperltidy.) > + boot_val => '""', > > + boot_val => '"ISO, MDY"', > > A "quote" function could also insert these for config_string GUCs. The default values might not be string literals, so writing them out unquoted and having the script quoting them would not work in general. So I left this. Maybe this is something to fine-tune in the future.
Вложения
> On 28 Aug 2025, at 08:53, Peter Eisentraut <peter@eisentraut.org> wrote: > I addressed all those, and did another cleanup pass over the script. (The formatting is from pgperltidy.) A tiny nitpick is that all the other generator scripts (that I looked at) in the tree use GetOptions() with named parameter rather than dereference ARGV directly: +my $input_fname = $ARGV[0]; +my $output_fname = $ARGV[1]; Also, I would have directed the reader to guc_parameters.dat in the below comment since that's the canonical copy of the default value, but it's a personal preference as it can just as easily be argued for keeping guc_tables.c /* * We may not yet know where PGSHAREDIR is (in particular this is true in * an EXEC_BACKEND subprocess). So use "GMT", which pg_tzset forces to be * interpreted without reference to the filesystem. This corresponds to * the bootstrap default for these variables in guc_tables.c, although in * principle it could be different. */ Apart from those small remarks, +1 on this patch. For future work, once this has landed, it could be neat to add formatting checks to ensure long/short_desc end with proper punctuation etc. -- Daniel Gustafsson
On Aug 28, 2025, at 09:29, Daniel Gustafsson <daniel@yesql.se> wrote: > A tiny nitpick is that all the other generator scripts (that I looked at) in > the tree use GetOptions() with named parameter rather than dereference ARGV > directly: > > +my $input_fname = $ARGV[0]; > +my $output_fname = $ARGV[1]; GetOptions() is overkill when there are no options. I’d opt for: die "Usage: $0 INPUT_FILE OUTPUT_FILE\n" unless @ARGV == 2; my ($input_fname, $output_fname) = @ARGV; And since I griped about Perl style previously, I made a pass over modernizing it a bit. One might argue it’s less clear,of course; there is less alignment of the printing than in the original. Otherwise, I’d note: * Use the /r regex return sequence to simplify dquote() (requires Perl 5.14, IIRC) * Iterate over the data types in a single line * Pass lists of values to `print` * Use {$fh} syntax to make file handle arguments clearer But do with it what you will. One other thing, as an aside, and probably not worth changing this patch: I’d prefer to see the use of explicit I/O layers.IOW, rather than open my $ofh, '>', $output_fname Use the UTF-8 layer, which encodes strings as UTF-8 bytes: open my $ofh, '>:utf8', $output_fname Or perhaps use pure binary: open my $ofh, '>:raw', $output_fname Though then things like `ucfirst` won’t work properly for non-ASCII strings. The default layer, when not specified, is Latin-1 (because 1994). It’s not a problem if we’re certain we’ll never use anythingother than ASCII, but more explicit I/O layers would be clearer, IMO. I didn’t change it in the attached becauseCatalog.pm doesn’t use an I/O layer, either, so it’s best if they’re consistent. So, I guess, would there be interest in a patch to update I/O layer handling in the core Perl code? Best, David
Вложения
On Thu, Aug 28, 2025 at 1:53 PM Peter Eisentraut <peter@eisentraut.org> wrote: > Ok, good suggestions. I addressed all those, and did another cleanup > pass over the script. (The formatting is from pgperltidy.) I have no further comments on v3. On Fri, Aug 29, 2025 at 1:03 AM David E. Wheeler <david@justatheory.com> wrote: > And since I griped about Perl style previously, I made a pass over modernizing it a bit. One might argue it’s less clear,of course; there is less alignment of the printing than in the original. I find the loop over @parse a lot less readable this way. > * Use the /r regex return sequence to simplify dquote() (requires Perl 5.14, IIRC) I think our perlcritic configuration would complain about the lack of return statement. > * Use {$fh} syntax to make file handle arguments clearer With this I wonder why the variable looks different for `print` vs. `open`. -- John Naylor Amazon Web Services
Hi John, On Sep 1, 2025, at 03:21, John Naylor <johncnaylorls@gmail.com> wrote: > I find the loop over @parse a lot less readable this way. Yeah, that’s the trade-off. Probably not worth it to reduce legibility. >> * Use the /r regex return sequence to simplify dquote() (requires Perl 5.14, IIRC) > > I think our perlcritic configuration would complain about the lack of > return statement. Oh, I ran the indenter but not perlcritic. >> * Use {$fh} syntax to make file handle arguments clearer > > With this I wonder why the variable looks different for `print` vs. `open`. I’ve never seen {} used with open. I just tried it, and Perl complained: Use of uninitialized value $fh in anonymous hash ({}) at try line 8. The use of {$fh} with print/say always looked like an exception to me, I’m not sure what the syntax resolves to, but it’snot a hash reference. At any rate, Perl best practices I’ve been aware of (admittedly 10y or so out of date) recommendedit to distinguish the file handle from the values actually being printed. Best, David
Вложения
On 28.08.25 15:29, Daniel Gustafsson wrote: >> On 28 Aug 2025, at 08:53, Peter Eisentraut <peter@eisentraut.org> wrote: > >> I addressed all those, and did another cleanup pass over the script. (The formatting is from pgperltidy.) > > A tiny nitpick is that all the other generator scripts (that I looked at) in > the tree use GetOptions() with named parameter rather than dereference ARGV > directly: > > +my $input_fname = $ARGV[0]; > +my $output_fname = $ARGV[1]; I didn't use GetOptions() but changed it a bit per David Wheeler's suggestion downthread. > Also, I would have directed the reader to guc_parameters.dat in the below > comment since that's the canonical copy of the default value, but it's a > personal preference as it can just as easily be argued for keeping guc_tables.c I updated this comment and another one. (There are a few more comments referring to guc_tables.c, which I think are mostly still correct. I expect some of this will also be changed in the near future. One change for example could be to generate the enum declarations.)
On 28.08.25 20:03, David E. Wheeler wrote: > On Aug 28, 2025, at 09:29, Daniel Gustafsson <daniel@yesql.se> wrote: > >> A tiny nitpick is that all the other generator scripts (that I looked at) in >> the tree use GetOptions() with named parameter rather than dereference ARGV >> directly: >> >> +my $input_fname = $ARGV[0]; >> +my $output_fname = $ARGV[1]; > > GetOptions() is overkill when there are no options. I’d opt for: > > die "Usage: $0 INPUT_FILE OUTPUT_FILE\n" unless @ARGV == 2; > my ($input_fname, $output_fname) = @ARGV; Done that way. > And since I griped about Perl style previously, I made a pass over modernizing it a bit. One might argue it’s less clear,of course; there is less alignment of the printing than in the original. Otherwise, I’d note: > > * Use the /r regex return sequence to simplify dquote() (requires Perl 5.14, IIRC) I adopted that one. We already use that elsewhere. > * Iterate over the data types in a single line > * Pass lists of values to `print` > * Use {$fh} syntax to make file handle arguments clearer These don't match our existing style, so I'll leave that to someone who wants to make these sorts of changes across the whole tree.
On 01.09.25 09:21, John Naylor wrote: > On Thu, Aug 28, 2025 at 1:53 PM Peter Eisentraut <peter@eisentraut.org> wrote: >> Ok, good suggestions. I addressed all those, and did another cleanup >> pass over the script. (The formatting is from pgperltidy.) > > I have no further comments on v3. I have committed this with a few of the small changes mentioned nearby. I have registered myself to a few current commitfest entries that are proposing to add or modify GUC entries to help them rebase or move forward to commit.
> On 3 Sep 2025, at 10:37, Peter Eisentraut <peter@eisentraut.org> wrote: > > On 28.08.25 15:29, Daniel Gustafsson wrote: >>> On 28 Aug 2025, at 08:53, Peter Eisentraut <peter@eisentraut.org> wrote: >>> I addressed all those, and did another cleanup pass over the script. (The formatting is from pgperltidy.) >> A tiny nitpick is that all the other generator scripts (that I looked at) in >> the tree use GetOptions() with named parameter rather than dereference ARGV >> directly: >> +my $input_fname = $ARGV[0]; >> +my $output_fname = $ARGV[1]; > > I didn't use GetOptions() but changed it a bit per David Wheeler's suggestion downthread. FWIW I agree that it's overkill to use GetOptions here, it was mostly a reference to use using it elsewhere where it's equally overkill =) >> Also, I would have directed the reader to guc_parameters.dat in the below >> comment since that's the canonical copy of the default value, but it's a >> personal preference as it can just as easily be argued for keeping guc_tables.c > > I updated this comment and another one. > > (There are a few more comments referring to guc_tables.c, which I think are mostly still correct. Agreed, surveying the references I think most of them are correct in referring to the .c file even with this change. > I expect some of this will also be changed in the near future. One change for example could be to generate the enum declarations.) That's one I am looking forward to for sure. -- Daniel Gustafsson
On Wed, Sep 3, 2025 at 5:41 PM Peter Eisentraut <peter@eisentraut.org> wrote: > I have committed this with a few of the small changes mentioned nearby. copperhead reported just now a test failure in test_oat_hooks: SET debug_discard_caches = 0; +ERROR: unrecognized configuration parameter "debug_discard_caches" I'm not sure if it's related to this commit, but I'm reporting it here just in case. - Richard
On 03.09.25 11:39, Richard Guo wrote: > On Wed, Sep 3, 2025 at 5:41 PM Peter Eisentraut <peter@eisentraut.org> wrote: >> I have committed this with a few of the small changes mentioned nearby. > > copperhead reported just now a test failure in test_oat_hooks: > > SET debug_discard_caches = 0; > +ERROR: unrecognized configuration parameter "debug_discard_caches" > > I'm not sure if it's related to this commit, but I'm reporting it here > just in case. Yup, I pushed a fix already. Thanks.
I noticed that "make maintainer-clean" doesn't remove src/include/utils/guc_tables.inc.c. This seems to fix it: diff --git a/src/include/Makefile b/src/include/Makefile index 3f94543f327..58eb6da27fe 100644 --- a/src/include/Makefile +++ b/src/include/Makefile @@ -74,7 +74,7 @@ uninstall: clean: rm -f utils/fmgroids.h utils/fmgrprotos.h utils/errcodes.h utils/header-stamp rm -f storage/lwlocknames.h utils/probes.h utils/wait_event_types.h - rm -f nodes/nodetags.h nodes/header-stamp + rm -f nodes/nodetags.h nodes/header-stamp utils/guc_tables.inc.c $(MAKE) -C catalog clean distclean: clean -- nathan
On 03.09.25 19:14, Nathan Bossart wrote: > I noticed that "make maintainer-clean" doesn't remove > src/include/utils/guc_tables.inc.c. This seems to fix it: Thanks, fixed. (I moved the rule to a different line so it's closer to related files. But it's the same thing.)
While experimenting with a patch that adds a GUC, I was befuddled by the new GUC's failure to be available after I rebuilt and reinstalled. I eventually realized that the problem is that utils/misc/Makefile has no idea that guc_tables.o needs to be rebuilt after changing guc_tables.inc.c. Of course, this isn't a problem if you use --enable-depend, but I generally don't. IMO we have a project policy that dependencies other than dependencies on .h files should be understood by the Makefiles, so I think we need the attached. regards, tom lane diff --git a/src/backend/utils/misc/Makefile b/src/backend/utils/misc/Makefile index b362ae43771..f142d17178b 100644 --- a/src/backend/utils/misc/Makefile +++ b/src/backend/utils/misc/Makefile @@ -40,6 +40,9 @@ ifdef krb_srvtab override CPPFLAGS += -DPG_KRB_SRVTAB='"$(krb_srvtab)"' endif +# Force this dependency to be known even without dependency info built: +guc_tables.o: guc_tables.c $(top_builddir)/src/backend/utils/guc_tables.inc.c + include $(top_srcdir)/src/backend/common.mk clean: