Обсуждение: Re: A assert failure when initdb with track_commit_timestamp=on
On 2025/07/03 22:31, Andy Fan wrote: > "Hayato Kuroda (Fujitsu)" <kuroda.hayato@fujitsu.com> writes: > >> Dear Michael, Fujii-san, >> >>> Ah, indeed, so it was reported a couple of months ago. I am not sure >>> that the argument about all the other GUCs potentially impacted holds >>> much value; we are talking about a specific code path. >> >> Yeah, I did report but sadly it was missed by others :-(. To clarify, >> The current patch looks good to me. > > Then I'd thank Michael to watch the maillist closely this time. > > I checked the fix suggested by Hayato, I think his patch is better than > me because his patch checks at the startup time while my patch checks at > each time of RecordTransactionCommit. So v3 takes his patch. v3 also > added the testcase suggested by Michael for test coverage, it clearly > proves the bug is fixed now. The patch looks good to me. By the way, although it's a separate issue, I noticed that running initdb -c transaction_timeout=1 causes an assertion failure: running bootstrap script ... TRAP: failed Assert("all_timeouts_initialized"), File: "timeout.c", Line: 164, PID: 22057 0 postgres 0x00000001105d9d02 ExceptionalCondition + 178 1 postgres 0x0000000110612af7 enable_timeout + 55 2 postgres 0x0000000110612aa9 enable_timeout_after + 73 3 postgres 0x000000010fead8e0 StartTransaction + 816 4 postgres 0x000000010fead4a1 StartTransactionCommand + 65 5 postgres 0x000000010fef01de BootstrapModeMain + 1518 6 postgres 0x0000000110167ef4 main + 676 7 dyld 0x00007ff805092530 start + 3056 child process was terminated by signal 6: Abort trap: 6 This happens because enable_timeout() tries to activate the transaction timeout before InitializeTimeouts() has been called — in other words, the timeout system hasn't been initialized yet. To fix this, we might need to forcibly set transaction_timeout to 0 during bootstrap. Regards, -- Fujii Masao NTT DATA Japan Corporation
On Fri, Jul 04, 2025 at 01:26:19AM +0900, Fujii Masao wrote: > On 2025/07/03 22:31, Andy Fan wrote: >> I checked the fix suggested by Hayato, I think his patch is better than >> me because his patch checks at the startup time while my patch checks at >> each time of RecordTransactionCommit. So v3 takes his patch. v3 also >> added the testcase suggested by Michael for test coverage, it clearly >> proves the bug is fixed now. > > The patch looks good to me. I was wondering if we should backpatch that, and decided towards a yes here as it can be annoying. 3e51b278db6a has introduced the -c switch in initdb in v16, but that could be reached as well if one touches at some initialization path, perhaps in a fork. Done, down to v13. Let's tackle the rest separately. -- Michael
Вложения
Dear Fujii-san, > By the way, although it's a separate issue, I noticed that running > initdb -c transaction_timeout=1 causes an assertion failure: I feel it may be able to discuss in other places but let me say one comment. > running bootstrap script ... TRAP: failed Assert("all_timeouts_initialized"), File: > "timeout.c", Line: 164, PID: 22057 > 0 postgres 0x00000001105d9d02 > ExceptionalCondition + 178 > 1 postgres 0x0000000110612af7 > enable_timeout + 55 > 2 postgres 0x0000000110612aa9 > enable_timeout_after + 73 > 3 postgres 0x000000010fead8e0 > StartTransaction + 816 > 4 postgres 0x000000010fead4a1 > StartTransactionCommand + 65 > 5 postgres 0x000000010fef01de > BootstrapModeMain + 1518 > 6 postgres 0x0000000110167ef4 main + 676 > 7 dyld 0x00007ff805092530 start + 3056 > child process was terminated by signal 6: Abort trap: 6 > > This happens because enable_timeout() tries to activate the transaction > timeout before InitializeTimeouts() has been called ? in other words, > the timeout system hasn't been initialized yet. To fix this, we might > need to forcibly set transaction_timeout to 0 during bootstrap. If more GUCs were found which cannot be set during the bootstrap mode, how about introducing a new flag like GUC_DEFAULT_WHILE_BOOTSTRAPPING for GUC variables? If the flag is set all setting can be ignored when IsBootstrapProcessingMode() = true. Best regards, Hayato Kuroda FUJITSU LIMITED
On 2025/07/04 16:29, Hayato Kuroda (Fujitsu) wrote: > Dear Fujii-san, > >> By the way, although it's a separate issue, I noticed that running >> initdb -c transaction_timeout=1 causes an assertion failure: > > I feel it may be able to discuss in other places OK, I've started a new thread for this issue at [1]. > If more GUCs were found which cannot be set during the bootstrap mode, how about > introducing a new flag like GUC_DEFAULT_WHILE_BOOTSTRAPPING for GUC variables? > If the flag is set all setting can be ignored when > IsBootstrapProcessingMode() = true. If there are many GUCs that behave incorrectly during bootstrap, a general mechanism like that might be worth considering. But if only a few GUCs are affected, as I believe is the case, then such a mechanism may be overkill. In that case, IMO it should be sufficient to disable the problematic GUCs individually, for example by calling SetConfigOption(..., PGC_S_OVERRIDE). Regards, [1] https://postgr.es/m/a68fae7d-f45a-4c70-8d90-2a2cd3bdcfca@oss.nttdata.com -- Fujii Masao NTT DATA Japan Corporation
Fujii Masao <masao.fujii@oss.nttdata.com> writes: > On 2025/07/04 16:29, Hayato Kuroda (Fujitsu) wrote: >> If more GUCs were found which cannot be set during the bootstrap mode, how about >> introducing a new flag like GUC_DEFAULT_WHILE_BOOTSTRAPPING for GUC variables? >> If the flag is set all setting can be ignored when >> IsBootstrapProcessingMode() = true. > If there are many GUCs that behave incorrectly during bootstrap, > a general mechanism like that might be worth considering. But if > only a few GUCs are affected, as I believe is the case, > then such a mechanism may be overkill. As I remarked in the other thread, I don't like inventing a different solution for each GUC. So if there are even two that need something done, I think Hayato-san's idea has merit. The core of the patch could be as little as diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c index 667df448732..43f289924e6 100644 --- a/src/backend/utils/misc/guc.c +++ b/src/backend/utils/misc/guc.c @@ -3464,6 +3464,15 @@ set_config_with_handle(const char *name, config_handle *handle, return 0; } + /* + * Certain GUCs aren't safe to enable during bootstrap mode. Silently + * ignore attempts to set them to non-default values. + */ + if (unlikely(IsBootstrapProcessingMode()) && + (record->flags & GUC_IGNORE_IN_BOOTSTRAP) && + source != PGC_S_DEFAULT) + changeVal = false; + /* * Check if the option can be set at this time. See guc.h for the precise * rules. If we went this way, we'd presumably revert 5a6c39b6d in favor of marking track_commit_timestamp with this flag. regards, tom lane
On 2025/07/05 0:30, Tom Lane wrote: > Fujii Masao <masao.fujii@oss.nttdata.com> writes: >> On 2025/07/04 16:29, Hayato Kuroda (Fujitsu) wrote: >>> If more GUCs were found which cannot be set during the bootstrap mode, how about >>> introducing a new flag like GUC_DEFAULT_WHILE_BOOTSTRAPPING for GUC variables? >>> If the flag is set all setting can be ignored when >>> IsBootstrapProcessingMode() = true. > >> If there are many GUCs that behave incorrectly during bootstrap, >> a general mechanism like that might be worth considering. But if >> only a few GUCs are affected, as I believe is the case, >> then such a mechanism may be overkill. > > As I remarked in the other thread, I don't like inventing a different > solution for each GUC. So if there are even two that need something > done, I think Hayato-san's idea has merit. > > The core of the patch could be as little as > > diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c > index 667df448732..43f289924e6 100644 > --- a/src/backend/utils/misc/guc.c > +++ b/src/backend/utils/misc/guc.c > @@ -3464,6 +3464,15 @@ set_config_with_handle(const char *name, config_handle *handle, > return 0; > } > > + /* > + * Certain GUCs aren't safe to enable during bootstrap mode. Silently > + * ignore attempts to set them to non-default values. > + */ > + if (unlikely(IsBootstrapProcessingMode()) && > + (record->flags & GUC_IGNORE_IN_BOOTSTRAP) && > + source != PGC_S_DEFAULT) > + changeVal = false; > + > /* > * Check if the option can be set at this time. See guc.h for the precise > * rules. This code seems to assume that the processing mode is switched to bootstrap before GUC parameters are processed. But is that actually the case? Regards, -- Fujii Masao NTT DATA Japan Corporation
Fujii Masao <masao.fujii@oss.nttdata.com> writes: > On 2025/07/05 0:30, Tom Lane wrote: >> As I remarked in the other thread, I don't like inventing a different >> solution for each GUC. So if there are even two that need something >> done, I think Hayato-san's idea has merit. > This code seems to assume that the processing mode is switched to bootstrap before > GUC parameters are processed. But is that actually the case? Oh, good point. But there doesn't seem to be any ill effect from making BootstrapModeMain set BootstrapProcessing a bit earlier. Attached is a proof-of-concept that I've actually tested. However, what I find with this POC is that initdb -c transaction_timeout=10s goes through fine, but (at least on my machine) initdb -c transaction_timeout=1 yields ... running bootstrap script ... ok performing post-bootstrap initialization ... 2025-07-04 13:08:04.225 EDT [261836] FATAL: terminating connection due to transactiontimeout child process exited with exit code 1 because 1ms is not enough time to complete the post-bootstrap run. I would argue that that's pilot error and we did exactly what the user demanded, but is there anyone who wants to say that we should suppress such GUCs during post-bootstrap too? regards, tom lane diff --git a/src/backend/bootstrap/bootstrap.c b/src/backend/bootstrap/bootstrap.c index fc8638c1b61..facad43c74c 100644 --- a/src/backend/bootstrap/bootstrap.c +++ b/src/backend/bootstrap/bootstrap.c @@ -220,6 +220,9 @@ BootstrapModeMain(int argc, char *argv[], bool check_only) argv++; argc--; + SetProcessingMode(BootstrapProcessing); + IgnoreSystemIndexes = true; + while ((flag = getopt(argc, argv, "B:c:d:D:Fkr:X:-:")) != -1) { switch (flag) @@ -321,9 +324,6 @@ BootstrapModeMain(int argc, char *argv[], bool check_only) CreateDataDirLockFile(false); - SetProcessingMode(BootstrapProcessing); - IgnoreSystemIndexes = true; - InitializeMaxBackends(); /* diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c index 667df448732..9555b363c34 100644 --- a/src/backend/utils/misc/guc.c +++ b/src/backend/utils/misc/guc.c @@ -3464,6 +3464,15 @@ set_config_with_handle(const char *name, config_handle *handle, return 0; } + /* + * Certain GUCs aren't safe to enable during bootstrap mode. Silently + * ignore attempts to set them to non-default values. + */ + if (unlikely(IsBootstrapProcessingMode()) && + strcmp(record->name, "transaction_timeout") == 0 && + source != PGC_S_DEFAULT) + changeVal = false; + /* * Check if the option can be set at this time. See guc.h for the precise * rules.
Michael Paquier <michael@paquier.xyz> writes: > This is assuming that the default value assigned to a GUC will always > take the right decision in the bootstrap case, which is perhaps OK > anyway in most cases, or we would know about that during initdb. Yeah, I've been wondering about whether the code ought to accept source == PGC_S_DYNAMIC_DEFAULT. It doesn't matter until/unless we need to set this flag on a GUC that has code to compute a dynamic default, so any decision we made right now would be made in a vacuum ... but perhaps the right guess is to allow it. >> If we went this way, we'd presumably revert 5a6c39b6d in favor >> of marking track_commit_timestamp with this flag. > Makes sense, on HEAD. Well, you back-patched 5a6c39b6d, so it's not clear to me why we wouldn't want to back-patch something to fix any other GUC suffering from a comparable problem. I don't see that adding another possible GUC flag is an ABI break, especially when it's a flag that no extension could have a need to set. regards, tom lane
On 2025/07/05 2:17, Tom Lane wrote: > Fujii Masao <masao.fujii@oss.nttdata.com> writes: >> On 2025/07/05 0:30, Tom Lane wrote: >>> As I remarked in the other thread, I don't like inventing a different >>> solution for each GUC. So if there are even two that need something >>> done, I think Hayato-san's idea has merit. > >> This code seems to assume that the processing mode is switched to bootstrap before >> GUC parameters are processed. But is that actually the case? > > Oh, good point. But there doesn't seem to be any ill effect from > making BootstrapModeMain set BootstrapProcessing a bit earlier. Maybe. But I noticed that your patch also moves the line "IgnoreSystemIndexes = true;" earlier. Why did you make this change? This could cause initdb to fail with a PANIC error when run with ignore_system_indexes=off, like this: $ initdb -D data -c ignore_system_indexes=off ... FATAL: could not open relation with OID 2703 PANIC: cannot abort transaction 1, it was already committed So perhaps "IgnoreSystemIndexes = true;" should be placed after GUCs are processed? Or GUC ignore_system_indexes also should be treated in the same way as transaction_timeout? Regards, -- Fujii Masao NTT DATA Japan Corporation
Fujii Masao <masao.fujii@oss.nttdata.com> writes: > On 2025/07/05 2:17, Tom Lane wrote: >> Oh, good point. But there doesn't seem to be any ill effect from >> making BootstrapModeMain set BootstrapProcessing a bit earlier. > Maybe. But I noticed that your patch also moves the line "IgnoreSystemIndexes = true;" > earlier. Why did you make this change? It just seemed to go with the bootstrap-mode setting. But your example shows differently: > This could cause initdb to fail with a PANIC error when run with ignore_system_indexes=off, > like this: > $ initdb -D data -c ignore_system_indexes=off > ... > FATAL: could not open relation with OID 2703 > PANIC: cannot abort transaction 1, it was already committed > So perhaps "IgnoreSystemIndexes = true;" should be placed after GUCs are processed? Yeah, we should do it like that (and probably also have a comment...) > Or GUC ignore_system_indexes also should be treated in the same way > as transaction_timeout? Yes, I'd say we ought to mark that GUC as don't-accept-in-bootstrap too. I've not done any research about what other GUCs can break initdb, but now I'm starting to suspect there are several. BTW, I now realize that this is only an issue starting from v16. Before that initdb didn't have a -c switch, so there was not a way for people to shove random settings into it. regards, tom lane
I wrote: > Fujii Masao <masao.fujii@oss.nttdata.com> writes: >> Or GUC ignore_system_indexes also should be treated in the same way >> as transaction_timeout? > Yes, I'd say we ought to mark that GUC as don't-accept-in-bootstrap > too. I've not done any research about what other GUCs can break > initdb, but now I'm starting to suspect there are several. Here's a fleshed-out implementation of Hayato-san's idea. I've not done anything about reverting 5a6c39b6d, nor have I done any checks to see if there are other GUCs we ought to mark similarly. (But at this point I'd be prepared to bet that there are.) regards, tom lane diff --git a/src/backend/bootstrap/bootstrap.c b/src/backend/bootstrap/bootstrap.c index fc8638c1b61..f6ca9e8632c 100644 --- a/src/backend/bootstrap/bootstrap.c +++ b/src/backend/bootstrap/bootstrap.c @@ -213,6 +213,12 @@ BootstrapModeMain(int argc, char *argv[], bool check_only) /* Set defaults, to be overridden by explicit options below */ InitializeGUCOptions(); + /* Override ignore_system_indexes: we have no indexes during bootstrap */ + IgnoreSystemIndexes = true; + + /* Set bootstrap mode; note that this locks down values of some GUCs */ + SetProcessingMode(BootstrapProcessing); + /* an initial --boot or --check should be present */ Assert(argc > 1 && (strcmp(argv[1], "--boot") == 0 @@ -321,9 +327,6 @@ BootstrapModeMain(int argc, char *argv[], bool check_only) CreateDataDirLockFile(false); - SetProcessingMode(BootstrapProcessing); - IgnoreSystemIndexes = true; - InitializeMaxBackends(); /* diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c index 667df448732..215a20a1f06 100644 --- a/src/backend/utils/misc/guc.c +++ b/src/backend/utils/misc/guc.c @@ -3464,6 +3464,15 @@ set_config_with_handle(const char *name, config_handle *handle, return 0; } + /* + * Certain GUCs aren't safe to enable during bootstrap mode. Silently + * ignore attempts to set them to non-default values. + */ + if (unlikely(IsBootstrapProcessingMode()) && + (record->flags & GUC_NOT_IN_BOOTSTRAP) && + source > PGC_S_DYNAMIC_DEFAULT) + changeVal = false; + /* * Check if the option can be set at this time. See guc.h for the precise * rules. diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c index 511dc32d519..064c6ba09e2 100644 --- a/src/backend/utils/misc/guc_tables.c +++ b/src/backend/utils/misc/guc_tables.c @@ -1089,7 +1089,8 @@ struct config_bool ConfigureNamesBool[] = { {"track_commit_timestamp", PGC_POSTMASTER, REPLICATION_SENDING, gettext_noop("Collects transaction commit time."), - NULL + NULL, + GUC_NOT_IN_BOOTSTRAP }, &track_commit_timestamp, false, @@ -1929,7 +1930,7 @@ struct config_bool ConfigureNamesBool[] = gettext_noop("Disables reading from system indexes."), gettext_noop("It does not prevent updating the indexes, so it is safe " "to use. The worst consequence is slowness."), - GUC_NOT_IN_SAMPLE + GUC_NOT_IN_SAMPLE | GUC_NOT_IN_BOOTSTRAP }, &IgnoreSystemIndexes, false, @@ -2763,7 +2764,7 @@ struct config_int ConfigureNamesInt[] = {"transaction_timeout", PGC_USERSET, CLIENT_CONN_STATEMENT, gettext_noop("Sets the maximum allowed duration of any transaction within a session (not a prepared transaction)."), gettext_noop("0 disables the timeout."), - GUC_UNIT_MS + GUC_UNIT_MS | GUC_NOT_IN_BOOTSTRAP }, &TransactionTimeout, 0, 0, INT_MAX, diff --git a/src/include/utils/guc.h b/src/include/utils/guc.h index f619100467d..0e7e97dabf0 100644 --- a/src/include/utils/guc.h +++ b/src/include/utils/guc.h @@ -228,6 +228,7 @@ typedef enum 0x002000 /* can't set in PG_AUTOCONF_FILENAME */ #define GUC_RUNTIME_COMPUTED 0x004000 /* delay processing in 'postgres -C' */ #define GUC_ALLOW_IN_PARALLEL 0x008000 /* allow setting in parallel mode */ +#define GUC_NOT_IN_BOOTSTRAP 0x010000 /* can't set in bootstrap mode */ #define GUC_UNIT_KB 0x01000000 /* value is in kilobytes */ #define GUC_UNIT_BLOCKS 0x02000000 /* value is in blocks */
> I wrote: >> Fujii Masao <masao.fujii@oss.nttdata.com> writes: >>> Or GUC ignore_system_indexes also should be treated in the same way >>> as transaction_timeout? > >> Yes, I'd say we ought to mark that GUC as don't-accept-in-bootstrap >> too. I've not done any research about what other GUCs can break >> initdb, but now I'm starting to suspect there are several. > > Here's a fleshed-out implementation of Hayato-san's idea. I've > not done anything about reverting 5a6c39b6d, nor have I done any > checks to see if there are other GUCs we ought to mark similarly. > (But at this point I'd be prepared to bet that there are.) I pay my attention to two cases, both of them are good. (1). Revert the old commit 5a6c39b6d first, and apply your patch. verify initdb with -c transaction_timeout and track_commit_timestamp, both of them works well. transaction_timeout with a smaller value raise transaction_timeout error. and a biggger value works well. (2). after (1), check values of transaction_timeout and track_commit_timestamp in postgres.conf, both of them are good (with the default value as the user provided in the initdb commandline). So the patch looks good to me, thanks for paying attention to this issue! -- Best Regards Andy Fan
On 2025/07/06 3:00, Tom Lane wrote: > I wrote: >> Fujii Masao <masao.fujii@oss.nttdata.com> writes: >>> Or GUC ignore_system_indexes also should be treated in the same way >>> as transaction_timeout? > >> Yes, I'd say we ought to mark that GUC as don't-accept-in-bootstrap >> too. I've not done any research about what other GUCs can break >> initdb, but now I'm starting to suspect there are several. > > Here's a fleshed-out implementation of Hayato-san's idea. I've > not done anything about reverting 5a6c39b6d, nor have I done any > checks to see if there are other GUCs we ought to mark similarly. > (But at this point I'd be prepared to bet that there are.) Thanks for the patch! It looks good to me. Shouldn't we also add a TAP test to verify that initdb works correctly with GUCs marked as GUC_NOT_IN_BOOTSTRAP? Regards, -- Fujii Masao NTT DATA Japan Corporation
Dear Fujii-san, Andy, > Shouldn't we also add a TAP test to verify that initdb works correctly > with GUCs marked as GUC_NOT_IN_BOOTSTRAP? Another place we can put the test is 001_initdb.pl, like: ``` --- a/src/bin/initdb/t/001_initdb.pl +++ b/src/bin/initdb/t/001_initdb.pl @@ -331,4 +331,15 @@ command_fails( [ 'pg_checksums', '--pgdata' => $datadir_nochecksums ], "pg_checksums fails with data checksum disabled"); +# Some GUCs like track_commit_timestamp cannot be set to non-default value in +# bootstrap mode becasue they may cause crash. Ensure settings can be surely +# ignored. +command_ok( + [ + 'initdb', "$tempdir/dataXX", + '-c' => 'track_commit_timestamp=on', + '-c' => 'transaction_timeout=200s' + ], + 'successful creation with ignored settings'); + ``` But both Andy's patch and mine assume that post-bootstrap transactions can be finished within the specified time. Extremely long value is set above but I cannot say all machine won't spend 200s here... Best regards, Hayato Kuroda FUJITSU LIMITED
On Wed, Jul 09, 2025 at 02:39:22AM +0000, Hayato Kuroda (Fujitsu) wrote: > +# Some GUCs like track_commit_timestamp cannot be set to non-default value in > +# bootstrap mode becasue they may cause crash. Ensure settings can be surely > +# ignored. > +command_ok( > + [ > + 'initdb', "$tempdir/dataXX", > + '-c' => 'track_commit_timestamp=on', > + '-c' => 'transaction_timeout=200s' > + ], > + 'successful creation with ignored settings'); > + > ``` I'd suggest to keep them separate across multiple scripts, where they hold meaning, as one failure may get hidden by the other. track_commit_timestamp makes sense in the test module commit_ts, we should select a second location for transaction_timeout if we keep it at the end. > But both Andy's patch and mine assume that post-bootstrap transactions can be > finished within the specified time. Extremely long value is set above but I > cannot say all machine won't spend 200s here... A fresh initdb can be longer than this threshold under CLOBBER_CACHE_ALWAYS, if my memory serves me well. There are some machines with a valgrind setup, additionally, that can take some time, but I am not sure about their timings when it comes to a bootstrap setup. -- Michael
Вложения
Dear Michael, > I'd suggest to keep them separate across multiple scripts, where they > hold meaning, as one failure may get hidden by the other. > track_commit_timestamp makes sense in the test module commit_ts, we > should select a second location for transaction_timeout if we keep it > at the end. OK, so track_commit_timestamp can be tested like what initially did: ``` --- a/src/test/modules/commit_ts/t/001_base.pl +++ b/src/test/modules/commit_ts/t/001_base.pl @@ -11,8 +11,7 @@ use Test::More; use PostgreSQL::Test::Cluster; my $node = PostgreSQL::Test::Cluster->new('foxtrot'); -$node->init; -$node->append_conf('postgresql.conf', 'track_commit_timestamp = on'); +$node->init(extra => [ '-c', 'track_commit_timestamp=on' ]); $node->start; ``` > A fresh initdb can be longer than this threshold under > CLOBBER_CACHE_ALWAYS, if my memory serves me well. There are some > machines with a valgrind setup, additionally, that can take some time, > but I am not sure about their timings when it comes to a bootstrap > setup. Hmm. So I felt that we should not add tests for transaction_timeout for such a slow environment. Thought? Best regards, Hayato Kuroda FUJITSU LIMITED