Re: How abnormal server shutdown could be detected by tests?
From | Alexander Lakhin |
---|---|
Subject | Re: How abnormal server shutdown could be detected by tests? |
Date | |
Msg-id | 5921355f-4cfb-c91a-24b8-6bbde53c990c@gmail.com |
In response to | Re: How abnormal server shutdown could be detected by tests? (shveta malik <shveta.malik@gmail.com>) |
List | pgsql-hackers |
Hello Shveta,

12.12.2023 11:44, shveta malik wrote:
>
>> The postmaster process exits with exit code 1, but pg_ctl can't get the
>> code and just reports that stop was completed successfully.
>>
> For what it's worth, there is another thread which stated the similar problem:
> https://www.postgresql.org/message-id/flat/2366244.1651681550%40sss.pgh.pa.us
>

Thank you for the reference!
So I refreshed the first part of the question Tom Lane raised before...

I've made a quick experiment with leaving postmaster.pid intact in case of
abnormal shutdown:

    @@ -1113,6 +1113,7 @@ UnlinkLockFiles(int status, Datum arg)
         {
             char       *curfile = (char *) lfirst(l);

    +        if (strcmp(curfile, DIRECTORY_LOCK_FILE) != 0 || status == 0)
             unlink(curfile);
             /* Should we complain if the unlink fails? */
         }

and `make check-world` passed for me with no failures.
(In the meantime, the assertion failure forced as above is detected.)

Though there is a minor issue with a couple of tests. Namely,
003_recovery_targets.pl does the following:

    # wait for the error message in the standby log
    foreach my $i (0 .. 10 * $PostgreSQL::Test::Utils::timeout_default)
    {
        $logfile = slurp_file($node_standby->logfile());
        $res = ($logfile =~
              qr/FATAL: .* recovery ended before configured recovery target was reached/);
        if ($res) {
            last;
        }
        usleep(100_000);
    }
    ok($res,
        'recovery end before target reached is a fatal error');

With postmaster.pid left after unclean shutdown, the test waits for 300
seconds by default and then completes successfully.

If that loop is rewritten as follows:

    # wait for the error message in the standby log
    foreach my $i (0 .. 10 * $PostgreSQL::Test::Utils::timeout_default)
    {
        $logfile = slurp_file($node_standby->logfile());
        $res = ($logfile =~
              qr/FATAL: .* recovery ended before configured recovery target was reached/);
        if ($res) {
            last;
        }
        usleep(100_000);
    }
    ok($res,
        'recovery end before target reached is a fatal error');

the test completes as quickly as before.
(standby.log is only 2kb, so rereading it isn't a big deal, IMO)

So maybe it's the way to go?

Another way I can think of is sending some signal to pg_ctl in case
postmaster terminates with status 0. Though I think it would complicate
things a little as it allows for three different states:
postmaster.pid preserved (in case postmaster killed with -9),
postmaster.pid removed and the signal received/not received.

Best regards,
Alexander
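
For readers following the thread, the two alternatives discussed above can be sketched as standalone C. This is only an illustration under stated assumptions, not code from the patch: `should_unlink()`, `classify_shutdown()` and the `ShutdownState` names are hypothetical, `DIRECTORY_LOCK_FILE` is redefined locally so that the fragment compiles outside the PostgreSQL tree, and labelling the "postmaster.pid removed, signal not received" case as an abnormal exit is one reading of the message rather than something it states.

```c
#include <stdbool.h>
#include <string.h>

/*
 * PostgreSQL defines DIRECTORY_LOCK_FILE as "postmaster.pid"; it is
 * redefined here only so that the sketch compiles on its own.
 */
#define DIRECTORY_LOCK_FILE "postmaster.pid"

/*
 * Hypothetical helper mirroring the condition added to UnlinkLockFiles()
 * in the experiment: lock files other than postmaster.pid are always
 * removed, while postmaster.pid is removed only when the exit status is 0.
 */
static bool
should_unlink(const char *curfile, int status)
{
	return strcmp(curfile, DIRECTORY_LOCK_FILE) != 0 || status == 0;
}

/* Hypothetical names for the three states of the signal-based alternative. */
typedef enum
{
	SHUTDOWN_KILLED,	/* postmaster.pid preserved, e.g. killed with -9 */
	SHUTDOWN_CLEAN,		/* postmaster.pid removed, signal received */
	SHUTDOWN_ABNORMAL	/* postmaster.pid removed, no signal (assumed reading) */
} ShutdownState;

/* Map the two observations available to pg_ctl onto one of the three states. */
static ShutdownState
classify_shutdown(bool pid_file_exists, bool signal_received)
{
	if (pid_file_exists)
		return SHUTDOWN_KILLED;
	return signal_received ? SHUTDOWN_CLEAN : SHUTDOWN_ABNORMAL;
}
```

With the first approach a caller only has to check whether postmaster.pid still exists after shutdown. The second approach needs both inputs of `classify_shutdown()`, which is the extra complexity the message points out.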