How abnormal server shutdown could be detected by tests?

Поиск
Список
Период
Сортировка
От Alexander Lakhin
Тема How abnormal server shutdown could be detected by tests?
Дата
Msg-id 290b9ae3-98a2-0896-a957-18d3b60b6260@gmail.com
обсуждение исходный текст
Ответы Re: How abnormal server shutdown could be detected by tests?  (shveta malik <shveta.malik@gmail.com>)
Список pgsql-hackers
Hello hackers,

While studying bug #18158, I've come to the conclusion that the existing
testing infrastructure is unable to detect abnormal situations. of some
kind.

Just a simple example:
With Assert(0) injected in walsender (say:
sed "s/WalSndDone(send_data)/Assert(0)/" -i src/backend/replication/walsender.c
), I observe the following:
$ make -s check -C src/test/recovery PROVE_TESTS="t/012*"

t/012_subtransactions.pl .. ok
All tests successful.

(The same with meson:
1/1 postgresql:recovery / recovery/012_subtransactions OK                3.24s   12 subtests passed)

But:
$ grep TRAP src/test/recovery/tmp_check/log/*
src/test/recovery/tmp_check/log/012_subtransactions_primary.log:TRAP: failed Assert("0"), File: "walsender.c", Line: 
2528, PID: 373729
src/test/recovery/tmp_check/log/012_subtransactions_primary.log:TRAP: failed Assert("0"), File: "walsender.c", Line: 
2528, PID: 373750
src/test/recovery/tmp_check/log/012_subtransactions_primary.log:TRAP: failed Assert("0"), File: "walsender.c", Line: 
2528, PID: 373821
src/test/recovery/tmp_check/log/012_subtransactions_standby.log:TRAP: failed Assert("0"), File: "walsender.c", Line: 
2528, PID: 373786

src/test/recovery/tmp_check/log/012_subtransactions_primary.log contains:
...
2023-12-09 03:23:20.210 UTC [375933] LOG:  server process (PID 375975) was terminated by signal 6: Aborted
2023-12-09 03:23:20.210 UTC [375933] DETAIL:  Failed process was running: START_REPLICATION 0/3000000 TIMELINE 3
2023-12-09 03:23:20.210 UTC [375933] LOG:  terminating any other active server processes
2023-12-09 03:23:20.210 UTC [375933] LOG:  abnormal database system shutdown
2023-12-09 03:23:20.211 UTC [375933] LOG:  database system is shut down
...

So the shutdown definitely was considered abnormal, but we missed that.

With log_min_messages = DEBUG3, I can see in the log:
2023-12-09 03:26:50.145 UTC [377844] LOG:  abnormal database system shutdown
2023-12-09 03:26:50.145 UTC [377844] DEBUG:  shmem_exit(1): 0 before_shmem_exit callbacks to make
2023-12-09 03:26:50.145 UTC [377844] DEBUG:  shmem_exit(1): 5 on_shmem_exit callbacks to make
2023-12-09 03:26:50.145 UTC [377844] DEBUG:  cleaning up orphaned dynamic shared memory with ID 2898643884
2023-12-09 03:26:50.145 UTC [377844] DEBUG:  cleaning up dynamic shared memory control segment with ID 3446966170
2023-12-09 03:26:50.146 UTC [377844] DEBUG:  proc_exit(1): 2 callbacks to make
2023-12-09 03:26:50.146 UTC [377844] LOG:  database system is shut down
2023-12-09 03:26:50.146 UTC [377844] DEBUG:  exit(1)
2023-12-09 03:26:50.146 UTC [377844] DEBUG:  shmem_exit(-1): 0 before_shmem_exit callbacks to make
2023-12-09 03:26:50.146 UTC [377844] DEBUG:  shmem_exit(-1): 0 on_shmem_exit callbacks to make
2023-12-09 03:26:50.146 UTC [377844] DEBUG:  proc_exit(-1): 0 callbacks to make

The postmaster process exits with exit code 1, but pg_ctl can't get the
code and just reports that stop was completed successfully.

Should this be improved? And if yes, how it can be done?
Maybe postmaster shouldn't remove it's postmaster.pid when it exits
abnormally to let pg_ctl know of it?

Best regards,
Alexander



В списке pgsql-hackers по дате отправления:

Предыдущее
От: "Hayato Kuroda (Fujitsu)"
Дата:
Сообщение: RE: Make COPY format extendable: Extract COPY TO format implementations
Следующее
От: Melanie Plageman
Дата:
Сообщение: Re: Eager page freeze criteria clarification