How abnormal server shutdown could be detected by tests?
От | Alexander Lakhin |
---|---|
Тема | How abnormal server shutdown could be detected by tests? |
Дата | |
Msg-id | 290b9ae3-98a2-0896-a957-18d3b60b6260@gmail.com обсуждение исходный текст |
Ответы |
Re: How abnormal server shutdown could be detected by tests?
(shveta malik <shveta.malik@gmail.com>)
|
Список | pgsql-hackers |
Hello hackers, While studying bug #18158, I've come to the conclusion that the existing testing infrastructure is unable to detect abnormal situations. of some kind. Just a simple example: With Assert(0) injected in walsender (say: sed "s/WalSndDone(send_data)/Assert(0)/" -i src/backend/replication/walsender.c ), I observe the following: $ make -s check -C src/test/recovery PROVE_TESTS="t/012*" t/012_subtransactions.pl .. ok All tests successful. (The same with meson: 1/1 postgresql:recovery / recovery/012_subtransactions OK 3.24s 12 subtests passed) But: $ grep TRAP src/test/recovery/tmp_check/log/* src/test/recovery/tmp_check/log/012_subtransactions_primary.log:TRAP: failed Assert("0"), File: "walsender.c", Line: 2528, PID: 373729 src/test/recovery/tmp_check/log/012_subtransactions_primary.log:TRAP: failed Assert("0"), File: "walsender.c", Line: 2528, PID: 373750 src/test/recovery/tmp_check/log/012_subtransactions_primary.log:TRAP: failed Assert("0"), File: "walsender.c", Line: 2528, PID: 373821 src/test/recovery/tmp_check/log/012_subtransactions_standby.log:TRAP: failed Assert("0"), File: "walsender.c", Line: 2528, PID: 373786 src/test/recovery/tmp_check/log/012_subtransactions_primary.log contains: ... 2023-12-09 03:23:20.210 UTC [375933] LOG: server process (PID 375975) was terminated by signal 6: Aborted 2023-12-09 03:23:20.210 UTC [375933] DETAIL: Failed process was running: START_REPLICATION 0/3000000 TIMELINE 3 2023-12-09 03:23:20.210 UTC [375933] LOG: terminating any other active server processes 2023-12-09 03:23:20.210 UTC [375933] LOG: abnormal database system shutdown 2023-12-09 03:23:20.211 UTC [375933] LOG: database system is shut down ... So the shutdown definitely was considered abnormal, but we missed that. With log_min_messages = DEBUG3, I can see in the log: 2023-12-09 03:26:50.145 UTC [377844] LOG: abnormal database system shutdown 2023-12-09 03:26:50.145 UTC [377844] DEBUG: shmem_exit(1): 0 before_shmem_exit callbacks to make 2023-12-09 03:26:50.145 UTC [377844] DEBUG: shmem_exit(1): 5 on_shmem_exit callbacks to make 2023-12-09 03:26:50.145 UTC [377844] DEBUG: cleaning up orphaned dynamic shared memory with ID 2898643884 2023-12-09 03:26:50.145 UTC [377844] DEBUG: cleaning up dynamic shared memory control segment with ID 3446966170 2023-12-09 03:26:50.146 UTC [377844] DEBUG: proc_exit(1): 2 callbacks to make 2023-12-09 03:26:50.146 UTC [377844] LOG: database system is shut down 2023-12-09 03:26:50.146 UTC [377844] DEBUG: exit(1) 2023-12-09 03:26:50.146 UTC [377844] DEBUG: shmem_exit(-1): 0 before_shmem_exit callbacks to make 2023-12-09 03:26:50.146 UTC [377844] DEBUG: shmem_exit(-1): 0 on_shmem_exit callbacks to make 2023-12-09 03:26:50.146 UTC [377844] DEBUG: proc_exit(-1): 0 callbacks to make The postmaster process exits with exit code 1, but pg_ctl can't get the code and just reports that stop was completed successfully. Should this be improved? And if yes, how it can be done? Maybe postmaster shouldn't remove it's postmaster.pid when it exits abnormally to let pg_ctl know of it? Best regards, Alexander
В списке pgsql-hackers по дате отправления:
Предыдущее
От: "Hayato Kuroda (Fujitsu)"Дата:
Сообщение: RE: Make COPY format extendable: Extract COPY TO format implementations