Re: The real reason why TAP testing isn't ready for prime time
From | Michael Paquier |
---|---|
Subject | Re: The real reason why TAP testing isn't ready for prime time |
Date | |
Msg-id | CAB7nPqRJo85TTiq7-O-sSOoMJFSdDtNQCdyXPTojR6Lui58J8g@mail.gmail.com |
In response to | Re: The real reason why TAP testing isn't ready for prime time (Michael Paquier <michael.paquier@gmail.com>) |
Responses |
Re: The real reason why TAP testing isn't ready for prime time
Re: The real reason why TAP testing isn't ready for prime time |
List | pgsql-hackers |
On Thu, Jun 18, 2015 at 3:52 PM, Michael Paquier wrote:
> I think that it would be useful as well to improve the buildfarm
> output. Thoughts?

After running the tests 6 or 7 times in a row on a PI, I have been able to trigger the problem, and I think that I have found its origin. First, the error was triggered by the tests of pg_rewind:

    t/002_databases.pl ...
    1..4
    Bailout called.  Further testing stopped:  run pg_ctl failed: 256
    Bail out!  run pg_ctl failed: 256
    FAILED--Further testing stopped: run pg_ctl failed: 256
    Makefile:51: recipe for target 'check' failed
    make[1]: *** [check] Error 255

Looking at the logs obtained thanks to the previous patch, I could see the following (log attached for tests 1 and 2):

    $ tail -n5 regress_log/regress_log_002_databases
    waiting for server to start........ stopped waiting
    pg_ctl: could not start server
    Examine the log output.
    LOG:  received immediate shutdown request
    LOG:  received immediate shutdown request

pg_ctl should be able to start the server and should not fail here. The cause is confirmed by the fact that the first test has not stopped the servers. On a clean run, the immediate shutdown request is received and completed:

    waiting for server to shut down....LOG:  received immediate shutdown request
    LOG:  unexpected EOF on standby connection
    done

But in the case of the failure this does not happen:

    LOG:  received immediate shutdown request
    LOG:  unexpected EOF on standby connection
    LOG:  received immediate shutdown request

Note that the "done" is missing.
Now if we look at RewindTest.pm, there is the following code:

    if ($test_master_datadir)
    {
        system "pg_ctl -D $test_master_datadir -s -m immediate stop 2> /dev/null";
    }
    if ($test_standby_datadir)
    {
        system "pg_ctl -D $test_standby_datadir -s -m immediate stop 2> /dev/null";
    }

I think that the problem is triggered because we are missing a -w switch here, meaning that we do not wait for confirmation that the server has stopped. Apparently, if the stop is slow enough, the next server cannot start because its port is still held by the server that is shutting down. Note as well that the last pg_ctl stop command in pg_ctl/t/002_status.pl does not use -w, so we have the same problem there.

Attached is a patch fixing those problems and improving the log facility, as it really helped me out with those issues. The simplest fix, though, would be to add the missing -w switch in the tests of pg_rewind and pg_ctl.

It would be good to get that fixed; then I would be able to re-enable the TAP tests on hamster. I have run the tests a dozen times again with this patch, and I could not trigger the failure anymore.

Regards,
-- 
Michael
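For illustration only, the "simplest fix" mentioned above (passing -w on stop) could be sketched as a small shell helper. This is not the attached patch and not code from RewindTest.pm: the function name stop_cluster and the PG_CTL override variable are hypothetical. Whether -w is already the default for stop depends on the PostgreSQL version, so the pg_ctl documentation for the version in use should be checked.

```shell
#!/bin/sh
# Hypothetical helper, not taken from RewindTest.pm or the attached patch.
# PG_CTL is an assumed override hook; it defaults to the pg_ctl found in PATH.
PG_CTL="${PG_CTL:-pg_ctl}"

stop_cluster() {
    datadir="$1"

    # Nothing to stop if no data directory was given.
    [ -n "$datadir" ] || return 0

    # -w asks pg_ctl to wait until the shutdown has completed, so a server
    # started afterwards on the same port does not race with this one.
    "$PG_CTL" -D "$datadir" -s -w -m immediate stop 2> /dev/null
}
```

It would be called once per cluster, for example stop_cluster "$test_master_datadir" followed by stop_cluster "$test_standby_datadir", before the next test starts its servers.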
Attachments