Discussion: [HACKERS] tap tests on older branches fail if concurrency is used
Hi,

when using
$ cat ~/.proverc
-j9

some tests fail for me in 9.4 and 9.5. E.g. src/bin/script's tests
yields a lot of fun like:
$ (cd ~/build/postgres/9.5-assert/vpath/src/bin/scripts/ && make check)
...
# LOG: received immediate shutdown request
# WARNING: terminating connection because of crash of another server process
# DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
# HINT: In a moment you should be able to reconnect to the database and repeat your command.
...

it appears as if various tests are trampling over each other. If needed
I can provide detailed logs, but it appears to readily reproduce on
several machines...

See Michael, I'll provide the details and a reproducer ;)

Greetings,

Andres Freund
On 1 June 2017 at 08:15, Andres Freund <andres@anarazel.de> wrote:
> Hi,
>
> when using
> $ cat ~/.proverc
> -j9
>
> some tests fail for me in 9.4 and 9.5. E.g. src/bin/script's tests
> yields a lot of fun like:
> $ (cd ~/build/postgres/9.5-assert/vpath/src/bin/scripts/ && make check)
> ...
> # LOG: received immediate shutdown request
> # WARNING: terminating connection because of crash of another server process
> # DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
> # HINT: In a moment you should be able to reconnect to the database and repeat your command.
> ...
>
> it appears as if various tests are trampling over each other. If needed
> I can provide detailed logs, but it appears to readily reproduce on
> several machines...

I'll take a look at what's changed and why it's happening and get back to you.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 1 June 2017 at 08:15, Andres Freund <andres@anarazel.de> wrote:
> Hi,
>
> when using
> $ cat ~/.proverc
> -j9
>
> some tests fail for me in 9.4 and 9.5. E.g. src/bin/script's tests
> yields a lot of fun like:
> $ (cd ~/build/postgres/9.5-assert/vpath/src/bin/scripts/ && make check)
> ...
> # LOG: received immediate shutdown request
> # WARNING: terminating connection because of crash of another server process
> # DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
> # HINT: In a moment you should be able to reconnect to the database and repeat your command.
> ...
>
> it appears as if various tests are trampling over each other.

None of those scripts use PostgresNode, which I thought was added in
9.5, but apparently was actually introduced in 9.6. They do all their
own setup/teardown using TestLib.pm routines.

TestLib uses a unique tempdir for each test run, sets it as the unix
socket directory, and disables listening on tcp, so the most obvious
conflict is hidden.

The immediate problem appears to be that they all use
tmp_check/postmaster.log . So anything that examines the logs gets
confused by seeing some other postgres instance's logs, or a mixture,
trampling everywhere.

I'll be surprised if there aren't other problems though. Rather than
trying to fix it all up, this seems like a good argument for
backporting the updated suite from 9.6 or pg10, with PostgresNode etc.
I already have a working tree with that done to use src/test/recovery
in 9.5, but haven't updated src/bin/scripts etc yet.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Andres Freund <andres@anarazel.de> writes:
> when using
> $ cat ~/.proverc
> -j9
> some tests fail for me in 9.4 and 9.5.

Weren't there fixes specifically intended to make that safe, awhile ago?

regards, tom lane
On Wed, May 31, 2017 at 8:45 PM, Craig Ringer <craig@2ndquadrant.com> wrote:
> On 1 June 2017 at 08:15, Andres Freund <andres@anarazel.de> wrote:
>> Hi,
>>
>> when using
>> $ cat ~/.proverc
>> -j9
>>
>> some tests fail for me in 9.4 and 9.5. E.g. src/bin/script's tests
>> yields a lot of fun like:
>> $ (cd ~/build/postgres/9.5-assert/vpath/src/bin/scripts/ && make check)
>> ...
>> # LOG: received immediate shutdown request
>> # WARNING: terminating connection because of crash of another server process
>> # DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
>> # HINT: In a moment you should be able to reconnect to the database and repeat your command.
>> ...
>>
>> it appears as if various tests are trampling over each other.

They are. The problem can be easily reproduced on my side with that:
PROVE_FLAGS="-j 9" make check
It would be nice to get a minimum of stability for those tests in
back-branches even if PostgresNode.pm is not back-patched.

> The immediate problem appears to be that they all use
> tmp_check/postmaster.log . So anything that examines the logs gets
> confused by seeing some other postgres instance's logs, or a mixture,
> trampling everywhere.

Amen.

> I'll be surprised if there aren't other problems though. Rather than
> trying to fix it all up, this seems like a good argument for
> backporting the updated suite from 9.6 or pg10, with PostgresNode etc.
> I already have a working tree with that done to use src/test/recovery
> in 9.5, but haven't updated src/bin/scripts etc yet.

Yup. Even if PostgresNode.pm is not back-patched, a small trick is to
append the PID of the process running the TAP test to the log file
name as in the patch attached. This gives enough uniqueness for the
tests to pass with a high parallel degree.
A second error that I have spotted is in the tests of pg_rewind, which
would fail in parallel as the same data folders are used for each
test. Using the same trick with $$ makes the tests more stable.

A third error is a failure in contrib/test_decoding, and this has been
addressed by Andres in 60f826c.

Attached is a patch for the first two ones, which makes the tests more
robust. I am myself annoyed by parallel tests failing when working on
patches for back-branches, so having at least a minimal fix would be
nice.
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
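As a rough illustration of the PID trick described in the message above (this is not the attached patch, which is Perl and is not reproduced in this thread), the following Python sketch contrasts a fixed log path, which every concurrently running test script would share, with a name that embeds the process ID. The helper names shared_log_path and per_process_path, the "_<pid>" suffix format, and the directory name data_dir are hypothetical; Perl's $$ corresponds to os.getpid() here.

```python
import os


def shared_log_path(base_dir="tmp_check"):
    # Every test process computes this same path, so the output of
    # several servers ends up interleaved in one file.
    return os.path.join(base_dir, "postmaster.log")


def per_process_path(name, base_dir="tmp_check", pid=None):
    # Hypothetical naming scheme: put the PID of the process running the
    # test before the extension, the equivalent of using $$ in Perl.
    pid = os.getpid() if pid is None else pid
    stem, ext = os.path.splitext(name)
    return os.path.join(base_dir, f"{stem}_{pid}{ext}")


if __name__ == "__main__":
    # Identical for all processes: the source of the collision.
    print(shared_log_path())
    # Distinct for each running process: log file and data directory.
    print(per_process_path("postmaster.log"))
    print(per_process_path("data_dir"))
```

Concurrently running processes never share a PID, so paths built this way cannot collide between test scripts that prove runs in parallel. PIDs are reused over time, though, so stale files from earlier runs still need the usual cleanup.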
Attachments
On Thu, Jun 1, 2017 at 10:48 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Andres Freund <andres@anarazel.de> writes:
>> when using
>> $ cat ~/.proverc
>> -j9
>> some tests fail for me in 9.4 and 9.5.
>
> Weren't there fixes specifically intended to make that safe, awhile ago?

60f826c has not been back-patched. While this would fix parallel runs
with make's --jobs, PROVE_FLAGS="-j X" would still fail.
--
Michael
On 7 June 2017 at 13:39, Michael Paquier <michael.paquier@gmail.com> wrote:
> On Thu, Jun 1, 2017 at 10:48 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Andres Freund <andres@anarazel.de> writes:
>>> when using
>>> $ cat ~/.proverc
>>> -j9
>>> some tests fail for me in 9.4 and 9.5.
>>
>> Weren't there fixes specifically intended to make that safe, awhile ago?
>
> 60f826c has not been back-patched. While this would fix parallel runs
> with make's --jobs, PROVE_FLAGS="-j X" would still fail.

Ah, that's why I didn't find it.

I think applying Michael's patch makes sense now, and if we decide to
backpatch PostgresNode (and I get the time to do it) we can clobber
that fix quite happily with the full backport.

Thanks Michael for the workaround.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services