Discussion: intermittent test failure on Windows
Bowerbird (Visual Studio 2017 / Windows 10 Pro) just had a failure on the pg_ctl test:

<https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=bowerbird&dt=2019-10-21%2011%3A50%3A21>

There was a similar failure 17 days ago.

I surmise that what's happening here is that the test is trying to read current_logfiles while the server is writing it, so there's a race condition. Perhaps what we need to do is have slurp_file sleep a bit and try again on Windows if it gets EPERM, or else we need to have the pg_ctl test wait a bit before calling slurp_file.

But we have seen occasional similar failures in other tests on Windows, so a more systemic approach might be better.

Thoughts?

cheers

andrew

--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
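For illustration, the "sleep a bit and try again" idea could look roughly like the following. This is a minimal Python sketch, not the Perl slurp_file used by the TAP tests. The name slurp_file_with_retry, its parameters, and the choice to treat both EPERM and EACCES as retryable are assumptions made for the example.

```python
import errno
import time

# Assumption: these are the error codes worth retrying. The thread mentions
# EPERM; EACCES is included here because permission-style failures may be
# reported under either code.
RETRYABLE_ERRNOS = {errno.EPERM, errno.EACCES}


def slurp_file_with_retry(path, attempts=10, delay=0.1, opener=open):
    """Return the contents of path, retrying on transient permission errors.

    Hypothetical helper for illustration only. `opener` exists so that a
    failing open can be simulated without a real concurrent writer.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            with opener(path, "r") as handle:
                return handle.read()
        except OSError as exc:
            # Re-raise unrelated errors at once, and re-raise the permission
            # error too once the retry budget is spent.
            if exc.errno not in RETRYABLE_ERRNOS or attempt == attempts - 1:
                raise
            time.sleep(delay)
```

A bounded retry count keeps a genuinely broken test from hanging forever, and errors such as a missing file still fail immediately instead of being masked by the loop.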
Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes:
> Bowerbird (Visual Studio 2017 / Windows 10 pro) just had a failure on
> the pg_ctl test :
> <https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=bowerbird&dt=2019-10-21%2011%3A50%3A21>

> I surmise that what's happening here is that the test is trying to read
> current_logfiles while the server is writing it, so there's a race
> condition.

Hmm ... the server tries to replace current_logfiles atomically with rename(), so this says that rename isn't atomic on Windows, which we knew already. Previous discussion (cf. commit d611175e5) implies that an even worse failure condition is possible: the server might fail to rename current_logfiles.tmp into place, just because somebody is trying to read current_logfiles. Ugh.

I found a thread about trying to make a really bulletproof rename() for Windows:

https://www.postgresql.org/message-id/flat/CAPpHfds7dyuGZt%2BPF2GL9qSSVV0OZnjNwqiCPjN7mirDw882tA%40mail.gmail.com

but it looks like we gave up in disgust.

regards, tom lane
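To make the writer-side pattern concrete: the file is written under a temporary name and then renamed over the real one, and the rename is where a concurrent reader can get in the way. The following is a rough Python sketch of that pattern with a retry around the rename. It is not the server's C code and not the patch from the linked thread; write_file_atomically, replace_with_retry, and the retry parameters are hypothetical names chosen for the example, and the assumption that a busy target surfaces as PermissionError is just that, an assumption.

```python
import os
import time


def replace_with_retry(tmp_path, final_path, attempts=10, delay=0.1,
                       replace=os.replace):
    """Rename tmp_path over final_path, retrying while the target is busy.

    Hypothetical helper. `replace` can be swapped out to simulate a target
    that is temporarily held open by a reader.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            replace(tmp_path, final_path)
            return
        except PermissionError:
            # Assumption: a reader holding the target open shows up as a
            # permission error. Give up once the retry budget is spent.
            if attempt == attempts - 1:
                raise
            time.sleep(delay)


def write_file_atomically(final_path, contents):
    """Write contents to final_path via a temporary file plus rename."""
    tmp_path = final_path + ".tmp"
    with open(tmp_path, "w") as handle:
        handle.write(contents)
        handle.flush()
        os.fsync(handle.fileno())  # push the data to disk before renaming
    replace_with_retry(tmp_path, final_path)
```

Retrying only papers over the window in which the target is held open. It does not make the rename atomic, which is the harder problem the linked thread was trying to solve.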
On 10/21/19 2:58 PM, Tom Lane wrote:
> Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes:
>> Bowerbird (Visual Studio 2017 / Windows 10 pro) just had a failure on
>> the pg_ctl test :
>> <https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=bowerbird&dt=2019-10-21%2011%3A50%3A21>
>> I surmise that what's happening here is that the test is trying to read
>> current_logfiles while the server is writing it, so there's a race
>> condition.
> Hmm ... the server tries to replace current_logfiles atomically
> with rename(), so this says that rename isn't atomic on Windows,
> which we knew already. Previous discussion (cf. commit d611175e5)
> implies that an even worse failure condition is possible: the server
> might fail to rename current_logfiles.tmp into place, just because
> somebody is trying to read current_logfiles. Ugh.
>
> I found a thread about trying to make a really bulletproof rename()
> for Windows:
>
> https://www.postgresql.org/message-id/flat/CAPpHfds7dyuGZt%2BPF2GL9qSSVV0OZnjNwqiCPjN7mirDw882tA%40mail.gmail.com
>
> but it looks like we gave up in disgust.

Yeah. Looks like Alexander revived the discussion with a patch back in August, though, and it's in the next commitfest.

<https://commitfest.postgresql.org/25/2230/>

cheers

andrew