Re: Windows buildfarm members vs. new async-notify isolation test
From | Mark Dilger |
---|---|
Subject | Re: Windows buildfarm members vs. new async-notify isolation test |
Date | |
Msg-id | 362bca0b-1c1c-c760-ab19-b5d9a14c69ea@gmail.com |
In response to | Re: Windows buildfarm members vs. new async-notify isolation test (Andrew Dunstan <andrew.dunstan@2ndquadrant.com>) |
Responses |
Re: Windows buildfarm members vs. new async-notify isolation test
|
List | pgsql-hackers |
On 12/2/19 11:42 AM, Andrew Dunstan wrote:
>
> On 12/2/19 11:23 AM, Tom Lane wrote:
>> I see from the buildfarm status page that since commits 6b802cfc7
>> et al went in a week ago, frogmouth and currawong have failed that
>> new test case every time, with the symptom
>>
>> ================== pgsql.build/src/test/isolation/regression.diffs ===================
>> *** c:/prog/bf/root/REL_10_STABLE/pgsql.build/src/test/isolation/expected/async-notify.out Mon Nov 25 00:30:49 2019
>> --- c:/prog/bf/root/REL_10_STABLE/pgsql.build/src/test/isolation/results/async-notify.out Mon Dec 2 00:54:26 2019
>> ***************
>> *** 93,99 ****
>> step llisten: LISTEN c1; LISTEN c2;
>> step lcommit: COMMIT;
>> step l2commit: COMMIT;
>> - listener2: NOTIFY "c1" with payload "" from notifier
>> step l2stop: UNLISTEN *;
>>
>> starting permutation: llisten lbegin usage bignotify usage
>> --- 93,98 ----
>>
>> (Note that these two critters don't run branches v11 and up, which
>> is why they're only showing this failure in 10 and 9.6.)
>>
>> drongo showed the same failure once in v10, and fairywren showed
>> it once in v12. Every other buildfarm animal seems happy.
>>
>> I'm a little baffled as to what this might be --- some sort of
>> timing problem in our Windows signal emulation, perhaps? But
>> if so, why haven't we found it years ago?
>>
>> I don't have any ability to test this myself, so would appreciate
>> help or ideas.
>
>
>
> I can test things, but I don't really know what to test. FYI frogmouth
> and currawong run on virtualized XP. drongo and fairywren run on
> virtualized WS2019. Neither VM is heavily resourced.

Hi Andrew, if you have time you could perhaps check the isolation
test structure itself. Like Tom, I don't have a Windows box to test
this.

I would be curious to see if there is a race condition in
src/test/isolation/isolationtester.c between the loop starting
on line 820:

    while ((res = PQgetResult(conn)))
    {
        ...
    }

and the attempt to consume input that might include NOTIFY messages
on line 861:

    PQconsumeInput(conn);

If the first loop consumes the commit message, gets no further
PGresult from PQgetResult, and finishes, and execution proceeds to
PQconsumeInput before the NOTIFY has arrived over the socket, there
won't be anything for PQnotifies to return, and hence nothing for
try_complete_step to print before returning. I'm not sure if it is
possible for the commit message to arrive before the notify message
in the fashion I am describing, but that's something you might easily
check by having isolationtester sleep before PQconsumeInput on
line 861.

--
Mark Dilger
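
The ordering hazard described in the message can be illustrated with a small toy model. This is only a sketch of the hypothesis, not code from isolationtester.c or libpq: the names SimMsg, consume_arrived and notifies_seen, the millisecond timings, and the assumption that the NOTIFY can trail the COMMIT result are all invented for illustration.

```c
#include <assert.h>
#include <string.h>

/* Toy model only: none of these names come from libpq or isolationtester.c. */
typedef struct
{
    const char *kind;           /* "COMMIT" or "NOTIFY" */
    int         arrives_at_ms;  /* simulated time the bytes reach the client */
    int         consumed;       /* nonzero once the client has read it */
} SimMsg;

/* Read everything that has arrived by now_ms; return the number of NOTIFYs read. */
static int
consume_arrived(SimMsg *msgs, int n, int now_ms)
{
    int notifies = 0;

    for (int i = 0; i < n; i++)
    {
        if (msgs[i].consumed || msgs[i].arrives_at_ms > now_ms)
            continue;           /* already read, or still in flight */
        msgs[i].consumed = 1;
        if (strcmp(msgs[i].kind, "NOTIFY") == 0)
            notifies++;
    }
    return notifies;
}

/*
 * One pass over a step: block until the COMMIT result is in, optionally
 * sleep, then consume input once and report how many NOTIFYs were visible.
 * notify_lag_ms is the assumed delay of the NOTIFY behind the COMMIT result.
 */
static int
notifies_seen(int notify_lag_ms, int sleep_before_consume_ms)
{
    assert(notify_lag_ms >= 0 && sleep_before_consume_ms >= 0);

    SimMsg msgs[2] = {
        {"COMMIT", 10, 0},
        {"NOTIFY", 10 + notify_lag_ms, 0},
    };
    int now_ms = msgs[0].arrives_at_ms; /* result loop returns once COMMIT is here */

    msgs[0].consumed = 1;               /* the result loop has read the COMMIT */
    now_ms += sleep_before_consume_ms;  /* the diagnostic sleep suggested above */
    return consume_arrived(msgs, 2, now_ms);
}
```

In this model, notifies_seen(5, 0) returns 0 because the single consume pass runs before the NOTIFY has arrived, while notifies_seen(5, 5) returns 1. If a sleep before PQconsumeInput made the real test pass on the failing animals, that would be consistent with the same ordering, though it would not by itself prove it.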