Обсуждение: Re: BackgroundPsql swallowing errors on windows
On 2025-02-13 Th 12:39 PM, Andres Freund wrote: > Hi, > > One of the remaining tasks for AIO was to convert the tests to be proper tap > tests. I did that and was thanked by the tests fairly randomly failing on > windows. Which test fails changes from run to run. > > The symptom is that BackgroundPsql->query() sometimes would simply swallow > errors that were clearly generated by the backend. Which then would cause > tests to fail, because testing for errors was the whole point of $test. > > > At first I thought the issue was that psql didn't actually flush stderr after > displaying errors. And while that may be an issue, it doesn't seem to be the > one causing problems for me. > > Lots of hair-pulling later, I have a somewhat confirmed theory for what's > happening: > > BackgroundPsql::query() tries to detect if the passed in query has executed by > adding a "banner" after the query and using pump_until() to wait until that > banner has been "reached". That seems to work reasonably well on !windows. > > On windows however, it looks like there's no guarantee that if stdout has been > received by IPC::Run, stderr also has been received, even if the stderr > content has been generated first. I tried to add an extra ->pump_nb() call to > query(), thinking that maybe IPC::Run just didn't get input that had actually > arrived, due to waiting on just one pipe. But no success. > > My understanding is that IPC::Run uses a proxy process on windows to execute > subprocesses and then communicates with that over TCP (or something along > those lines). I suspect what's happening is that the communication with the > external process allows for reordering between stdout/stderr. > > And indeed, changing BackgroundPsql::query() to emit the banner on both stdout > and stderr and waiting on both seems to fix the issue. > > > One complication is that I found that just waiting for the banner, without > also its newline, sometimes lead to unexpected newlines causing later queries > to fail. I think that happens if the trailing newline is read separately from > the rest of the string. > > However, matching the newlines caused tests to fail on some machines. After a > lot of cursing I figured out that for interactive psql we output \r\n, causing > my regex match to fail. I.e. tests failed whenever IO::PTY was availble... > > It's also not correct, as we did before, to just look for the banner, it has > to be anchored to either the start of the output or a newline, otherwise the > \echo (or \warn) command itself will be matched by pump_until() (but then the > replacing the command would fail). Not sure that could cause active problems > without the addition of \warn (which is also echoed on stdout), but it > certainly could after. > > > The banner being the same between queries made it hard to understand if a > banner that appeared in the output was from the current query or a past > query. Therefore I added a counter to it. > > > For debugging I added a "note" that shows stdout/stderr after executing the > query, I think it may be worth keeping that, but I'm not sure. > > > This was a rather painful exercise. > > It's been discussed before, but I'd really really like to get rid of BackgroundPsql. It's ugly, non-intuitive and fragile. Last year I did some work on this. I was going to present it at Athens but illness prevented me, and then other life events managed to get in my way. But the basic work is around. See <https://github.com/adunstan/test-pq/commit/98518e4929e80fb96f210bbc5aab9fdcea058512> This introduces a libpq session object (PostgreSQL::Test::Session) which can be backed either by FFI or a small XS wrapper - the commit has recipes for both. Missing is a meson.build file for the XS wrapper. There are significant performance gains to be had too (poll_query_until is much nicer, for example, as are most uses of safe_psql). If there is interest I will bring the work up to date, and maybe we can introduce this early in the v19 cycle. It would significantly reduce our dependency on IPC::Run, especially the pump() stuff. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Hi, On 2025-02-14 08:14:45 -0500, Andrew Dunstan wrote: > It's been discussed before, but I'd really really like to get rid of > BackgroundPsql. It's ugly, non-intuitive and fragile. I agree, unfortunately we're stuck with this until we have a better alternative in tree :( > Last year I did some work on this. I was going to present it at Athens but > illness prevented me, and then other life events managed to get in my way. > But the basic work is around. See <https://github.com/adunstan/test-pq/commit/98518e4929e80fb96f210bbc5aab9fdcea058512> > This introduces a libpq session object (PostgreSQL::Test::Session) which can > be backed either by FFI or a small XS wrapper - the commit has recipes for > both. Missing is a meson.build file for the XS wrapper. There are > significant performance gains to be had too (poll_query_until is much nicer, > for example, as are most uses of safe_psql). If there is interest I will > bring the work up to date, and maybe we can introduce this early in the v19 > cycle. It would significantly reduce our dependency on IPC::Run, especially > the pump() stuff. I definitely am interested. Greetings, Andres Freund
Andres Freund <andres@anarazel.de> writes:
> On 2025-02-14 08:14:45 -0500, Andrew Dunstan wrote:
>> ... If there is interest I will
>> bring the work up to date, and maybe we can introduce this early in the v19
>> cycle. It would significantly reduce our dependency on IPC::Run, especially
>> the pump() stuff.
> I definitely am interested.
+1. Would there be a chance of removing our use of IPC::Run entirely?
There'd be a lot to like about that.
regards, tom lane
Hi, On 2025-02-14 12:14:55 -0500, Tom Lane wrote: > Andres Freund <andres@anarazel.de> writes: > > On 2025-02-14 08:14:45 -0500, Andrew Dunstan wrote: > >> ... If there is interest I will > >> bring the work up to date, and maybe we can introduce this early in the v19 > >> cycle. It would significantly reduce our dependency on IPC::Run, especially > >> the pump() stuff. > > > I definitely am interested. > > +1. Would there be a chance of removing our use of IPC::Run entirely? > There'd be a lot to like about that. Unfortunately I doubt it'd get us that close to that. We do have several tests that intentionally involve psql, e.g. 010_tab_completion.pl, 001_password.pl. Presumably those would continue to use something like BackgroundPsql.pm, even if we had a sane way to have longrunning connections in tap tests. There also are a bunch of things we use IPC::Run to run synchronously, we probably could replace most of those without too much pain. The hardest probably would be timeout handling. Greetings, Andres Freund
On 2025-02-14 Fr 11:54 AM, Andres Freund wrote: > Hi, > > On 2025-02-14 08:14:45 -0500, Andrew Dunstan wrote: >> It's been discussed before, but I'd really really like to get rid of >> BackgroundPsql. It's ugly, non-intuitive and fragile. > I agree, unfortunately we're stuck with this until we have a better > alternative in tree :( > > >> Last year I did some work on this. I was going to present it at Athens but >> illness prevented me, and then other life events managed to get in my way. >> But the basic work is around. See <https://github.com/adunstan/test-pq/commit/98518e4929e80fb96f210bbc5aab9fdcea058512> >> This introduces a libpq session object (PostgreSQL::Test::Session) which can >> be backed either by FFI or a small XS wrapper - the commit has recipes for >> both. Missing is a meson.build file for the XS wrapper. There are >> significant performance gains to be had too (poll_query_until is much nicer, >> for example, as are most uses of safe_psql). If there is interest I will >> bring the work up to date, and maybe we can introduce this early in the v19 >> cycle. It would significantly reduce our dependency on IPC::Run, especially >> the pump() stuff. > I definitely am interested. Here we are almost exactly a year later. I returned to this work quite recently, and reached a milestone, namely the removal of all calls to BackgroundPsql, with all the TAP tests passing. The XS module still has some problems, and I think I'm inclined not to pursue it, and just rely on the FFI mapping. The current state of this is attached. People who are interested can hit me up for info in Vancouver if not before. I'll work on turning this into a set of commitable patches. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Вложения
Hi, On 2026-02-16 16:38:02 -0500, Andrew Dunstan wrote: > Here we are almost exactly a year later. I returned to this work quite > recently, and reached a milestone, namely the removal of all calls to > BackgroundPsql, with all the TAP tests passing. The XS module still has some > problems, and I think I'm inclined not to pursue it, and just rely on the > FFI mapping. > > The current state of this is attached. People who are interested can hit me > up for info in Vancouver if not before. I'll work on turning this into a set > of commitable patches. Nice progress! I briefly tried this out. The overall resource usage of the test is noticeably reduced - and that's on linux with fast fork, so it should be considerably better on windows. However, the tests take a lot longer than before, I think mostly due to polling for results rather than waiting for them to be ready using PQsocketPoll() or such. E.g. bloom/001_wal takes about 15s on HEAD for me, but 138s with the patch. I think that's just due to the various usleep(100_000); FWIW, oauth_validator/001_server fails with the patch at the moment. Greetings, Andres Freund
On 2026-02-16 Mo 7:17 PM, Andres Freund wrote: > Hi, > > On 2026-02-16 16:38:02 -0500, Andrew Dunstan wrote: >> Here we are almost exactly a year later. I returned to this work quite >> recently, and reached a milestone, namely the removal of all calls to >> BackgroundPsql, with all the TAP tests passing. The XS module still has some >> problems, and I think I'm inclined not to pursue it, and just rely on the >> FFI mapping. >> >> The current state of this is attached. People who are interested can hit me >> up for info in Vancouver if not before. I'll work on turning this into a set >> of commitable patches. > Nice progress! > > I briefly tried this out. The overall resource usage of the test is noticeably > reduced - and that's on linux with fast fork, so it should be considerably > better on windows. However, the tests take a lot longer than before, I think > mostly due to polling for results rather than waiting for them to be ready > using PQsocketPoll() or such. > > E.g. bloom/001_wal takes about 15s on HEAD for me, but 138s with the patch. I > think that's just due to the various usleep(100_000); > > > FWIW, oauth_validator/001_server fails with the patch at the moment. > Try this version. On my machine it's now a few percent faster. I fixed the polling. I also added pipeline support for large sets of commands, to minimize roundtrips. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Вложения
Hi, On 2026-02-17 16:31:02 -0500, Andrew Dunstan wrote: > On 2026-02-16 Mo 7:17 PM, Andres Freund wrote: > > I briefly tried this out. The overall resource usage of the test is noticeably > > reduced - and that's on linux with fast fork, so it should be considerably > > better on windows. However, the tests take a lot longer than before, I think > > mostly due to polling for results rather than waiting for them to be ready > > using PQsocketPoll() or such. > > > > E.g. bloom/001_wal takes about 15s on HEAD for me, but 138s with the patch. I > > think that's just due to the various usleep(100_000); > > > > > > FWIW, oauth_validator/001_server fails with the patch at the moment. > > > Try this version. On my machine it's now a few percent faster. I fixed the > polling. I also added pipeline support for large sets of commands, to > minimize roundtrips. Nice! Will try it out. Have you tried it on windows already? That's where we pay by far the biggest price due to all the unnecessary process creations... It looks like strawberry perl has FFI::Platypus, but not FFI::C. There is perl/vendor/lib/FFI/Platypus/Lang/C.pm, but that just seems like it's documentation. There is however FFI::Platypus::Record, which maybe could suffice? Do we actually need FFI::C, or can we work around not having it? Looks like it's just used for notify related stuff. It looks like mingw doesn't have packages for FFI::Platypus, but it'll probably be a lot easier to build that than when using msvc. Greetings, Andres Freund
On 2026-02-17 Tu 4:56 PM, Andres Freund wrote: > Hi, > > On 2026-02-17 16:31:02 -0500, Andrew Dunstan wrote: >> On 2026-02-16 Mo 7:17 PM, Andres Freund wrote: >>> I briefly tried this out. The overall resource usage of the test is noticeably >>> reduced - and that's on linux with fast fork, so it should be considerably >>> better on windows. However, the tests take a lot longer than before, I think >>> mostly due to polling for results rather than waiting for them to be ready >>> using PQsocketPoll() or such. >>> >>> E.g. bloom/001_wal takes about 15s on HEAD for me, but 138s with the patch. I >>> think that's just due to the various usleep(100_000); >>> >>> >>> FWIW, oauth_validator/001_server fails with the patch at the moment. >>> >> Try this version. On my machine it's now a few percent faster. I fixed the >> polling. I also added pipeline support for large sets of commands, to >> minimize roundtrips. > Nice! Will try it out. > > > Have you tried it on windows already? That's where we pay by far the biggest > price due to all the unnecessary process creations... > > It looks like strawberry perl has FFI::Platypus, but not FFI::C. There is > perl/vendor/lib/FFI/Platypus/Lang/C.pm, but that just seems like it's > documentation. There is however FFI::Platypus::Record, which maybe could > suffice? > > Do we actually need FFI::C, or can we work around not having it? Looks like > it's just used for notify related stuff. > > It looks like mingw doesn't have packages for FFI::Platypus, but it'll > probably be a lot easier to build that than when using msvc. > > I replaced the use of FFI::C with FFI::Platypus::Record. That comes for free with FFI::Platypus, so there would be no extra dependency. It means a little extra housekeeping so we don't lose track of the pointer for later use with PQfreemem, but it's not too bad. I have tried it out with Windows, seemed to work OK although the xid_wraparound tests 2 and 3 timed out. Latest is attached. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com