Обсуждение: Sometimes the output to the stdout in Windows disappears

Поиск
Список
Период
Сортировка

Sometimes the output to the stdout in Windows disappears

От
Victor Spirin
Дата:
Hi!

Sometimes the output to stdout on Windows on multicore machines does not 
go through after connecting and disconnecting to the server using the 
PQconnectdbParams and PQfinish functions. I tested on 6 cores.

First we appeared this in some pgbench tests.  Then we found that this 
happens on any console program using PQconnectdbParams and PQfinish.

I am attaching a patch and a script for testing.

-- 
Victor Spirin
Postgres Professional:http://www.postgrespro.com
The Russian Postgres Company


Вложения

Re: Sometimes the output to the stdout in Windows disappears

От
Tom Lane
Дата:
Victor Spirin <v.spirin@postgrespro.ru> writes:
> Sometimes the output to stdout on Windows on multicore machines does not
> go through after connecting and disconnecting to the server using the
> PQconnectdbParams and PQfinish functions. I tested on 6 cores.

Hm, why is this not Microsoft's bug to solve?

I do wonder if this report is related to the intermittent ecpg failures
we see on Windows machines, such as [1].  The details vary, but it's
always a case of a .stdout file ending up empty when it should not be.
I'd supposed though that it must be something specific to ecpg, since
we never see anything like that anywhere but the ecpg tests.  Even if
you posit that libpq is doing something that somehow compromises stdio,
that should affect psql-based tests too.

> I am attaching a patch and a script for testing.
> [ forced fflush in every snprintf call ]

My goodness, that's a large hammer you're swinging.  What effects has this
kluge got on performance?

While I think you may be on to something, this seems like a truly horrid
way to "fix" it.  We need to dig down further and understand what is
actually happening.

            regards, tom lane

[1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=dory&dt=2020-08-13%2022%3A15%3A05



Re: Sometimes the output to the stdout in Windows disappears

От
Victor Spirin
Дата:
I only found that calling WSACleanup instead of PQfinish has the same 
effect.

I don't see any big performance issues due to the extra fflush call in 
this place. I will be glad to find a more elegant solution.

Victor Spirin
Postgres Professional:http://www.postgrespro.com
The Russian Postgres Company

31.08.2020 21:31, Tom Lane пишет:
> Victor Spirin <v.spirin@postgrespro.ru> writes:
>> Sometimes the output to stdout on Windows on multicore machines does not
>> go through after connecting and disconnecting to the server using the
>> PQconnectdbParams and PQfinish functions. I tested on 6 cores.
> Hm, why is this not Microsoft's bug to solve?
>
> I do wonder if this report is related to the intermittent ecpg failures
> we see on Windows machines, such as [1].  The details vary, but it's
> always a case of a .stdout file ending up empty when it should not be.
> I'd supposed though that it must be something specific to ecpg, since
> we never see anything like that anywhere but the ecpg tests.  Even if
> you posit that libpq is doing something that somehow compromises stdio,
> that should affect psql-based tests too.
>
>> I am attaching a patch and a script for testing.
>> [ forced fflush in every snprintf call ]
> My goodness, that's a large hammer you're swinging.  What effects has this
> kluge got on performance?
>
> While I think you may be on to something, this seems like a truly horrid
> way to "fix" it.  We need to dig down further and understand what is
> actually happening.
>
>             regards, tom lane
>
> [1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=dory&dt=2020-08-13%2022%3A15%3A05
>
>



Re: Sometimes the output to the stdout in Windows disappears

От
Tom Lane
Дата:
I happened to try googling for other similar reports, and I found
a very interesting recent thread here:

https://github.com/nodejs/node/issues/33166

It might not have the same underlying cause, of course, but it sure
sounds familiar.  If Node.js are really seeing the same effect,
that would point to an underlying Windows bug rather than anything
Postgres is doing wrong.

It doesn't look like the Node.js crew got any closer to
understanding the issue than we have, unfortunately.  They made
their problem mostly go away by reverting a seemingly-unrelated
patch.  But I can't help thinking that it's a timing-related bug,
and that patch was just unlucky enough to change the timing of
their tests so that they saw the failure frequently.

            regards, tom lane



Re: Sometimes the output to the stdout in Windows disappears

От
Alexander Lakhin
Дата:
Hello hackers,

13.09.2020 21:37, Tom Lane wrote:
> I happened to try googling for other similar reports, and I found
> a very interesting recent thread here:
>
> https://github.com/nodejs/node/issues/33166
>
> It might not have the same underlying cause, of course, but it sure
> sounds familiar.  If Node.js are really seeing the same effect,
> that would point to an underlying Windows bug rather than anything
> Postgres is doing wrong.
>
> It doesn't look like the Node.js crew got any closer to
> understanding the issue than we have, unfortunately.  They made
> their problem mostly go away by reverting a seemingly-unrelated
> patch.  But I can't help thinking that it's a timing-related bug,
> and that patch was just unlucky enough to change the timing of
> their tests so that they saw the failure frequently.
I've managed to make a simple reproducer. Please look at the patch attached.
There are two things crucial for reproducing the bug:
    ioctlsocket(sock, FIONBIO, &ioctlsocket_ret); // from pgwin32_socket()
and
    WSACleanup();

I still can't understand what affects the effect. With this reproducer I
get:
vcregress taptest src\test\modules\connect
...
t/000_connect.pl .. # test
#
t/000_connect.pl .. 13346/100000
#   Failed test at t/000_connect.pl line 24.
t/000_connect.pl .. 16714/100000
#   Failed test at t/000_connect.pl line 24.
t/000_connect.pl .. 26216/100000
#   Failed test at t/000_connect.pl line 24.
t/000_connect.pl .. 30077/100000
#   Failed test at t/000_connect.pl line 24.
t/000_connect.pl .. 36505/100000
#   Failed test at t/000_connect.pl line 24.
t/000_connect.pl .. 43647/100000
#   Failed test at t/000_connect.pl line 24.
t/000_connect.pl .. 53070/100000
#   Failed test at t/000_connect.pl line 24.
t/000_connect.pl .. 54402/100000
#   Failed test at t/000_connect.pl line 24.
t/000_connect.pl .. 55685/100000
#   Failed test at t/000_connect.pl line 24.
t/000_connect.pl .. 83193/100000
#   Failed test at t/000_connect.pl line 24.
t/000_connect.pl .. 99992/100000 # Looks like you failed 10 tests of 100000.
t/000_connect.pl .. Dubious, test returned 10 (wstat 2560, 0xa00)
Failed 10/100000 subtests

But in our test farm the pg_bench test (from the installcheck-world
suite that we run with using msys) can fail roughly on each third run.
Perhaps it depends on I/O load. It seems, that searching files/scanning
disk in parallel increases the probability of the glitch.
I see no solution for this on the postgres side for now, but this
information about Windows quirks could be useful in case someone
stumbled upon it too.

Best regards,
Alexander

Вложения

Re: Sometimes the output to the stdout in Windows disappears

От
Tom Lane
Дата:
Alexander Lakhin <exclusion@gmail.com> writes:
> I've managed to make a simple reproducer. Please look at the patch attached.
> There are two things crucial for reproducing the bug:
>     ioctlsocket(sock, FIONBIO, &ioctlsocket_ret); // from pgwin32_socket()
> and
>     WSACleanup();

Oh, very interesting.

Now that you have it somewhat in captivity, maybe you could determine
some things:

1. Is it only stdout that's affected?  What of other stdio streams?
(Note that testing stderr might be tricky because it's probably
line-buffered.)

2. Does an fflush() just before, or just after, WSACleanup() fix it?

> I see no solution for this on the postgres side for now, but this
> information about Windows quirks could be useful in case someone
> stumbled upon it too.

Depending on your answers to the above, maybe some hack like this
would be acceptable:

     free(conn);

 #ifdef WIN32
+    fflush(NULL);
     WSACleanup();
 #endif
 }

It's not very nice for a library to be doing global things like that,
but if the alternative is loss of output, maybe we should.

But wait a minute: I just looked at Microsoft's docs [1] and found

    In a multithreaded environment, WSACleanup terminates Windows Sockets
    operations for all threads.

This makes me (a) wonder if that explains the side-effects on stdio,
and (b) question why libpq is calling WSACleanup at all.
What if we arranged to call WSAStartup just once, during the first
libpq connection-open in a process, and then never did WSACleanup?
Surely the OS can cope with that, and it eliminates any risk that
WSACleanup breaks something.

            regards, tom lane

[1] https://docs.microsoft.com/en-us/windows/win32/api/winsock/nf-winsock-wsacleanup



Re: Sometimes the output to the stdout in Windows disappears

От
Alexander Lakhin
Дата:
16.10.2020 19:18, Tom Lane wrote:
Oh, very interesting.
Now that you have it somewhat in captivity, maybe you could determine
some things:

1. Is it only stdout that's affected?  What of other stdio streams?
(Note that testing stderr might be tricky because it's probably
line-buffered.)
stderr is affected too. Just replacing stdout with stderr in connect.c and 000_connect.pl gives the same result.

Moreover, the following modification:
...
    outfile = fopen("out", "w");
...
    fprintf(stdout, "stdout test\n");
    fprintf(stderr, "stderr test\n");
    fprintf(outfile, "outfile test\n");
    WSACleanup();
...

---
for (my $i =0; $i < 100000; $i++) {
    unlink('out');
    IPC::Run::run(\@cmd, '>', \$stdout, '2>', \$stderr);
    open(my $fh, '<', 'out') or die $!;
    my $fileout = <$fh>;
    ok(defined $fileout && $fileout ne '');
    close($fh);
}

detects similar failures too. (On a fail the out file exists but has zero size.)

2. Does an fflush() just before, or just after, WSACleanup() fix it?
"fflush(NULL);" just before or after WSACleanup() fixes things.

I've managed to record in ProcMon the activity log for a failed run (aside normal runs). Excerpts from the log are attached. As we can see, the failed process doesn't even try to write into IPC-Run's temp file.

Depending on your answers to the above, maybe some hack like this
would be acceptable:
 	free(conn);  #ifdef WIN32
+	fflush(NULL); 	WSACleanup(); #endif }

It's not very nice for a library to be doing global things like that,
but if the alternative is loss of output, maybe we should.
But now we see that the WSACleanup call is a global thing by itself.
But wait a minute: I just looked at Microsoft's docs [1] and found
    In a multithreaded environment, WSACleanup terminates Windows Sockets    operations for all threads.

This makes me (a) wonder if that explains the side-effects on stdio,
and (b) question why libpq is calling WSACleanup at all.
What if we arranged to call WSAStartup just once, during the first
libpq connection-open in a process, and then never did WSACleanup?
Surely the OS can cope with that, and it eliminates any risk that
WSACleanup breaks something.
What bothers me is:
There must be a call to WSACleanup for each successful call to WSAStartup. Only the final WSACleanup function call performs the actual cleanup. The preceding calls simply decrement an internal reference count in the WS2_32.DLL.
So third-party application developers should understand that when using libpq they would have to call WSACleanup one more time to perform "the actual cleanup". (And thus WSAStartup is kind of like a global thing too.)
But may be it's a way better than to have a confirmed risk of losing data.

Best regards,
Alexander
Вложения

Re: Sometimes the output to the stdout in Windows disappears

От
Andrew Dunstan
Дата:
On 10/16/20 12:18 PM, Tom Lane wrote:
>
> But wait a minute: I just looked at Microsoft's docs [1] and found
>
>     In a multithreaded environment, WSACleanup terminates Windows Sockets
>     operations for all threads.
>
> This makes me (a) wonder if that explains the side-effects on stdio,
> and (b) question why libpq is calling WSACleanup at all.
> What if we arranged to call WSAStartup just once, during the first
> libpq connection-open in a process, and then never did WSACleanup?
> Surely the OS can cope with that, and it eliminates any risk that
> WSACleanup breaks something.
>
>             regards, tom lane
>
> [1] https://docs.microsoft.com/en-us/windows/win32/api/winsock/nf-winsock-wsacleanup
>
>


This could explain random transient stdout/stderr failures we have seen
over the years. I think we should at least give your suggestion a try -
this is a good time in the dev cycle for such experiments.


cheers


andrew






Re: Sometimes the output to the stdout in Windows disappears

От
Tom Lane
Дата:
Alexander Lakhin <exclusion@gmail.com> writes:
> 16.10.2020 19:18, Tom Lane wrote:
>> What if we arranged to call WSAStartup just once, during the first
>> libpq connection-open in a process, and then never did WSACleanup?
>> Surely the OS can cope with that, and it eliminates any risk that
>> WSACleanup breaks something.

> What bothers me is:

>     There must be a call to *WSACleanup* for each successful call to
>     WSAStartup
>     <https://docs.microsoft.com/en-us/windows/desktop/api/winsock/nf-winsock-wsastartup>.

Yeah, that is a very odd statement.  Surely, the Windows kernel manages
to cope if a program crashes without having done that.  So what exactly
is the downside of intentionally not doing it?  There's no reason to
care if the Winsock DLL stays loaded until program exit rather than
getting unloaded a bit earlier.  (If anything, the current code causes
an unload/reload cycle for each connection when the application makes
a series of PG connections; who could think that's a great idea?)

            regards, tom lane



Re: Sometimes the output to the stdout in Windows disappears

От
Tom Lane
Дата:
I wrote:
> Alexander Lakhin <exclusion@gmail.com> writes:
>> What bothers me is:
>>     There must be a call to *WSACleanup* for each successful call to
>>     WSAStartup
>>     <https://docs.microsoft.com/en-us/windows/desktop/api/winsock/nf-winsock-wsastartup>.

> Yeah, that is a very odd statement.  Surely, the Windows kernel manages
> to cope if a program crashes without having done that.  So what exactly
> is the downside of intentionally not doing it?

A bit of grepping showed me that the backend, initdb, and pg_regress
all call WSAStartup without ever doing WSACleanup, and we've seen no
ill effects from that.  So it seems clear that this documentation can
safely be ignored.

I propose the attached patch.  If this doesn't cause buildfarm problems,
perhaps we should back-patch it.

BTW, I notice that libpq is asking WSAStartup for Winsock version 1.1,
which is remarkably ancient.  Almost everyplace else is asking for
version 2.2, which has been current for a decade or two.  Shouldn't
we update that?  (It occurs to me to wonder if this in itself is
some kind of problem; I wonder how well Winsock works when there are
requests for different API versions in the same program.)

            regards, tom lane

diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml
index 3315f1dd05..de60281fcb 100644
--- a/doc/src/sgml/libpq.sgml
+++ b/doc/src/sgml/libpq.sgml
@@ -91,21 +91,6 @@
     </para>
    </warning>

-   <note>
-    <para>
-     On Windows, there is a way to improve performance if a single
-     database connection is repeatedly started and shutdown.  Internally,
-     libpq calls <function>WSAStartup()</function> and <function>WSACleanup()</function> for connection startup
-     and shutdown, respectively.  <function>WSAStartup()</function> increments an internal
-     Windows library reference count which is decremented by <function>WSACleanup()</function>.
-     When the reference count is just one, calling <function>WSACleanup()</function> frees
-     all resources and all DLLs are unloaded.  This is an expensive
-     operation.  To avoid this, an application can manually call
-     <function>WSAStartup()</function> so resources will not be freed when the last database
-     connection is closed.
-    </para>
-   </note>
-
    <variablelist>
     <varlistentry id="libpq-PQconnectdbParams">
      <term><function>PQconnectdbParams</function><indexterm><primary>PQconnectdbParams</primary></indexterm></term>
diff --git a/src/bin/pg_dump/parallel.c b/src/bin/pg_dump/parallel.c
index a967e11378..b51cc76c7d 100644
--- a/src/bin/pg_dump/parallel.c
+++ b/src/bin/pg_dump/parallel.c
@@ -229,19 +229,6 @@ static char *readMessageFromPipe(int fd);
     (strncmp(msg, prefix, strlen(prefix)) == 0)


-/*
- * Shutdown callback to clean up socket access
- */
-#ifdef WIN32
-static void
-shutdown_parallel_dump_utils(int code, void *unused)
-{
-    /* Call the cleanup function only from the main thread */
-    if (mainThreadId == GetCurrentThreadId())
-        WSACleanup();
-}
-#endif
-
 /*
  * Initialize parallel dump support --- should be called early in process
  * startup.  (Currently, this is called whether or not we intend parallel
@@ -267,8 +254,7 @@ init_parallel_dump_utils(void)
             pg_log_error("WSAStartup failed: %d", err);
             exit_nicely(1);
         }
-        /* ... and arrange to shut it down at exit */
-        on_exit_nicely(shutdown_parallel_dump_utils, NULL);
+
         parallel_init_done = true;
     }
 #endif
diff --git a/src/interfaces/libpq/fe-connect.c b/src/interfaces/libpq/fe-connect.c
index af27fee6b5..704c9e2f79 100644
--- a/src/interfaces/libpq/fe-connect.c
+++ b/src/interfaces/libpq/fe-connect.c
@@ -3871,23 +3871,30 @@ makeEmptyPGconn(void)
 #ifdef WIN32

     /*
-     * Make sure socket support is up and running.
+     * Make sure socket support is up and running in this process.
+     *
+     * Note: the Windows documentation says that we should eventually do a
+     * matching WSACleanup() call, but experience suggests that that is at
+     * least as likely to cause problems as fix them.  So we don't.
      */
-    WSADATA        wsaData;
+    static bool wsastartup_done = false;

-    if (WSAStartup(MAKEWORD(1, 1), &wsaData))
-        return NULL;
+    if (!wsastartup_done)
+    {
+        WSADATA        wsaData;
+
+        if (WSAStartup(MAKEWORD(1, 1), &wsaData) != 0)
+            return NULL;
+        wsastartup_done = true;
+    }
+
+    /* Forget any earlier error */
     WSASetLastError(0);
-#endif
+#endif                            /* WIN32 */

     conn = (PGconn *) malloc(sizeof(PGconn));
     if (conn == NULL)
-    {
-#ifdef WIN32
-        WSACleanup();
-#endif
         return conn;
-    }

     /* Zero all pointers and booleans */
     MemSet(conn, 0, sizeof(PGconn));
@@ -4080,10 +4087,6 @@ freePGconn(PGconn *conn)
     termPQExpBuffer(&conn->workBuffer);

     free(conn);
-
-#ifdef WIN32
-    WSACleanup();
-#endif
 }

 /*

Re: Sometimes the output to the stdout in Windows disappears

От
Alexander Lakhin
Дата:
17.10.2020 21:44, Tom Lane wrote:
> I propose the attached patch. If this doesn't cause buildfarm problems,
> perhaps we should back-patch it.
Thank you!
I've made a simple cmd script to reproduce problems seen on dory:
https://buildfarm.postgresql.org/cgi-bin/show_history.pl?nm=dory&br=HEAD

FOR /L %%I IN (1,1,200) DO call :CHECK %%I
GOTO :eof
:CHECK
echo iteration %1
call vcregress ecpgcheck
IF %ERRORLEVEL% NEQ 0 GOTO ERR
EXIT /B
:ERR
echo iteration %1 failed
pause

Without the fix I've got errors on iterations 43, 46, 128, 47, 14, 4,
27, which approximately corresponds to the ECPG-Check failure frequency
on dory (for HEAD).
With the fix all the 200 iterations passed as expected.
Then I ran the loop again just to be sure and got:
test thread/descriptor            ... stderr FAILED       81 ms
iteration 124 failed.

diff -w -U3
.../src/interfaces/ecpg/test/expected/thread-descriptor.stderr
.../src/interfaces/ecpg/test/results/thread-descriptor.stderr
--- .../src/interfaces/ecpg/test/expected/thread-descriptor.stderr   
2019-12-04 16:05:46 +0300
+++ .../src/interfaces/ecpg/test/results/thread-descriptor.stderr   
2020-10-18 20:20:27 +0300
@@ -0,0 +1 @@
+SQL error: descriptor "mydesc" not found on line 31

It's interesting that all failures before the fix were with stdout, but
this one is with stderr.
I'm going to investigate this issue further.

Best regards,
Alexander



Re: Sometimes the output to the stdout in Windows disappears

От
Tom Lane
Дата:
Alexander Lakhin <exclusion@gmail.com> writes:
> With the fix all the 200 iterations passed as expected.
> Then I ran the loop again just to be sure and got:
> test thread/descriptor            ... stderr FAILED       81 ms
> iteration 124 failed.

Sigh ... still, this:

> diff -w -U3
> .../src/interfaces/ecpg/test/expected/thread-descriptor.stderr
> .../src/interfaces/ecpg/test/results/thread-descriptor.stderr
> --- .../src/interfaces/ecpg/test/expected/thread-descriptor.stderr   
> 2019-12-04 16:05:46 +0300
> +++ .../src/interfaces/ecpg/test/results/thread-descriptor.stderr   
> 2020-10-18 20:20:27 +0300
> @@ -0,0 +1 @@
> +SQL error: descriptor "mydesc" not found on line 31

does not look like the same kind of failure as what we've been dealing
with up to now.  So maybe what we've got is that we fixed the stdio
loss problem, and now the error rate is down to the point where we can
notice other, even-lower-probability issues.

            regards, tom lane



Re: Sometimes the output to the stdout in Windows disappears

От
Tom Lane
Дата:
I wrote:
>> diff -w -U3
>> .../src/interfaces/ecpg/test/expected/thread-descriptor.stderr
>> .../src/interfaces/ecpg/test/results/thread-descriptor.stderr
>> --- .../src/interfaces/ecpg/test/expected/thread-descriptor.stderr   
>> 2019-12-04 16:05:46 +0300
>> +++ .../src/interfaces/ecpg/test/results/thread-descriptor.stderr   
>> 2020-10-18 20:20:27 +0300
>> @@ -0,0 +1 @@
>> +SQL error: descriptor "mydesc" not found on line 31

> does not look like the same kind of failure as what we've been dealing
> with up to now.  So maybe what we've got is that we fixed the stdio
> loss problem, and now the error rate is down to the point where we can
> notice other, even-lower-probability issues.

Yeah, I think so.  I grepped the buildfarm logs for similar failures and
found three occurrences:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=bowerbird&dt=2019-02-03%2018%3A36%3A05
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=bowerbird&dt=2019-01-17%2014%3A30%3A07
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=drongo&dt=2020-01-02%2018%3A03%3A52

All of these are in the thread/descriptor test, failing at the deallocate
step in

    for (i = 1; i <= REPEATS; ++i)
    {
        EXEC SQL ALLOCATE DESCRIPTOR mydesc;
        EXEC SQL DEALLOCATE DESCRIPTOR mydesc;
    }

where the test is running several of these in different threads.
I wonder whether there's some missing thread-locking in the ECPG
descriptor support.  It is odd though that we have seen this only
on Windows members.  Low-probability or not, you'd think we'd have
some similar reports from non-Windows critters if it were possible.

            regards, tom lane



Re: Sometimes the output to the stdout in Windows disappears

От
Alexander Lakhin
Дата:
18.10.2020 21:04, Tom Lane wrote:
> Alexander Lakhin <exclusion@gmail.com> writes:
>> @@ -0,0 +1 @@
>> +SQL error: descriptor "mydesc" not found on line 31
> does not look like the same kind of failure as what we've been dealing
> with up to now.  So maybe what we've got is that we fixed the stdio
> loss problem, and now the error rate is down to the point where we can
> notice other, even-lower-probability issues.
Yes, in this case stderr is not missing (it's present with the error).
So it's really different case. As is another one:

test connect/test5                ... stderr FAILED      238 ms

diff -w -U3 .../src/interfaces/ecpg/test/expected/connect-test5.stderr
.../src/interfaces/ecpg/test/results/connect-test5.stderr
--- .../src/interfaces/ecpg/test/expected/connect-test5.stderr   
2020-10-13 21:51:14 +0300
+++ .../src/interfaces/ecpg/test/results/connect-test5.stderr   
2020-10-18 20:59:46 +0300
@@ -73,7 +73,9 @@
 [NO_PID]: sqlca: code: -220, state: 08003
 [NO_PID]: ECPGconnect: opening database <DEFAULT> on <DEFAULT> port
<DEFAULT>  for user regress_ecpg_user2
 [NO_PID]: sqlca: code: 0, state: 00000
-[NO_PID]: ECPGconnect: could not open database: FATAL:  database
"regress_ecpg_user2" does not exist
+[NO_PID]: ECPGconnect: could not open database: server closed the
connection unexpectedly
+    This probably means the server terminated abnormally
+    before or while processing the request.

and the server.log:
2020-10-18 20:59:45.731 MSK client backend[1380] ecpg/connect-test4
LOG:  could not receive data from client: An existing connection was
forcibly closed by the remote host.
   
2020-10-18 20:59:45.898 MSK client backend[2884] [unknown] FATAL: 
database "regress_ecpg_user2" does not exist
2020-10-18 20:59:45.992 MSK client backend[1640] [unknown] FATAL: 
database "regress_ecpg_user2" does not exist

I just wanted to inform that the ECPG-test failures can still persist in
the buildfarm, unfortunately.

Best regards,
Alexander



Re: Sometimes the output to the stdout in Windows disappears

От
Tom Lane
Дата:
Alexander Lakhin <exclusion@gmail.com> writes:
> I've made a simple cmd script to reproduce problems seen on dory:
> ...
> Without the fix I've got errors on iterations 43, 46, 128, 47, 14, 4,
> 27, which approximately corresponds to the ECPG-Check failure frequency
> on dory (for HEAD).
> With the fix all the 200 iterations passed as expected.
> Then I ran the loop again just to be sure and got:
> test thread/descriptor            ... stderr FAILED       81 ms
> iteration 124 failed.

I had been thinking we'd have to wait a month or two for the buildfarm
to accumulate enough runs to be confident in whether the WSACleanup
removal fixes the ecpg failures.  However, now that you did this
experiment, I think we have enough evidence already that it fixes it
(or at least makes things an order of magnitude better).

So now I'm inclined to not wait, but go ahead and backpatch 7d00a6b2d
now.  There's still enough time before the November releases that we
can expect that any nasty problems will show up in the buildfarm
before we ship.

            regards, tom lane



Re: Sometimes the output to the stdout in Windows disappears

От
Tom Lane
Дата:
Alexander Lakhin <exclusion@gmail.com> writes:
> I just wanted to inform that the ECPG-test failures can still persist in
> the buildfarm, unfortunately.

Right, but at least now we can see that there are other issues to
investigate.  Personally I stopped paying any attention to buildfarm
ECPG failures on Windows some time ago, figuring they were all the
mysterious stdout-truncation problem.  With that gone there'll be
less noise and more signal in the buildfarm results.

            regards, tom lane