Обсуждение: MSVC SSL test failure

Поиск
Список
Период
Сортировка

MSVC SSL test failure

От
Andrew Dunstan
Дата:
I am getting this test failure 001_ssltests.pl on my test MSVC system
when SSL tests are enabled:

    not ok 110 - certificate authorization fails with revoked client cert with server-side CRL directory: matches

    #   Failed test 'certificate authorization fails with revoked client cert with server-side CRL directory: matches'
    #   at t/001_ssltests.pl line 618.
    #                   'psql: error: connection to server at "127.0.0.1", port 59491 failed: server closed the
connectionunexpectedly
 
    #       This probably means the server terminated abnormally
    #       before or while processing the request.
    # server closed the connection unexpectedly
    #       This probably means the server terminated abnormally
    #       before or while processing the request.
    # server closed the connection unexpectedly
    #       This probably means the server terminated abnormally
    #       before or while processing the request.'
    #     doesn't match '(?^:SSL error: sslv3 alert certificate revoked)'

There's nothing terribly suspicious in the server log, so I'm not quite
sure what's going on here.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com




Re: MSVC SSL test failure

От
Tom Lane
Дата:
Andrew Dunstan <andrew@dunslane.net> writes:
> I am getting this test failure 001_ssltests.pl on my test MSVC system
> when SSL tests are enabled:

>     not ok 110 - certificate authorization fails with revoked client cert with server-side CRL directory: matches

>     #   Failed test 'certificate authorization fails with revoked client cert with server-side CRL directory:
matches'
>     #   at t/001_ssltests.pl line 618.
>     #                   'psql: error: connection to server at "127.0.0.1", port 59491 failed: server closed the
connectionunexpectedly 
>     #       This probably means the server terminated abnormally
>     #       before or while processing the request.
>     # server closed the connection unexpectedly
>     #       This probably means the server terminated abnormally
>     #       before or while processing the request.
>     # server closed the connection unexpectedly
>     #       This probably means the server terminated abnormally
>     #       before or while processing the request.'
>     #     doesn't match '(?^:SSL error: sslv3 alert certificate revoked)'

> There's nothing terribly suspicious in the server log, so I'm not quite
> sure what's going on here.

Hmm .. does enabling log_connections/log_disconnections produce
any useful info?

This looks quite a bit like the sort of failure that commit
6051857fc was meant to forestall.  I wonder whether reverting
that commit changes the results?  You might also try inserting
a shutdown() call, as we'd decided not to do [1].

            regards, tom lane

[1] https://www.postgresql.org/message-id/1283317.1638213407%40sss.pgh.pa.us



Re: MSVC SSL test failure

От
Daniel Gustafsson
Дата:
> On 5 Dec 2021, at 18:03, Andrew Dunstan <andrew@dunslane.net> wrote:

> I am getting this test failure 001_ssltests.pl on my test MSVC system
> when SSL tests are enabled:
>
>    not ok 110 - certificate authorization fails with revoked client cert with server-side CRL directory: matches
>
>    #   Failed test 'certificate authorization fails with revoked client cert with server-side CRL directory: matches'
>    #   at t/001_ssltests.pl line 618.
>    #                   'psql: error: connection to server at "127.0.0.1", port 59491 failed: server closed the
connectionunexpectedly 
>    #       This probably means the server terminated abnormally
>    #       before or while processing the request.
>    # server closed the connection unexpectedly
>    #       This probably means the server terminated abnormally
>    #       before or while processing the request.
>    # server closed the connection unexpectedly
>    #       This probably means the server terminated abnormally
>    #       before or while processing the request.'
>    #     doesn't match '(?^:SSL error: sslv3 alert certificate revoked)'
>
> There's nothing terribly suspicious in the server log, so I'm not quite
> sure what's going on here.

Trying out HEAD as of 20 minutes ago in Andres' MSVC setup on Cirrus CI I'm
unable to replicate this.  Did you remake the keys/certs or are they the files
from the repo?  Which version of OpenSSL is this using?

--
Daniel Gustafsson        https://vmware.com/




Re: MSVC SSL test failure

От
Andrew Dunstan
Дата:
On 12/5/21 15:14, Daniel Gustafsson wrote:
>> On 5 Dec 2021, at 18:03, Andrew Dunstan <andrew@dunslane.net> wrote:
>> I am getting this test failure 001_ssltests.pl on my test MSVC system
>> when SSL tests are enabled:
>>
>>    not ok 110 - certificate authorization fails with revoked client cert with server-side CRL directory: matches
>>
>>    #   Failed test 'certificate authorization fails with revoked client cert with server-side CRL directory:
matches'
>>    #   at t/001_ssltests.pl line 618.
>>    #                   'psql: error: connection to server at "127.0.0.1", port 59491 failed: server closed the
connectionunexpectedly
 
>>    #       This probably means the server terminated abnormally
>>    #       before or while processing the request.
>>    # server closed the connection unexpectedly
>>    #       This probably means the server terminated abnormally
>>    #       before or while processing the request.
>>    # server closed the connection unexpectedly
>>    #       This probably means the server terminated abnormally
>>    #       before or while processing the request.'
>>    #     doesn't match '(?^:SSL error: sslv3 alert certificate revoked)'
>>
>> There's nothing terribly suspicious in the server log, so I'm not quite
>> sure what's going on here.
> Trying out HEAD as of 20 minutes ago in Andres' MSVC setup on Cirrus CI I'm
> unable to replicate this.  Did you remake the keys/certs or are they the files
> from the repo?  Which version of OpenSSL is this using?
>

It's WS2019 Build 1809, VS2019, openssl 1.1.1 (via chocolatey).


Can you show me the cirrus.yml file you're using to test with? A URL ref
to the results would also help.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com




Re: MSVC SSL test failure

От
Daniel Gustafsson
Дата:
> On 5 Dec 2021, at 23:44, Andrew Dunstan <andrew@dunslane.net> wrote:

> Can you show me the cirrus.yml file you're using to test with?

I used the 0001 patch from this thread:

https://www.postgresql.org/message-id/20211101055720.7mzwtkhzxmorpxth%40alap3.anarazel.de

> A URL ref to the results would also help.

I can't vouch for how helpful it is, but below is the build info that exists:

https://cirrus-ci.com/task/5358152892809216

--
Daniel Gustafsson        https://vmware.com/




Re: MSVC SSL test failure

От
Andrew Dunstan
Дата:
On 12/5/21 12:50, Tom Lane wrote:
> Andrew Dunstan <andrew@dunslane.net> writes:
>> I am getting this test failure 001_ssltests.pl on my test MSVC system
>> when SSL tests are enabled:
>>     not ok 110 - certificate authorization fails with revoked client cert with server-side CRL directory: matches
>>     #   Failed test 'certificate authorization fails with revoked client cert with server-side CRL directory:
matches'
>>     #   at t/001_ssltests.pl line 618.
>>     #                   'psql: error: connection to server at "127.0.0.1", port 59491 failed: server closed the
connectionunexpectedly
 
>>     #       This probably means the server terminated abnormally
>>     #       before or while processing the request.
>>     # server closed the connection unexpectedly
>>     #       This probably means the server terminated abnormally
>>     #       before or while processing the request.
>>     # server closed the connection unexpectedly
>>     #       This probably means the server terminated abnormally
>>     #       before or while processing the request.'
>>     #     doesn't match '(?^:SSL error: sslv3 alert certificate revoked)'
>> There's nothing terribly suspicious in the server log, so I'm not quite
>> sure what's going on here.
> Hmm .. does enabling log_connections/log_disconnections produce
> any useful info?


No, that's already turned on (this test is run using a standard
buildfarm client).


>
> This looks quite a bit like the sort of failure that commit
> 6051857fc was meant to forestall.  I wonder whether reverting
> that commit changes the results?  You might also try inserting
> a shutdown() call, as we'd decided not to do [1].
>
>             regards, tom lane
>
> [1] https://www.postgresql.org/message-id/1283317.1638213407%40sss.pgh.pa.us


Commenting out the closesocket() worked.


I will look at shutdown().


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com




Re: MSVC SSL test failure

От
Tom Lane
Дата:
Andrew Dunstan <andrew@dunslane.net> writes:
> On 12/5/21 12:50, Tom Lane wrote:
>> This looks quite a bit like the sort of failure that commit
>> 6051857fc was meant to forestall.  I wonder whether reverting
>> that commit changes the results?  You might also try inserting
>> a shutdown() call, as we'd decided not to do [1].

> Commenting out the closesocket() worked.

Man, that's annoying.  Apparently OpenSSL is doing something to
screw up the shutdown sequence.  According to [1], the graceful
shutdown sequence will happen by default, but changing SO_LINGER
or SO_DONTLINGER can get you into abortive shutdown anyway.
Maybe they change one of those settings (why?)

            regards, tom lane

[1] https://docs.microsoft.com/en-us/windows/win32/winsock/graceful-shutdown-linger-options-and-socket-closure-2



Re: MSVC SSL test failure

От
Daniel Gustafsson
Дата:
> On 6 Dec 2021, at 07:02, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> 
> Andrew Dunstan <andrew@dunslane.net> writes:
>> On 12/5/21 12:50, Tom Lane wrote:
>>> This looks quite a bit like the sort of failure that commit
>>> 6051857fc was meant to forestall.  I wonder whether reverting
>>> that commit changes the results?  You might also try inserting
>>> a shutdown() call, as we'd decided not to do [1].
> 
>> Commenting out the closesocket() worked.
> 
> Man, that's annoying.  Apparently OpenSSL is doing something to
> screw up the shutdown sequence.  According to [1], the graceful
> shutdown sequence will happen by default, but changing SO_LINGER
> or SO_DONTLINGER can get you into abortive shutdown anyway.
> Maybe they change one of those settings (why?)

AFAICT they don't touch either, and nothing really sticks out looking at
setsockopt calls in either 1.1.1 or 3.0.0.

--
Daniel Gustafsson        https://vmware.com/




Re: MSVC SSL test failure

От
Andrew Dunstan
Дата:
On 12/6/21 01:02, Tom Lane wrote:
> Andrew Dunstan <andrew@dunslane.net> writes:
>> On 12/5/21 12:50, Tom Lane wrote:
>>> This looks quite a bit like the sort of failure that commit
>>> 6051857fc was meant to forestall.  I wonder whether reverting
>>> that commit changes the results?  You might also try inserting
>>> a shutdown() call, as we'd decided not to do [1].
>> Commenting out the closesocket() worked.
> Man, that's annoying.  Apparently OpenSSL is doing something to
> screw up the shutdown sequence.  According to [1], the graceful
> shutdown sequence will happen by default, but changing SO_LINGER
> or SO_DONTLINGER can get you into abortive shutdown anyway.
> Maybe they change one of those settings (why?)
>
>             regards, tom lane
>
> [1] https://docs.microsoft.com/en-us/windows/win32/winsock/graceful-shutdown-linger-options-and-socket-closure-2



Yeah, quite annoying, especially because only some combinations of MSVC
runtime / openssl version seem to trigger the problem.


Adding a shutdown() before the closesocket() also fixes the issue.


<https://docs.microsoft.com/en-us/windows/win32/api/winsock/nf-winsock-shutdown>
says:


    To assure that all data is sent and received on a connected socket
    before it is closed, an application should use shutdown to close
    connection before calling closesocket.
    ...

    Note  The shutdown function does not block regardless of the
    SO_LINGER setting on the socket.


Since we're not expecting anything back from the client I don't think we
need any of the recv calls the recipes there suggest.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com




Re: MSVC SSL test failure

От
Daniel Gustafsson
Дата:
> On 6 Dec 2021, at 15:56, Andrew Dunstan <andrew@dunslane.net> wrote:

> Yeah, quite annoying, especially because only some combinations of MSVC
> runtime / openssl version seem to trigger the problem.
> 
> Adding a shutdown() before the closesocket() also fixes the issue.

If you have a patch you're testing I'm happy to run it through Cirrus in as
many combinations of MSVC and OpenSSL as I can muster there.

--
Daniel Gustafsson        https://vmware.com/




Re: MSVC SSL test failure

От
Alexander Lakhin
Дата:
Hello Andrew,
06.12.2021 17:56, Andrew Dunstan wrote:
> Yeah, quite annoying, especially because only some combinations of MSVC
> runtime / openssl version seem to trigger the problem.
>
>
> Adding a shutdown() before the closesocket() also fixes the issue.
>
Can you confirm that adding shutdown(MyProcPort->sock, SD_BOTH) fixes
the issue for many test runs?
I don't see the stable test passing here.
Without shutdown() the test failed on iterations 1, 5, 4, but with
shutdown() it failed too, on iterations 3, 1, 4.
Without close() and shutdown() the test passes 20 iterations.
(I'm still researching how openssl affects the shutdown sequence.)

Best regards,
Alexander



Re: MSVC SSL test failure

От
Andrew Dunstan
Дата:
On 12/6/21 10:30, Alexander Lakhin wrote:
> Hello Andrew,
> 06.12.2021 17:56, Andrew Dunstan wrote:
>> Yeah, quite annoying, especially because only some combinations of MSVC
>> runtime / openssl version seem to trigger the problem.
>>
>>
>> Adding a shutdown() before the closesocket() also fixes the issue.
>>
> Can you confirm that adding shutdown(MyProcPort->sock, SD_BOTH) fixes
> the issue for many test runs?
> I don't see the stable test passing here.
> Without shutdown() the test failed on iterations 1, 5, 4, but with
> shutdown() it failed too, on iterations 3, 1, 4.
> Without close() and shutdown() the test passes 20 iterations.
> (I'm still researching how openssl affects the shutdown sequence.)




I have been getting 100% failures on the SSL tests with closesocket()
alone, and 100% success over 10 tests with this:


diff --git a/src/backend/libpq/pqcomm.c b/src/backend/libpq/pqcomm.c
index 96ab37c7d0..5998c089b0 100644
--- a/src/backend/libpq/pqcomm.c
+++ b/src/backend/libpq/pqcomm.c
@@ -295,6 +295,7 @@ socket_close(int code, Datum arg)
         * Windows too.  But it's a lot more fragile than the other way.
         */
 #ifdef WIN32
+       shutdown(MyProcPort->sock, SD_SEND);
        closesocket(MyProcPort->sock);
 #endif


That said, your results are quite worrying.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com




Re: MSVC SSL test failure

От
Alexander Lakhin
Дата:
06.12.2021 23:51, Andrew Dunstan wrote:
> I have been getting 100% failures on the SSL tests with closesocket()
> alone, and 100% success over 10 tests with this:
>
>
> diff --git a/src/backend/libpq/pqcomm.c b/src/backend/libpq/pqcomm.c
> index 96ab37c7d0..5998c089b0 100644
> --- a/src/backend/libpq/pqcomm.c
> +++ b/src/backend/libpq/pqcomm.c
> @@ -295,6 +295,7 @@ socket_close(int code, Datum arg)
>          * Windows too.  But it's a lot more fragile than the other way.
>          */
>  #ifdef WIN32
> +       shutdown(MyProcPort->sock, SD_SEND);
>         closesocket(MyProcPort->sock);
>  #endif
>
>
> That said, your results are quite worrying.
My next results are following:
It seems that the test failure rate may depend on the specs/environment.
With close-only version, having limited CPU usage for my Windows VM to
20%, I've got failures on iterations 10, 2, 1.
With 100% CPU I've seen 20 successful runs, then fails on iterations 5,
2. clean&buid and then failed iterations 11, 6, 3.  (So maybe caching is
another factor.)

shutdown(MyProcPort->sock, SD_SEND) apparently fixes the issue, I've got
83 successful runs, but then iteration 84 unfortunately failed:
t/001_ssltests.pl .. 106/110
#   Failed test 'intermediate client certificate is missing: matches'
#   at t/001_ssltests.pl line 608.
#                   'psql: error: connection to server at "127.0.0.1",
port 63187 failed: could not receive data from server: Software caused
connection abort (0x00002745/10053)
# SSL SYSCALL error: Software caused connection abort (0x00002745/10053)
# could not send startup packet: No error (0x00000000/0)'
#     doesn't match '(?^:SSL error: tlsv1 alert unknown ca)'
# Looks like you failed 1 test of 110.
t/001_ssltests.pl .. Dubious, test returned 1 (wstat 256, 0x100)
Failed 1/110 subtests
        (less 2 skipped subtests: 107 okay)

It's not that one that we observed with close-only fix, but it still
worrying. And then exactly this fail occurred again, on iteration 8.

But "fortunately" I've got the same fail as before:
t/001_ssltests.pl .. 106/110
#   Failed test 'certificate authorization fails with revoked client
cert with server-side CRL directory: matches'
#   at t/001_ssltests.pl line 618.
#                   'psql: error: connection to server at "127.0.0.1",
port 59220 failed: server closed the connection unexpectedly
#       This probably means the server terminated abnormally
#       before or while processing the request.
# server closed the connection unexpectedly
#       This probably means the server terminated abnormally
#       before or while processing the request.
# server closed the connection unexpectedly
#       This probably means the server terminated abnormally
#       before or while processing the request.'
#     doesn't match '(?^:SSL error: sslv3 alert certificate revoked)'
# Looks like you failed 1 test of 110.
t/001_ssltests.pl .. Dubious, test returned 1 (wstat 256, 0x100)
Failed 1/110 subtests
        (less 2 skipped subtests: 107 okay)
on 145-th iteration of the test even without close() (I've tried to
check whether the aforementioned fail existed before the fix).

So probably we found a practical evidence of shutdown() importance we
missed before, but it's not the end.
There was some test instability even without the close() fix and it
remains with the shutdown(...SD_SEND).

By the way, while exploring openssl' behavior, I found that
SSL_shutdown() has it's own quirks (see [1], return value 0). Maybe now
we've encountered one of these.

Best regards,
Alexander

[1] https://www.openssl.org/docs/man3.0/man3/SSL_shutdown.html



Re: MSVC SSL test failure

От
Tom Lane
Дата:
Alexander Lakhin <exclusion@gmail.com> writes:
> It seems that the test failure rate may depend on the specs/environment.

No surprise there, since the issue is almost surely timing-dependent.

> shutdown(MyProcPort->sock, SD_SEND) apparently fixes the issue, I've got
> 83 successful runs, but then iteration 84 unfortunately failed:
> t/001_ssltests.pl .. 106/110
> #   Failed test 'intermediate client certificate is missing: matches'
> #   at t/001_ssltests.pl line 608.
> #                   'psql: error: connection to server at "127.0.0.1",
> port 63187 failed: could not receive data from server: Software caused
> connection abort (0x00002745/10053)
> # SSL SYSCALL error: Software caused connection abort (0x00002745/10053)
> # could not send startup packet: No error (0x00000000/0)'
> #     doesn't match '(?^:SSL error: tlsv1 alert unknown ca)'
> # Looks like you failed 1 test of 110.

Hmm.  I wonder whether using SD_BOTH behaves any differently.

            regards, tom lane



Re: MSVC SSL test failure

От
Alexander Lakhin
Дата:
Hello Tom,
07.12.2021 19:25, Tom Lane wrote:
> Hmm. I wonder whether using SD_BOTH behaves any differently. 
With shutdown(MyProcPort->sock, SD_BOTH) the test failed for me on
iterations 1, 2, 3, 16 (just as without shutdown() at all).
So shutdown with the SD_SEND flag definitely behaves much better (I've
seen over 200 successful iterations).
 
Best regards,
Alexander



Re: MSVC SSL test failure

От
Tom Lane
Дата:
Alexander Lakhin <exclusion@gmail.com> writes:
> 07.12.2021 19:25, Tom Lane wrote:
>> Hmm. I wonder whether using SD_BOTH behaves any differently. 

> With shutdown(MyProcPort->sock, SD_BOTH) the test failed for me on
> iterations 1, 2, 3, 16 (just as without shutdown() at all).
> So shutdown with the SD_SEND flag definitely behaves much better (I've
> seen over 200 successful iterations).

Fun.  Well, I'll put in shutdown with SD_SEND for the moment,
and we'll have to see whether we can improve further than that.
It does sound like we may be running into OpenSSL bugs/oddities,
not only kernel issues, so it may be impossible to do better
on our side.

            regards, tom lane



Re: MSVC SSL test failure

От
Andrew Dunstan
Дата:
On 12/7/21 13:22, Tom Lane wrote:
> Alexander Lakhin <exclusion@gmail.com> writes:
>> 07.12.2021 19:25, Tom Lane wrote:
>>> Hmm. I wonder whether using SD_BOTH behaves any differently. 
>> With shutdown(MyProcPort->sock, SD_BOTH) the test failed for me on
>> iterations 1, 2, 3, 16 (just as without shutdown() at all).
>> So shutdown with the SD_SEND flag definitely behaves much better (I've
>> seen over 200 successful iterations).
> Fun.  Well, I'll put in shutdown with SD_SEND for the moment,
> and we'll have to see whether we can improve further than that.
> It does sound like we may be running into OpenSSL bugs/oddities,
> not only kernel issues, so it may be impossible to do better
> on our side.


Yeah. My suspicion is that SD_BOTH is what closesocket() does if
shutdown() hasn't been previously called.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com