Re: backtrace_on_internal_error

From: Andres Freund
Subject: Re: backtrace_on_internal_error
Msg-id: 20231208193316.5ylgs4zb6zngwyg4@awork3.anarazel.de
In reply to: Re: backtrace_on_internal_error  (Andres Freund <andres@anarazel.de>)
Responses: Re: backtrace_on_internal_error  (Andres Freund <andres@anarazel.de>)
           Re: backtrace_on_internal_error  (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-hackers

Hi,

On 2023-12-08 10:51:01 -0800, Andres Freund wrote:
> On 2023-12-08 13:46:07 -0500, Tom Lane wrote:
> > Andres Freund <andres@anarazel.de> writes:
> > > On 2023-12-08 13:23:50 -0500, Tom Lane wrote:
> > >> Hmm, don't suppose you have a way to reproduce that?
> >
> > > After a bit of trying, yes.  I put an abort() into pgtls_open_client(), after
> > > initialize_SSL(). Connecting does result in:
> > > LOG:  could not accept SSL connection: Success
> >
> > OK.  I can dig into that, unless you're already on it?
>
> I think I figured it out. Looks like we need to translate a closed socket
> (recvfrom() returning 0) to ECONNRESET or such.

I think we might just need to expand the existing branch for EOF:

                if (r < 0)
                    ereport(COMMERROR,
                            (errcode_for_socket_access(),
                             errmsg("could not accept SSL connection: %m")));
                else
                    ereport(COMMERROR,
                            (errcode(ERRCODE_PROTOCOL_VIOLATION),
                             errmsg("could not accept SSL connection: EOF detected")));

The OpenSSL docs say:

 The following return values can occur:

 0
     The TLS/SSL handshake was not successful but was shut down controlled and
     by the specifications of the TLS/SSL protocol. Call SSL_get_error() with
     the return value ret to find out the reason.

 1
     The TLS/SSL handshake was successfully completed, a TLS/SSL connection
     has been established.

 <0
     The TLS/SSL handshake was not successful because a fatal error occurred
     either at the protocol level or a connection failure occurred. The
     shutdown was not clean. It can also occur if action is needed to continue
     the operation for nonblocking BIOs. Call SSL_get_error() with the return
     value ret to find out the reason.


Which fits with my reproducer - due to the abort the connection was *not* shut
down via SSL in a controlled manner, therefore r < 0.
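
To make the three documented cases concrete, here is a minimal stand-alone
sketch. It is not PostgreSQL or OpenSSL code: the enum and the function name
are hypothetical, and it only classifies the integer return value as the
quoted documentation describes it.

```c
/* Hypothetical labels for the three documented handshake outcomes. */
typedef enum
{
    HANDSHAKE_OK,             /* documented success value is 1 */
    HANDSHAKE_CLEAN_SHUTDOWN, /* 0: shut down in a controlled manner */
    HANDSHAKE_FATAL_OR_RETRY  /* < 0: fatal error, or action needed to continue */
} HandshakeOutcome;

/* Hypothetical helper: map the handshake return value to an outcome. */
HandshakeOutcome
classify_handshake_result(int r)
{
    if (r < 0)
        return HANDSHAKE_FATAL_OR_RETRY;
    if (r == 0)
        return HANDSHAKE_CLEAN_SHUTDOWN;
    return HANDSHAKE_OK;
}
```

With the abort() reproducer the client never performs a controlled shutdown,
so the result lands in the last category rather than the second one.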


Hm, oddly enough, there's this tidbit in the SSL_get_error() manpage:

 On an unexpected EOF, versions before OpenSSL 3.0 returned SSL_ERROR_SYSCALL,
 nothing was added to the error stack, and errno was 0. Since OpenSSL 3.0 the
 returned error is SSL_ERROR_SSL with a meaningful error on the error stack.

But I reproduced this with 3.1.


Seems like we should just treat errno == 0 as a reason to emit the "EOF
detected" message?



I wonder if we should treat send/recv returning 0 differently, from an error
message perspective during an established connection. Right now we produce
  could not receive data from client: Connection reset by peer

because be_tls_read() sets errno to ECONNRESET - despite that not having been
returned by the OS.  But I guess that's a topic for another day.
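
A simplified illustration of the behavior described above, assuming nothing
beyond what this message says. The function name is hypothetical and this is
not the actual be_tls_read() code; it only shows how a zero-byte read ends up
reported as a reset even though the OS never returned ECONNRESET.

```c
#include <errno.h>

/*
 * Hypothetical stand-in: decide which errno value the caller will see
 * after a read.
 *
 * n        - bytes read (0 means the peer closed the connection)
 * os_errno - errno reported by the OS for a failed read
 */
int
errno_after_read(long n, int os_errno)
{
    if (n == 0)
        return ECONNRESET;  /* synthesized, not reported by the OS */
    if (n < 0)
        return os_errno;    /* genuine OS error is passed through */
    return 0;               /* successful read, no error */
}
```

Because the zero-byte case is folded into ECONNRESET, a later %m in the error
message cannot tell a clean close from a real reset.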

Greetings,

Andres Freund


