Re: backtrace_on_internal_error

Поиск
Список
Период
Сортировка
От Andres Freund
Тема Re: backtrace_on_internal_error
Дата
Msg-id 20231209210642.h3uxp66annsueufh@awork3.anarazel.de
обсуждение исходный текст
Ответ на Re: backtrace_on_internal_error  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: backtrace_on_internal_error  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-hackers
Hi,

On 2023-12-09 12:41:30 -0500, Tom Lane wrote:
> I wrote:
> > Andres Freund <andres@anarazel.de> writes:
> >> I was wondering about that too. But if we do so, why not also do it for
> >> writes?
> 
> > Writes don't act that way, do they?  EOF on a pipe gives you an error,
> > not silently reporting that zero bytes were written and leaving you
> > to retry indefinitely.
> 
> On further reflection I realized that you're right so far as the SSL
> code path goes, because SSL_write() can involve physical reads as well
> as writes, so at least in principle it's possible that we'd see EOF
> reported this way from that function.

Heh. I'll just claim that's what I was thinking about.


> Also, the libpq side does need work of the same sort, leading to the
> v2-0001 patch attached.

I'd perhaps add a comment explaining why it's plausible that we'd see that
that in the write case.


> I also realized that we have more or less the same problem at the
> caller level, allowing a similar failure for non-SSL connections.
> So I'm also proposing 0002 attached.  Your results from aggregated
> logs didn't show "could not receive data from client: Success" as a
> common case, but since we weren't bothering to zero errno beforehand,
> it's likely that such failures would show up with very random errnos.

I did only look at the top ~100 internal errors, after trying to filter out
extensions, i.e. the list wasn't exhaustive.  There's also very few non-ssl
connections.  But I just checked, and for that error message, I do see some
XX000, but only in older versions.  There's ENETUNREACH, ECONNRESET,
ETIMEDOUT, EHOSTUNREACH which these days are all handled as non XX000 by
errcode_for_socket_access().


> diff --git a/src/interfaces/libpq/fe-secure.c b/src/interfaces/libpq/fe-secure.c
> index bd72a87bbb..76b647ce1c 100644
> --- a/src/interfaces/libpq/fe-secure.c
> +++ b/src/interfaces/libpq/fe-secure.c
> @@ -211,6 +211,8 @@ pqsecure_raw_read(PGconn *conn, void *ptr, size_t len)
>      int            result_errno = 0;
>      char        sebuf[PG_STRERROR_R_BUFLEN];
>  
> +    SOCK_ERRNO_SET(0);
> +
>      n = recv(conn->sock, ptr, len, 0);
>  
>      if (n < 0)
> @@ -232,6 +234,7 @@ pqsecure_raw_read(PGconn *conn, void *ptr, size_t len)
>  
>              case EPIPE:
>              case ECONNRESET:
> +            case 0:                /* treat as EOF */
>                  libpq_append_conn_error(conn, "server closed the connection unexpectedly\n"
>                                          "\tThis probably means the server terminated abnormally\n"
>                                          "\tbefore or while processing the request.");

If we were treating it as EOF, we'd not "queue" an error message, no? Normally
recv() returns 0 in that case, so we'd just return, right?

Greetings,

Andres Freund



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Sutou Kouhei
Дата:
Сообщение: Re: Make COPY format extendable: Extract COPY TO format implementations
Следующее
От: Thomas Munro
Дата:
Сообщение: Re: encoding affects ICU regex character classification