Re: [COMMITTERS] pgsql: Make new crash restart test a bit morerobust.
От | Andres Freund |
---|---|
Тема | Re: [COMMITTERS] pgsql: Make new crash restart test a bit morerobust. |
Дата | |
Msg-id | 20170919231650.mjb6awkmkrfop2rs@alap3.anarazel.de обсуждение исходный текст |
Ответ на | Re: [COMMITTERS] pgsql: Make new crash restart test a bit morerobust. (Andres Freund <andres@anarazel.de>) |
Список | pgsql-committers |
On 2017-09-19 13:53:18 -0700, Andres Freund wrote: > On 2017-09-19 16:46:58 -0400, Tom Lane wrote: > > Have we forgotten an fflush() or something? > > > > Also, maybe problem is on client side. I vaguely recall a libpq bug > > wherein it would complain about socket EOF even though data remained > > to be processed. Maybe we reintroduced something like that? > > That seems quite possible. Preface: This is largely a as of yet unverified theory. But seems worthwhile to investigate whether it's the cause of this issue or not. By changing the error message I know that the "server closed the connection unexpectedly" ERROR is coming from pqsecure_raw_read(). The caller here has to be pqReadData(). Where have the following block: if (nread > 0){ conn->inEnd += nread; /* * Hack to deal with the fact that some kernels will only give us back * 1 packet per recv() call, even ifwe asked for more and there is * more available. If it looks like we are reading a long message, * loop back torecv() again immediately, until we run out of data or * buffer space. Without this, the block-and-restart behaviorof * libpq's higher levels leads to O(N^2) performance on long messages. * * Since we left-justifiedthe data above, conn->inEnd gives the * amount of data already read in the current message. We consider * the message "long" once we have acquired 32k ...B */ if (conn->inEnd > 32768 && (conn->inBufSize- conn->inEnd) >= 8192) { someread = 1; goto retry3; } return 1;} So imagine we've just read one block containing the error message from the server. That's going to be less than 32kb. So we go into the retry3 path. /* OK, try to read some data */ retry3: nread = pqsecure_read(conn, conn->inBuffer + conn->inEnd, conn->inBufSize - conn->inEnd);if (nread <0){ if (SOCK_ERRNO == EINTR) goto retry3; /* Some systems return EAGAIN/EWOULDBLOCK for no data */ #ifdef EAGAIN if (SOCK_ERRNO == EAGAIN) return someread; #endif #if defined(EWOULDBLOCK) && (!defined(EAGAIN) || (EWOULDBLOCK != EAGAIN)) if (SOCK_ERRNO == EWOULDBLOCK) returnsomeread; #endif /* We might get ECONNRESET here if using TCP and backend died */ #ifdef ECONNRESET if (SOCK_ERRNO == ECONNRESET) goto definitelyFailed; #endif /* pqsecure_read set the error message for us */ return -1;} So if the connection actually was closed we: definitelyFailed:/* Do *not* drop any already-read data; caller still wants it */pqDropConnection(conn, false);conn->status= CONNECTION_BAD; /* No more connection to backend */return -1; and PQgetResult() will happily signal that upwards with an error: /* Wait for some more data, and load it. */ if (flushResult|| pqWait(TRUE, FALSE, conn) || pqReadData(conn) < 0) { /* * conn->errorMessagehas been set by pqWait or pqReadData. We * want to append it to any already-received error message. */ pqSaveErrorResult(conn); conn->asyncStatus = PGASYNC_IDLE; return pqPrepareAsyncResult(conn); ISTM, we need to react differently if we've already partially read data successfully? Am I missing something? Greetings, Andres Freund -- Sent via pgsql-committers mailing list (pgsql-committers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-committers
В списке pgsql-committers по дате отправления:
Предыдущее
От: Peter EisentrautДата:
Сообщение: [COMMITTERS] pgsql: Add basic TAP test setup for pg_upgrade
Следующее
От: Robert HaasДата:
Сообщение: Re: [COMMITTERS] pgsql: Avoid use of non-portable strnlen() in pgstat_clip_activity().