Re: Why is src/test/modules/committs/t/002_standby.pl flaky?
From | Alexander Lakhin |
---|---|
Subject | Re: Why is src/test/modules/committs/t/002_standby.pl flaky? |
Date | |
Msg-id | 3b904d7b-ef84-6f1b-9326-9f88c1374eb8@gmail.com |
In reply to | Re: Why is src/test/modules/committs/t/002_standby.pl flaky? (Thomas Munro <thomas.munro@gmail.com>) |
Responses |
Re: Why is src/test/modules/committs/t/002_standby.pl flaky?
|
List | pgsql-hackers |

10.01.2022 05:00, Thomas Munro wrote:
> On Mon, Jan 10, 2022 at 8:06 AM Thomas Munro <thomas.munro@gmail.com> wrote:
>> On Mon, Jan 10, 2022 at 12:00 AM Alexander Lakhin <exclusion@gmail.com> wrote:
>>> Going down through the call chain, I see that at the end of it
>>> WaitForMultipleObjects() hangs while waiting for the primary connection
>>> socket event. So it looks like the socket, that is closed by the
>>> primary, can get into a state unsuitable for WaitForMultipleObjects().
>> I wonder if FD_CLOSE is edge-triggered, and it's already told us once.
> Can you reproduce it with this patch?

Unfortunately, this fix (with the correction "(cur_event &
WL_SOCKET_MASK)" -> "(cur_event->events & WL_SOCKET_MASK)") doesn't work,
because we have two separate calls to libpqrcv_PQgetResult():

> Then we get COMMAND_OK here:
>         res = libpqrcv_PQgetResult(conn->streamConn);
>         if (PQresultStatus(res) == PGRES_COMMAND_OK)
> and finally just hang at:
>             /* Verify that there are no more results. */
>             res = libpqrcv_PQgetResult(conn->streamConn);

The libpqrcv_PQgetResult() function, in turn, invokes WaitLatchOrSocket(),
where the WaitEvents are defined locally, so the closed flag is set on the
first invocation but is expected to be checked on the second.

>> I've managed to reproduce this failure too.
>> Removing "shutdown(MyProcPort->sock, SD_SEND);" doesn't help here, so
>> the culprit is exactly "closesocket(MyProcPort->sock);".
>>
> Ugh. Did you try removing the closesocket and keeping shutdown?
> I don't recall if we tried that combination before.

Even with shutdown() only, I still observe WaitForMultipleObjects()
hanging (and WSAPoll() returns POLLHUP for the socket).

As to your concern regarding other clients, I suspect that this issue is
caused by libpqwalreceiver's specific call pattern, and maybe other
clients just don't do that. I need some more time to analyze this.

Best regards,
Alexander
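
The failure mode described above can be illustrated with a small, portable simulation. This is only a sketch under stated assumptions, not PostgreSQL or Winsock code: it assumes, as suggested in the thread, that the close notification is delivered once (edge-triggered), and every name in it (SimSocket, SimWaitState, sim_wait and so on) is hypothetical. It shows why remembering "closed" in a wait state that is created locally on each call does not help the second call, while state that outlives the calls would.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a socket whose close notification is
 * edge-triggered: it is reported once and then cleared. */
typedef struct SimSocket
{
    bool close_edge_pending;    /* close notification not yet delivered */
} SimSocket;

/* Hypothetical wait state, standing in for a wait event set entry. */
typedef struct SimWaitState
{
    bool seen_close;            /* set once a close notification was consumed */
} SimWaitState;

/* The peer closes the connection: exactly one notification is queued. */
void
sim_peer_close(SimSocket *sock)
{
    sock->close_edge_pending = true;
}

/* Returns true if the wait would wake up, false if it would block forever. */
bool
sim_wait(SimSocket *sock, SimWaitState *state)
{
    assert(sock != NULL && state != NULL);

    if (state->seen_close)
        return true;            /* remembered from an earlier wait */

    if (sock->close_edge_pending)
    {
        sock->close_edge_pending = false;   /* the edge is consumed */
        state->seen_close = true;
        return true;
    }

    return false;               /* nothing left to wake us up: a hang */
}

/* Pattern A: a fresh wait state per call, so the flag is lost on return. */
bool
sim_get_result_local(SimSocket *sock)
{
    SimWaitState local = {false};

    return sim_wait(sock, &local);
}

/* Pattern B: caller-owned wait state that survives between calls. */
bool
sim_get_result_persistent(SimSocket *sock, SimWaitState *state)
{
    return sim_wait(sock, state);
}
```

After sim_peer_close(), two consecutive calls to sim_get_result_local() wake up once and then report a hang, which mirrors the two libpqrcv_PQgetResult() calls quoted above. Two calls to sim_get_result_persistent() that share one SimWaitState both wake up. Whether persistent state is the right remedy for the real code is not something this model can settle.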