Re: postgres_fdw test timeouts
От | Thomas Munro |
---|---|
Тема | Re: postgres_fdw test timeouts |
Дата | |
Msg-id | CA+hUKGLKZhs8Y-r1MxTDdiDvtbgaAGnCL=br5i4LDKGYd+e9YA@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: postgres_fdw test timeouts (Alexander Lakhin <exclusion@gmail.com>) |
Ответы |
Re: postgres_fdw test timeouts
|
Список | pgsql-hackers |
On Thu, Dec 7, 2023 at 10:00 PM Alexander Lakhin <exclusion@gmail.com> wrote: > I think, I've found out what's going on here. > The culprit is WSAEnumNetworkEvents() assisted by non-trivial logic of > ExecAppendAsyncEventWait(). > For the case noccurred > 1, ExecAppendAsyncEventWait() performs a loop, > where ExecAsyncNotify() is called for the first AsyncRequest, but the > second one also processed inside, through a recursive call to > ExecAppendAsyncEventWait(): > -> ExecAsyncNotify -> produce_tuple_asynchronously > -> ExecScan -> ExecInterpExpr -> ExecSetParamPlan -> ExecProcNodeFirst > -> ExecAgg -> agg_retrieve_direct -> ExecProcNodeInstr -> ExecAppend > -> ExecAppendAsyncEventWait. > Here we get into the first loop and call ExecAsyncConfigureWait() for the > second AsyncRequest (because we haven't reset it's callback_pending yet), > and it leads to creating another WaitEvent for the second socket inside > postgresForeignAsyncConfigureWait(): > AddWaitEventToSet(set, WL_SOCKET_READABLE, PQsocket(fsstate->conn), ... Oh, wow. Nice detective work! Thank you for figuring that out. > So it looks like we have the same issue with multiple event handles > associated with a single socket here. > And v2-0003-Redesign-Windows-socket-event-management.patch from [1] > "surprisingly" helps in this case as well (I could not see a failure for > 100 iterations of 10 tests in parallel). Yeah, this makes perfect sense. So, commit 04a09ee is not guilty. But as the saying goes, "no good deed goes unpunished", and work on our Windows port seems to be especially prone to backfiring when kludges combine... Now we have the question of whether to go forwards (commit the "socket table" thing), or backwards (revert 04a09ee for now to clear the CI failures). I don't love the hidden complexity of the socket table and am not in a hurry to commit it, but I don't currently see another way... on the other hand we have other CI flapping due to that problem too so reverting 04a09ee would be sweeping problems under the carpet. I still need to process your feedback/discoveries on that other thread and it may take a few weeks for me to get to it.
В списке pgsql-hackers по дате отправления: