Re: [BUGS] Replication to Postgres 10 on Windows is broken
From | Tom Lane |
---|---|
Subject | Re: [BUGS] Replication to Postgres 10 on Windows is broken |
Date | |
Msg-id | 6525.1502036947@sss.pgh.pa.us |
In reply to | Re: [BUGS] Replication to Postgres 10 on Windows is broken (Noah Misch <noah@leadboat.com>) |
Replies | Re: [BUGS] Replication to Postgres 10 on Windows is broken |
List | pgsql-bugs |
Noah Misch <noah@leadboat.com> writes:
> On Sun, Aug 06, 2017 at 11:17:57AM -0400, Tom Lane wrote:
>> Gut instinct says that the reason this case fails when other tools
>> can connect successfully is that libpqwalreceiver is the only tool
>> that uses PQconnectStart/PQconnectPoll rather than a plain
>> PQconnectdb, and that there is some behavioral difference between
>> connectDBComplete's wait loop and libpqrcv_connect's wait loop that

> That would fit. Until v10 (commit 1e8a850), PQconnectStart() had no in-tree
> callers outside of libpq itself.

Yeah. After some digging around I think I see exactly what is happening. The error message would be better read as "Socket is not connected *yet*", that is, the problem is that we're trying to write data before the nonblocking connection request has completed. (This fits with the OP's observation that local loopback connections work fine --- they probably complete immediately.)

PQconnectPoll believes that it just has to wait for write-ready when waiting for a connection to complete. When using connectDBComplete's wait loop, that reduces to a call to Windows' version of select(2), in pqSocketPoll, and according to

https://msdn.microsoft.com/en-us/library/windows/desktop/ms740141(v=vs.85).aspx

"The parameter writefds identifies the sockets that are to be checked for writability. If a socket is processing a connect call (nonblocking), a socket is writeable if the connection establishment successfully completes."

On the other hand, in libpqwalreceiver, we're depending on latch.c's implementation, and it uses WSAEventSelect's FD_WRITE event:

https://msdn.microsoft.com/en-us/library/windows/desktop/ms741576(v=vs.85).aspx

If I'm reading that correctly, FD_WRITE is set instantly by the connect request, probably even in the nonblock case, and it only gets cleared by a failed write request. It looks to me like we would have to specifically look for FD_CONNECT, *not* FD_WRITE, to make this work.
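The idea of waiting for FD_CONNECT rather than FD_WRITE while the connection is still being established can be sketched as a small C function that builds the WSAEventSelect() event mask. This is a minimal illustration under stated assumptions, not PostgreSQL's latch.c code: the function name `choose_network_events`, the `WAIT_SOCKET_*` flags, and the `still_connecting` parameter are all hypothetical, and how a caller would learn that the socket is still connecting is left open. On non-Windows systems the FD_* constants are defined locally with the same values winsock2.h uses, so the logic can be compiled and checked anywhere.

```c
/*
 * Hypothetical sketch, not PostgreSQL's latch.c: compute the
 * WSAEventSelect() network-event mask for a socket wait, asking for
 * FD_CONNECT instead of FD_WRITE while a nonblocking connect() is pending.
 */
#ifdef _WIN32
#include <winsock2.h>
#else
/* Stand-ins with the same values as winsock2.h, so the logic compiles
 * and can be exercised on non-Windows systems. */
#define FD_READ    0x01
#define FD_WRITE   0x02
#define FD_CONNECT 0x10
#define FD_CLOSE   0x20
#endif

#include <stdbool.h>

/* Hypothetical wait-request bits, loosely modeled on latch.h's WL_SOCKET_*. */
#define WAIT_SOCKET_READABLE  (1 << 0)
#define WAIT_SOCKET_WRITEABLE (1 << 1)

long
choose_network_events(int wait_flags, bool still_connecting)
{
    long mask = FD_CLOSE;   /* in this sketch, always report peer closure */

    if (wait_flags & WAIT_SOCKET_READABLE)
        mask |= FD_READ;

    if (wait_flags & WAIT_SOCKET_WRITEABLE)
    {
        /*
         * Per Tom's reading of the WSAEventSelect() documentation, FD_WRITE
         * may be signaled before a nonblocking connect has finished, so
         * while the connection is still pending wait for FD_CONNECT instead.
         */
        mask |= still_connecting ? FD_CONNECT : FD_WRITE;
    }
    return mask;
}
```

With a pending connect, a request to wait for write-readiness yields `FD_CLOSE | FD_CONNECT`; once the connection is established, the same request yields `FD_CLOSE | FD_WRITE`.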
This is problematic, because the APIs in between don't provide a way to report that we're still waiting for connect rather than for data-write-ready. Anybody have the stomach for extending PQconnectPoll's API with an extra PGRES_POLLING_CONNECTING state? If not, can we tell in WaitEventAdjustWin32 that the socket is still connecting and we must substitute FD_CONNECT for FD_WRITE?

regards, tom lane