"Operation on non-socket" analysis
From | Magnus Hagander |
---|---|
Subject | "Operation on non-socket" analysis |
Date | |
Msg-id | 6BCB9D8A16AC4241919521715F4D8BCE475D1B@algol.sollentuna.se |
List | pgsql-hackers-win32 |
Hello!

I have now, with the help of Harald, analysed the problem with the "Operation attempted on something that is not a socket" error on win32 when some third-party LSPs (Layered Service Providers) are installed.

My initial thought was that it had something to do with our blocking-over-non-blocking emulation code that is there to handle signal delivery, since this is a problem that does not happen to a lot of other programs. This turned out to be incorrect.

The problem is related to the multi-process model used by PostgreSQL, where most win32 programs use a multi-threaded model. It seems that at least the LSP Harald has had problems with, and I bet most others, break socket inheritance. We accept() the socket in the postmaster, then CreateProcess() a new process and inherit the handle. This breaks on these LSPs.

Per Microsoft's own documentation, we should be able to do what we do, since we are NT only and not 9x (see for example http://support.microsoft.com/default.aspx?scid=kb;en-us;150523 - "Under Windows NT and Windows 2000, socket handles are inheritable by default. This feature is often used by a process that wants to spawn a child process and have the child process interact with the remote application on the other end of the connection."). This means that it is a bug in the LSP.

That said, a workaround would be nice, since we are already receiving several reports about this problem. I have tried using DuplicateHandle() (which is strictly speaking incorrect, since the API does not let us know that a HANDLE and a SOCKET are actually the same thing, but still worth a try), and it has the same behaviour. The only thing I can think of testing further is using WSADuplicateSocket(). This is significantly more complex to implement (since it requires the pid of the child before it can be called, for one thing).
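For reference, the WSADuplicateSocket() route could look roughly like the sketch below. It is untested against the affected LSPs and is not the code from sockt.c. The helper names (hand_socket_to_child, adopt_socket) and the choice of passing the WSAPROTOCOL_INFO blob over the child's stdin pipe are assumptions made for illustration only. Starting the child with CREATE_SUSPENDED is one way to have its pid available before it runs.

```c
#ifdef _WIN32
#include <winsock2.h>
#include <windows.h>

/* Parent side (hypothetical helper). pipe_read must be an inheritable
 * handle; the child receives it as its stdin. The child is created
 * suspended so that its pid exists before WSADuplicateSocket() is called. */
static int hand_socket_to_child(SOCKET s, char *cmdline,
                                HANDLE pipe_read, HANDLE pipe_write)
{
    STARTUPINFOA si;
    PROCESS_INFORMATION pi;
    WSAPROTOCOL_INFO info;
    DWORD written = 0;
    int ok;

    ZeroMemory(&si, sizeof(si));
    si.cb = sizeof(si);
    si.dwFlags = STARTF_USESTDHANDLES;
    si.hStdInput = pipe_read;
    si.hStdOutput = GetStdHandle(STD_OUTPUT_HANDLE);
    si.hStdError = GetStdHandle(STD_ERROR_HANDLE);

    if (!CreateProcessA(NULL, cmdline, NULL, NULL, TRUE,
                        CREATE_SUSPENDED, NULL, NULL, &si, &pi))
        return -1;

    /* Duplicate the socket for the child's pid, then ship the descriptor. */
    ok = WSADuplicateSocket(s, pi.dwProcessId, &info) == 0
         && WriteFile(pipe_write, &info, sizeof(info), &written, NULL)
         && written == sizeof(info);

    if (ok)
        ResumeThread(pi.hThread);          /* child may now start running */
    else
        TerminateProcess(pi.hProcess, 1);  /* never let a broken child run */

    CloseHandle(pi.hThread);
    CloseHandle(pi.hProcess);
    return ok ? 0 : -1;
}

/* Child side (hypothetical helper). Call after WSAStartup(). Assumes the
 * whole blob arrives in a single read, which is a simplification. */
static SOCKET adopt_socket(HANDLE pipe_read)
{
    WSAPROTOCOL_INFO info;
    DWORD got = 0;

    if (!ReadFile(pipe_read, &info, sizeof(info), &got, NULL)
        || got != sizeof(info))
        return INVALID_SOCKET;

    return WSASocket(FROM_PROTOCOL_INFO, FROM_PROTOCOL_INFO,
                     FROM_PROTOCOL_INFO, &info, 0, 0);
}
#endif /* _WIN32 */
```

In the child, adopt_socket(GetStdHandle(STD_INPUT_HANDLE)) would return the socket to use. Whether a socket obtained this way survives the broken LSPs is exactly what still needs testing.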
I will see if/when I get a chance to test this out in my test program - if somebody else beats me to writing a test program for it, please do ;-)

Attached is the ugly little test program I wrote that shows this behaviour. It works on my machines, and it shows the error on Harald's machine. Start it in one console (it needs to be a console, not a double-click, or the messages are lost). Then, from another console, telnet to localhost on port 999 and type anything at all. It should show error code 0. It shows error code 10038 when it fails.

If WSADuplicateSocket() does not fix it, we should probably add the check early in the installer to tell the user what the problem is, instead of erroring out the way we do now.

Does anybody have any further ideas on this subject?

//Magnus

<<sockt.c>>
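As a small illustration of the installer-side reporting mentioned above, the result of such a check could be turned into a readable message along these lines. This is only a sketch: describe_inheritance_check is a hypothetical name, the message wording is invented, and the only facts taken from the test program are the two codes it prints (0 on success, 10038, which is WSAENOTSOCK, on failure).

```c
#include <stdio.h>
#include <string.h>

#define CHECK_OK          0      /* child could use the inherited socket */
#define CHECK_WSAENOTSOCK 10038  /* Winsock error code WSAENOTSOCK */

/* Map the error code reported by the inheritance check to a message the
 * installer could show. Hypothetical helper, wording is illustrative. */
static const char *describe_inheritance_check(int wsa_error)
{
    switch (wsa_error) {
    case CHECK_OK:
        return "Socket inheritance works.";
    case CHECK_WSAENOTSOCK:
        return "Socket inheritance is broken, probably by a third-party LSP.";
    default:
        return "Socket check failed with an unexpected Winsock error.";
    }
}
```

A caller would run the check once, early, and print describe_inheritance_check(code) before going any further.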