Re: [PERFORM] Hanging queries on dual CPU windows
From | Magnus Hagander |
---|---|
Subject | Re: [PERFORM] Hanging queries on dual CPU windows |
Date | |
Msg-id | 6BCB9D8A16AC4241919521715F4D8BCEA3510E@algol.sollentuna.se |
List | pgsql-hackers |
> > Ok, I've coded up a patch that changes the code to use a
> > mutex instead.
>
> Are we asserting the problem is caused by the spinlock random
> wake-up order?

Not asserting, more making a wild guess. Which I, as I said, no longer really believe in - but since the patch was already coded up, it's worth a try.

> I am not sure why this would fix the problem. If my memory
> serves, a critical section might be a problem if one process
> aborts unexpectedly while it is inside. Other waiting processes
> can never have a chance to enter it (also have no chance to
> handle SIGQUIT) -- so this patch may solve this.

A critical section only exists within a single process, so that really doesn't apply. And if a thread crashes, the whole process exits.

> There is another suspect in
> http://www.devisser-siderius.com/stack1.jpg,
> i.e., process 3 does shmctl. I once filed a server core dump
> bug in win32 of reporting WSAEWOULDBLOCK.
> (http://archives.postgresql.org/pgsql-bugs/2006-02/msg00185.php).
> AFAICS, it is actually a mistranslated EINTR. There
> seems to be some relation between these issues, but I didn't come
> up with a complete theory of it.

There could well be. Except the link you sent pointed to a thread stuck in pgwin32_waitforsinglesocket() inside pgwin32_send() - this is where I believe the problem is now.

I'm less than trusting of the function names in the stack trace after examining some more. I suspect Process Explorer can only see non-static functions, and that the "pg_queue_signal+0x120" actually points into a different function. (Really, pg_queue_signal cannot possibly be 0x120 bytes of machine code.) I bet it's just in pg_signal_thread(), which is a perfectly normal place to block. It also matches the behaviour I see on a completely fresh backend, which also shows that pg_queue_signal+0x120.

A good thing to test would be to rebuild signal.c and socket.c without any functions declared as static and see if the picture changes. (If nothing else, it would confirm this behaviour in Process Explorer.)

Regards,
Magnus
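
To illustrate the suspicion about the stack trace: a tool that only knows exported symbols will attribute an address to the nearest preceding name it can see, so a frame inside a static function shows up as an earlier function plus a large offset. The following is only a rough sketch of that effect. The addresses, the table layout and the helper names (Symbol, resolve_symbol, format_frame, demo_syms) are made up for illustration and are not taken from the PostgreSQL sources or from Process Explorer; that pg_signal_thread is the hidden function is the guess from this message, not a confirmed fact.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* One entry of a hypothetical symbol table. */
typedef struct
{
    const char *name;       /* function name */
    uintptr_t   addr;       /* start address (made-up values below) */
    int         is_static;  /* 1 = file-local, hidden from export-only tools */
} Symbol;

/*
 * Find the nearest symbol starting at or before addr.  When include_static
 * is 0, static symbols are skipped, mimicking a tool that can only see
 * exported names.  Returns NULL if no visible symbol precedes addr.
 */
const Symbol *
resolve_symbol(const Symbol *syms, size_t n, uintptr_t addr,
               int include_static, uintptr_t *offset)
{
    const Symbol *best = NULL;
    size_t      i;

    assert(offset != NULL);
    for (i = 0; i < n; i++)
    {
        if (!include_static && syms[i].is_static)
            continue;
        if (syms[i].addr <= addr &&
            (best == NULL || syms[i].addr > best->addr))
            best = &syms[i];
    }
    if (best != NULL)
        *offset = addr - best->addr;
    return best;
}

/* Format a frame as "name+0xoffset", the way a stack viewer would. */
void
format_frame(char *buf, size_t len, const Symbol *syms, size_t n,
             uintptr_t addr, int include_static)
{
    uintptr_t   off = 0;
    const Symbol *s = resolve_symbol(syms, n, addr, include_static, &off);

    if (s != NULL)
        snprintf(buf, len, "%s+0x%lx", s->name, (unsigned long) off);
    else
        snprintf(buf, len, "0x%lx", (unsigned long) addr);
}

/* Hypothetical layout: a short exported function followed by a static one. */
const Symbol demo_syms[] = {
    {"pg_queue_signal", 0x1000, 0},
    {"pg_signal_thread", 0x1040, 1},
};
```

With this made-up layout, formatting address 0x1120 with include_static = 0 yields "pg_queue_signal+0x120", while include_static = 1 yields "pg_signal_thread+0xe0". Rebuilding without static would correspond to the second case.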