As I posted before, changing the timeout from 1000 to
NMPWAIT_WAIT_FOREVER fixed the problem, or at least moved it so that it
no longer occurs easily.
To better understand the problem, I added debugging as Tom suggested. I
restored the timeout on CallNamedPipe to 1000 ms and reran my tests.
Indeed, kill is encountering an error:
LOG: kill(2168) failed: No such process
I instrumented pgkill to output the value of GetLastError() if
CallNamedPipe fails. It returned error code 2, which Windows
identifies as ERROR_FILE_NOT_FOUND. The logic in pgkill translates this
Windows error into an errno value of ESRCH.
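To make the translation concrete, here is a minimal, portable sketch of
that kind of mapping together with the debug printout. Only the
ERROR_FILE_NOT_FOUND -> ESRCH case is what I actually observed; the
EPERM and EINVAL branches, and the names sketch_pipe_error_to_errno and
sketch_report_pipe_failure, are assumptions made for illustration and
are not copied from src/port/kill.c. The error codes are defined locally
so the sketch compiles without the Windows headers.

```c
#include <errno.h>
#include <stdio.h>

/* Numeric values as listed in the Windows system error code table;
 * defined here so the sketch does not need <windows.h>. */
#define SKETCH_ERROR_FILE_NOT_FOUND 2UL
#define SKETCH_ERROR_ACCESS_DENIED  5UL

/*
 * Hypothetical helper: collapse the GetLastError() value from a failed
 * CallNamedPipe into an errno value.  Only the first case is confirmed
 * by the log output above; the other two are assumed.
 */
static int
sketch_pipe_error_to_errno(unsigned long winerr)
{
	switch (winerr)
	{
		case SKETCH_ERROR_FILE_NOT_FOUND:
			return ESRCH;		/* reported as "No such process" */
		case SKETCH_ERROR_ACCESS_DENIED:
			return EPERM;		/* assumed mapping */
		default:
			return EINVAL;		/* assumed catch-all */
	}
}

/*
 * Hypothetical debug wrapper: print the raw Windows code before it is
 * collapsed, then set errno and return -1 the way kill() would.
 */
static int
sketch_report_pipe_failure(int pid, unsigned long winerr)
{
	int			mapped = sketch_pipe_error_to_errno(winerr);

	fprintf(stderr, "kill(%d): CallNamedPipe failed, GetLastError() = %lu\n",
			pid, winerr);
	errno = mapped;				/* set last so fprintf cannot clobber it */
	return -1;
}
```

With a mapping like this, any code other than the two named ones ends up
as EINVAL, so the raw GetLastError() value has to be printed before the
translation if it is to be seen at all.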
The Windows error is a bit surprising, at least to me -- I expected
something indicating the pipe was full. Does anyone have a richer
interpretation of this error?
Thanks,
Steve
-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Marshall, Steve wrote:
>> Any thoughts on how to confirm or deny Theory A?
> Try changing the 1000 to NMPWAIT_WAIT_FOREVER
As long as you're changing the source code, it'd be a good idea to
verify the supposition that kill() is failing, eg in
src/backend/commands/async.c
        if (kill(listenerPID, SIGUSR2) < 0)
        {
+           elog(LOG, "kill(%d) failed: %m", listenerPID);
            /*
             * Get rid of pg_listener entry if it refers to a PID that no
             * longer exists. Presumably, that backend crashed without
             * deleting its pg_listener entries. This code used to only
If that's right, sprinkling a few debug printouts into src/port/kill.c
would be the next step.
regards, tom lane