Many thanks for the response. A couple of other questions:
1) If I use a job step to call a bash script then, quite rightly, if the script executes and completes it shows with a Status of S. If I change the script's exit code on an error, can I trigger a separate status to be recorded?
Non-zero return values should be interpreted by pgAgent as a failure.
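As a rough sketch of what that could look like in the script itself: the function below returns non-zero when its input is missing, and the script would hand that status back to whatever called it. The name step_main and the input path are made up for illustration, they are not anything pgAgent defines.

```shell
#!/bin/bash
# Hypothetical batch step script. step_main and the input path are
# illustrative names only, not part of pgAgent.

step_main() {
  local input="$1"

  # Report the problem on stderr and return non-zero so the caller
  # (here, the job step) sees a failure rather than a success.
  if [ ! -r "$input" ]; then
    echo "step failed: cannot read '$input'" >&2
    return 1
  fi

  # Real work goes here; the status of the last command becomes
  # the status of the function.
  wc -l < "$input" > /dev/null
}

# At the bottom of the real script, pass the status straight through:
#   step_main "/path/to/input"
#   exit $?
```

The point is only that the script must not swallow the error: if the last thing it runs is an unconditional `exit 0` or a successful `echo`, the step will still look like it succeeded.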
2) If we have a step that shows as running, how can we mark that as done and move on to the next step in the job?
You cannot change what the agent is doing mid-job. You'd need to kill the agent and restart it if it's got stuck somehow. It *should* automatically clean up the zombie job and re-schedule as appropriate.
I was wondering if anyone can help. I am running a PostgreSQL 9.4 server on Ubuntu 14.04 and having issues with pgAgent jobs saying that they are running when the jobs are not doing anything. This does not affect every job, and the issue appears to happen at random. I am running pgAgent v3.4.1.
I was just wondering if anyone has any code to help diagnose pgAgent issues. The executable works fine and does not crash, and there are no errors in the pgAgent log.
Have you tried increasing the log level to get more detailed info about the scheduler's operation?
Also, does anyone have code they can share to reset stuck jobs in the pgAgent tables? I am doing this myself, but I am not sure I am doing it correctly as I have a limited understanding of how all the pieces fit together.
The agent should run any job that has its "jobnextrun" value in the pga_job table less than or equal to the current time, with jobenabled = true and jobagentid IS NULL, each time it checks for jobs to run. If you've got an entry in that table for a job that you think should have run but did not, please share the data and we may be able to see why.
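To collect that data, something along these lines might help. It is only a sketch of the three conditions described above written out as a query, not the agent's actual code. It assumes the pgAgent tables live in the usual pgagent schema, and the function name due_jobs_sql is made up for the example.

```shell
#!/bin/bash
# Print a read-only query listing the jobs that meet the conditions
# described above: enabled, not claimed by an agent, and due to run.
# Assumes the pgAgent tables are in the "pgagent" schema.
due_jobs_sql() {
  cat <<'SQL'
SELECT jobid, jobname, jobenabled, jobagentid, jobnextrun, joblastrun
FROM pgagent.pga_job
WHERE jobenabled
  AND jobagentid IS NULL
  AND jobnextrun <= now()
ORDER BY jobnextrun;
SQL
}
```

It could be run with `due_jobs_sql | psql -d postgres`, assuming pgAgent was installed into the postgres database. A job that you believe is overdue but that does not appear in the output presumably fails one of the three conditions, so it is worth looking at its jobenabled, jobagentid and jobnextrun values directly in pga_job.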
Also, does anyone have a technical overview of how it works?