On Mon, Dec 7, 2009 at 5:32 PM, Dan Kortschak
<dan.kortschak@adelaide.edu.au> wrote:
> Thanks to everyone who has answered this. The short answer is that
> torque is not behaving the way I expected and not the way I have ever
> seen it behave in the past. The I/O binding of these jobs may have
> something to do with this, but I will look into it further.
>
> cheers
>
> On Mon, 2009-12-07 at 13:26 -0800, John R Pierce wrote:
>> I'm totally unfamiliar with torque, but you probably need to tell
>> torque to run the first script and wait for it to return before
>> running the rest; it's probably launching a bunch concurrently.
>>
> That *shouldn't* be the case as the contents of a torque script should
> be run sequentially (many jobs depend on this and I've never seen job
> parts run out of order), just as a sh script is (they are actually just
> csh scripts in my case). My understanding is that the parallelisation
> occurs either through using MPI or other parallel compilers or running a
> number of torque jobs, BUT I've just tested the hypothesis by running it
> as a straight csh script - and it works perfectly, so there must be
> something like that going on. I'll ask some of our more experienced
> torque admins about it. Thanks.
If it turns out you need to have a lock with a 'longer than
transaction' duration, maybe advisory locks are a good fit.
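
For what it's worth, here is a rough sketch of what I mean, assuming
PostgreSQL's session-level advisory locks (pg_advisory_lock and
pg_advisory_unlock) and a Python DB-API connection such as one from
psycopg2. The helper name advisory_lock and the integer key are made up
for illustration; pick whatever key scheme suits your application.

```python
from contextlib import contextmanager


@contextmanager
def advisory_lock(conn, key):
    """Hold a session-level advisory lock while the with-block runs.

    Assumptions (not from the thread): conn is a DB-API connection to
    PostgreSQL, and key is an application-chosen integer. The lock is
    tied to the session, so commits inside the block do not release it.
    The unlock also assumes the session is not left in an aborted
    transaction; roll back first if a statement inside the block failed.
    """
    cur = conn.cursor()
    # Blocks until no other session holds the lock for this key.
    cur.execute("SELECT pg_advisory_lock(%s)", (key,))
    try:
        yield
    finally:
        # Release explicitly; otherwise the lock lasts until disconnect.
        cur.execute("SELECT pg_advisory_unlock(%s)", (key,))
        cur.close()
```

You would wrap the whole multi-transaction job in
"with advisory_lock(conn, 42):" and commit as often as you like inside
it. Every cooperating process has to take the same key, since nothing
enforces the lock for code that does not ask for it.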
merlin