  • This new dependency type accepts a ':'-separated list of job IDs on which the job being submitted depends.

    • Example:

      % qsub -lncpus=4 -- /bin/sleep 1000
      7.centos
      % qsub -lncpus=4 -- /bin/sleep 1000
      8.centos
      % qsub -lncpus=2 -Wdepend=runone:7:8 -- /bin/sleep 1000
      9.centos
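
When the job IDs come from earlier submissions in a script, the dependency argument can be assembled rather than typed by hand. The following is a minimal sketch; the function name `runone_depend_arg` is hypothetical and not part of PBS. It only joins its arguments with ':' after the `runone` keyword, matching the form used in the example above.

```shell
# Hypothetical helper (not a PBS command): build the value for -Wdepend
# from a list of job ids, e.g. "runone:7:8".
runone_depend_arg() {
    printf 'runone'
    for id in "$@"; do
        # Each job id is appended with a ':' separator.
        printf ':%s' "$id"
    done
    printf '\n'
}

# Possible usage on a host with PBS installed (not executed here):
#   qsub -lncpus=2 -Wdepend="$(runone_depend_arg 7 8)" -- /bin/sleep 1000
```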
      

  • When a user specifies a “runone” dependency on one or more jobs, the PBS server adds a reverse “runone” dependency to each of those jobs.

    • Example:

    • In the example below, jobs 7 and 8 were submitted as independent jobs, and job 9 was submitted with a “runone” dependency on jobs 7 and 8. The PBS server then placed a reverse “runone” dependency on jobs 7 and 8 as well.

      % qstat -f | grep -e "Job Id" -e "depend"
      Job Id: 7.centos
          depend = runone:9.centos@centos:8.centos@centos
      Job Id: 8.centos
          depend = runone:9.centos@centos:7.centos@centos
      Job Id: 9.centos
          depend = runone:7.centos@centos:8.centos@centos
          Submit_arguments = -lncpus=2 -Wdepend=runone:7:8 -- /bin/sleep 1000
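
The reverse dependencies can be checked mechanically by reading the same `qstat -f` output. The sketch below is illustrative only: `runone_peers` is a hypothetical name, and the parsing assumes each `depend` attribute sits on a single line and holds only a `runone` list, as in the output above. Real output with wrapped lines or several dependency types would need more handling.

```shell
# Hypothetical helper: read `qstat -f` output on stdin and print each job id
# followed by the job ids listed in its "runone" dependency.
runone_peers() {
    awk '
        /Job Id:/ { job = $3 }                    # remember the current job
        $1 == "depend" && $3 ~ /^runone:/ {
            n = split($3, parts, ":")             # parts[1] is "runone"
            line = job
            for (i = 2; i <= n; i++) {
                sub(/@.*/, "", parts[i])          # drop the "@server" suffix
                line = line " " parts[i]
            }
            print line
        }
    '
}

# Possible usage on a host with PBS installed (not executed here):
#   qstat -f | runone_peers
```

For the three jobs shown above, every job should list the other two as peers.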
      
  • When one of the jobs in a “runone” dependency group starts running, the PBS server places a “System” hold on all the other jobs in the group.

  • Only when the running job ends (or is deleted) is its dependency released, at which point all the dependent jobs are deleted.

  • When the dependent jobs are deleted from the system, the server logs an abort accounting record stating why the dependency was released.

    • Example:

    • In the following case, job 9 had a “runone” dependency on jobs 7 and 8. When job 9 finished, the server released the dependency on jobs 7 and 8 and logged the following accounting records.

      02/06/2020 17:28:18;A;7.centos;Job deleted as result of dependency on job 9.centos
      02/06/2020 17:28:18;A;8.centos;Job deleted as result of dependency on job 9.centos
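
Records of this kind can be picked out of an accounting log with a small filter. This is a sketch under stated assumptions: `deleted_by_dependency` is a hypothetical name, the records are assumed to use the ';'-separated layout shown above (timestamp, record type, job id, message), and the message text is matched exactly as it appears in the example.

```shell
# Hypothetical helper: read accounting records on stdin and print
# "<deleted job> <job that caused the deletion>" for each matching record.
deleted_by_dependency() {
    awk -F';' '
        $2 == "A" && $4 ~ /^Job deleted as result of dependency on job / {
            cause = $4
            # Strip the fixed message prefix, leaving only the job id.
            sub(/^Job deleted as result of dependency on job /, "", cause)
            print $3 " " cause
        }
    '
}

# Possible usage (ACCOUNTING_FILE is a placeholder for an accounting log path):
#   deleted_by_dependency < "$ACCOUNTING_FILE"
```
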
  • Within a scheduling cycle, if the scheduler considers multiple jobs that have a “runone” dependency on each other, it marks the remaining jobs as “can not run” as soon as it is able to run one of them.

    • There may, however, be cases where the scheduler adds a job from a “runone” group to the calendar because the job could not run, and another job from the same group that the scheduler considers later in the cycle ends up running.

      • In such cases, the job that could not run but was added to the calendar remains in the calendar for the rest of the cycle. The scheduler corrects itself from the next cycle onwards, because the calendared job then shows up as “Held”.

  • When a running job that belongs to a “runone” dependency group is requeued by the PBS server (because of preemption or qrerun), the system hold is released on all the dependent jobs and those jobs move to the “queued” state.

  • If a user tries to submit a job to a “runone” dependency group when one of the jobs in that group is already running, the qsub request is rejected with the error code “Invalid request”.
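
The state changes described above can be summarized with a toy model. This is not PBS code and makes simplifying assumptions: there is a single “runone” group, every submitted job joins it, and the event names (`submit`, `run`, `requeue`, `end`) and the function name `runone_model` are invented for illustration. It replays events from stdin and prints the final state of each job.

```shell
# Toy model of one "runone" group (illustrative only, not PBS behavior in full).
# Input lines: "submit <job>", "run <job>", "requeue <job>", "end <job>".
runone_model() {
    awk '
        $1 == "submit" {
            # Submitting into a group that has a running job is rejected.
            if (running != "") { print "reject " $2 ": Invalid request"; next }
            state[$2] = "queued"; order[++n] = $2
            next
        }
        $1 == "run" && ($2 in state) && state[$2] == "queued" && running == "" {
            running = $2
            state[$2] = "running"
            # All other queued jobs in the group receive a system hold.
            for (j in state) if (state[j] == "queued") state[j] = "held"
        }
        running != "" && $2 == running && ($1 == "requeue" || $1 == "end") {
            # end: held jobs are deleted; requeue: held jobs become queued again.
            state[running] = ($1 == "end") ? "finished" : "queued"
            for (j in state)
                if (state[j] == "held")
                    state[j] = ($1 == "end") ? "deleted" : "queued"
            running = ""
        }
        END { for (i = 1; i <= n; i++) print order[i], state[order[i]] }
    '
}
```

For example, feeding it `submit 7`, `submit 8`, `submit 9`, `run 9`, `requeue 9`, `run 7`, `end 7` (one event per line) leaves job 7 finished and jobs 8 and 9 deleted.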