...

The second requirement is for the user/admin to be able to provide multiple resource specifications and have PBS use one as soon as it knows that it can start the job with that specification. The PBS scheduler shall look at each of the resource specifications in the order they are sorted according to scheduling policies and may choose to run the job with a specification as soon as it knows that it can. This falls in line with the PBS scheduler's way of finding a node solution based on the "first fit" algorithm.

If the scheduler finds that it cannot run such a job because resources are unavailable, and tries to calendar the job so that resources can be reserved for it in the future, it will use only the first resource specification that it encounters in its sorted list of jobs to calendar the job.
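The selection behaviour described above can be sketched roughly as follows. This is only an illustration under simplifying assumptions, not the scheduler's actual code: `ResourceSpec`, `can_run_now`, and `pick_spec` are hypothetical names, and resource availability is reduced to a single free-CPU count.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ResourceSpec:
    """One resource specification of a job (hypothetical, simplified)."""
    name: str
    ncpus: int


def can_run_now(spec: ResourceSpec, free_ncpus: int) -> bool:
    # Stand-in for the scheduler's real node-matching logic.
    return spec.ncpus <= free_ncpus


def pick_spec(sorted_specs: List[ResourceSpec],
              free_ncpus: int) -> Tuple[str, Optional[ResourceSpec]]:
    """Return ("run", spec) for the first spec that fits right now,
    otherwise ("calendar", first_spec) so resources can be reserved later."""
    for spec in sorted_specs:      # assumed already sorted by scheduling policy
        if can_run_now(spec, free_ncpus):
            return "run", spec     # first fit: stop at the first match
    if sorted_specs:
        return "calendar", sorted_specs[0]
    return "none", None
```

With two specifications requesting 16 and 8 CPUs and only 8 CPUs free, this sketch runs the second one; with 4 CPUs free it falls back to calendaring the first.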

If a running job that was initially submitted with multiple resource specifications gets requeued for any reason (such as qrerun, node_fail_requeue, or preemption by requeue), the job will be reevaluated to run by looking at each of the resource specifications it was initially submitted with.



link to forum discussion

Interface 1: New Job attribute called “job_set” - qsub option “-W job_set”

  • Visibility: Public
  • Change Control: Stable
  • Details:
    • A new job attribute “job_set” is added to the job.
    • This attribute is of type string, and users, operators, and managers have privileges to read/write it.
    • Users can submit jobs specifying the “-W job_set” option during submission. This attribute can only take the job id of an already submitted job as a value.
      • If a user specifies an invalid job id, job submission fails with the following error - “qsub: a nonexistent job_set specified”
      • If a user specifies a legitimate job id that isn't a job_set leader, job submission fails with the following error - “qsub: a nonexistent job_set specified”
  • When a job is submitted with a legitimate job id specified in the job_set (“-W job_set” option), the PBS server will submit this job and make it part of the job_set led by the specified job id.
  • If a user wants to modify the job_set of an already existing job, they can do so by issuing the “qalter -W job_set=<new job_set id> <job id to be modified>” command.
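The validation rules for joining a job_set can be pictured with the sketch below. It is a hypothetical model, not server code: the `JobTable` mapping, the `join_job_set` function, and the use of `ValueError` are all assumptions made for illustration. Only the error text comes from the description above.

```python
from typing import Dict, Optional

# Hypothetical in-memory view of the server: job id -> job_set leader id.
# A job_set leader maps to itself; a job outside any job_set maps to None.
JobTable = Dict[str, Optional[str]]

NONEXISTENT_JOB_SET = "qsub: a nonexistent job_set specified"


def join_job_set(jobs: JobTable, new_job_id: str, job_set: str) -> None:
    """Add new_job_id to the job_set led by job_set, or raise ValueError."""
    # An unknown job id and a known job that is not a job_set leader
    # are both rejected with the same message.
    if jobs.get(job_set) != job_set:
        raise ValueError(NONEXISTENT_JOB_SET)
    jobs[new_job_id] = job_set
```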

...

                  qsub -A "abcd" -L select=1:ncpus=16:mem=2gb,nfilter="resources_available['os_ver']>=rhel6 and resources_available['color']=='blue'",walltime=02:00:00 -L select=2:ncpus=8:mem=2gb,nfilter="resources_available['os_ver']!=rhel7 and resources_available['color']!='black'",walltime=01:45:00 job.scr
  • using #PBS directive -

    #PBS -A "abcd"
    #PBS -L select=1:ncpus=16:mem=2gb,nfilter="resources_available['os_ver']>=rhel6 and resources_available['color']=='blue'",walltime=02:00:00
    #PBS -L select=2:ncpus=8:mem=2gb,nfilter="resources_available['os_ver']!=rhel7 and resources_available['color']!='black'",walltime=01:45:00
  • using qsub "-W job_set" option:

    • If a user already knows that a job_set exists in the server, they can submit another job to the same job_set by specifying its name using the "-W job_set" option.
  • Every resource request specified by the “-L” option will be queued as a separate job and will get its own job id.

...

  • Visibility: Public
  • Change Control: Stable
  • Details:
    • Users can now specify a node filter with each of their jobs; the scheduler uses this filter to select the nodes that the job is allowed to run on.
    • A new resource “nfilter” is created. This resource is of type string. Users, operators, and managers have privileges to read/write this resource.
    • nfilter is evaluated as an expression by the PBS scheduler to select the nodes that can be used to run the job at hand.
    • Users can specify a node filter on node resources using conditional operators such as "<", ">", "<=", ">=", "==", and "!=".
      • Example: qsub -Lselect=3:ncpus=2:mem=18gb,nfilter="resources_available['ncpus']>=4 and resources_available['color'] != 'green'",walltime=10000 -Lselect=2:ncpus=2:mem=24gb,nfilter="resources_available['ncpus']>16 and resources_available['color']=='blue'",walltime=8000 job.scr
    • nfilter can use the resources available on a node via the “resources_available” prefix and the resources assigned on a node via the “resources_assigned” prefix. These are the only two inputs it can use to filter nodes.
    • To access a specific resource from the resources_available or resources_assigned inputs, users must enclose the resource name within square brackets “[ ]”, like this - “resources_available[‘ncpus’]”
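As a rough illustration of how such a filter could be applied, the sketch below evaluates an nfilter string against each node with only the two documented inputs in scope. It assumes the expression follows Python syntax, which the text above suggests but does not state; `node_matches`, `filter_nodes`, and the node dictionary layout are hypothetical, and the use of `eval` here is for demonstration only and is not a safe sandbox for untrusted input.

```python
from typing import Dict, List

# Hypothetical node layout: {"resources_available": {...}, "resources_assigned": {...}}
Node = Dict[str, Dict[str, object]]


def node_matches(nfilter: str, node: Node) -> bool:
    """Evaluate an nfilter expression against one node's resources."""
    names = {
        "resources_available": node.get("resources_available", {}),
        "resources_assigned": node.get("resources_assigned", {}),
    }
    try:
        # Builtins are disabled; only the two documented inputs are visible.
        return bool(eval(nfilter, {"__builtins__": {}}, names))
    except (KeyError, TypeError):
        # Assumption: a node lacking a referenced resource, or holding a
        # value that cannot be compared, is treated as non-matching.
        return False


def filter_nodes(nfilter: str, nodes: Dict[str, Node]) -> List[str]:
    """Return the names of the nodes that satisfy the filter."""
    return [name for name, node in nodes.items() if node_matches(nfilter, node)]
```

Treating a missing resource as a non-match is a design choice of this sketch; the document does not say how the scheduler handles that case.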


Interface 4: New job substate “JOB_SUBSTATE_RUNNING_SET” (95)

  • Visibility: Private
  • Change Control: Stable
  • Details:
    • When a job of a job_set starts running, all other jobs of the same job_set are put in the held state and their substate is set to 95.
    • Job substate 95 identifies a held job as part of a job_set that has one job running in it.


Interface 5: New error code PBSE_MOVE_JOBSET

...

  • Visibility: Private
  • Change Control: Stable
  • Details:
    • When the PBS server looks up a job_set by a specified job_set id but is unable to find it, it uses the error code “PBSE_NO_JOBSET” (15212).


Interface 7: New job comment for jobs in substate 95

  • Visibility: Private
  • Change Control: Stable
  • Details:
    • When a job of a job_set starts running, all other queued jobs of the same job_set that are in substate 95 will have the new job comment “Job held, job <job-id> running from this job_set”
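The held-state bookkeeping from Interface 4, together with the comment described here, can be modelled as in the following sketch. The `Job` class, the single-letter states, and `start_job` are hypothetical simplifications; only the substate value 95 and the comment text are taken from this document.

```python
from dataclasses import dataclass
from typing import List

JOB_SUBSTATE_RUNNING_SET = 95  # substate value given in Interface 4


@dataclass
class Job:
    job_id: str
    state: str = "Q"     # simplified: "Q" queued, "R" running, "H" held
    substate: int = 0    # 0 is a placeholder, not a real PBS substate
    comment: str = ""


def start_job(job_set: List[Job], job_id: str) -> None:
    """Run one job of the set and hold all the other jobs."""
    for job in job_set:
        if job.job_id == job_id:
            job.state = "R"
        else:
            job.state = "H"
            job.substate = JOB_SUBSTATE_RUNNING_SET
            job.comment = f"Job held, job {job_id} running from this job_set"
```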


Interface 8: New qselect option “--job_set”

  • Visibility: Private
  • Change Control: Stable
  • Details:
    • A new qselect command option “--job_set” is added.
    • It accepts a string as an input value. This string must be the job id of the leader of the job_set the user is trying to query.
    • If the server cannot find any such job_set, the qselect command fails with the following error message - “qselect: a nonexistent job_set specified”


Interface 9: Move or peer-scheduling of job_set jobs is not allowed.

  • Visibility: Private
  • Change Control: Stable
  • Details:
    • Jobs that are part of a job_set are not allowed to be peered or moved to another complex.
  • If a peering complex tries to move a job that is part of a job_set from the furnishing complex, the following error code will be returned: “PBSE_MOVE_JOBSET” (15211)

...

Interface 11: When a job of a job_set starts running

  • Visibility: Public
  • Change Control: Experimental
  • Details:
    • When a job of a job_set starts running, all the other jobs of the job_set are also moved to the finished state.

...