This is a design proposal for PBS to support job submissions that carry multiple resource requests (with conditional operators), together with the capability to run only one of them.

Gist of proposed changes:

We are trying to fulfill two requirements here:

The first requirement is for the user/admin to be able to specify node filter criteria. This can be met by supporting some kind of node filter specified with the job. For now, the filter could be a Python expression built from conditional operators applied to non-consumable resources present on the nodes. The filter concept could later be extended to queue/server limits, or to finding a preemption candidate among running jobs.
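As a rough illustration of the filter idea, the sketch below evaluates a filter expression against each node's non-consumable resources. The function name `node_matches`, the node data, and the use of `eval` are assumptions made for illustration only; the proposal does not say how PBS would evaluate the expression. Values are compared as plain strings here, so `>=` is a lexicographic comparison.

```python
def node_matches(nfilter, resources_available):
    """Return True if the node's resources satisfy the filter expression.

    Hypothetical helper: only the node's resources_available dict is
    visible to the expression, and builtins are hidden. eval() is not a
    security boundary; a real implementation would have to validate the
    expression before evaluating it.
    """
    try:
        return bool(eval(nfilter,
                         {"__builtins__": {}},
                         {"resources_available": resources_available}))
    except (KeyError, TypeError):
        # A missing resource or a type mismatch is treated as "no match".
        return False


# Illustrative node data (not taken from a real cluster).
nodes = {
    "node1": {"os_ver": "rhel6", "color": "blue"},
    "node2": {"os_ver": "rhel7", "color": "black"},
    "node3": {"color": "blue"},  # os_ver is not set on this node
}

nfilter = ("resources_available['os_ver'] >= 'rhel6' "
           "and resources_available['color'] == 'blue'")

matching = [name for name, res in nodes.items() if node_matches(nfilter, res)]
print(matching)
```

With this data only `node1` passes: `node2` has the wrong color, and `node3` lacks `os_ver`, which the sketch treats as a non-match.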

The second requirement is for the user/admin to be able to provide multiple resource specifications and have PBS use one as soon as it knows that it can start the job with that specification. The PBS scheduler shall look at the resource specifications in the order in which they are sorted according to scheduling policies, and may choose to run the job with a specification as soon as it knows that it can. This falls in line with the PBS scheduler's way of finding a node solution based on the "first fit" algorithm.
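The first-fit selection described above could be sketched as follows. This is a simplified model, not the scheduler's actual logic: the `Spec` class, the `fits` check (one chunk per node, ncpus and memory only) and the node data are all hypothetical, and the specifications are assumed to arrive already sorted by scheduling policy.

```python
from dataclasses import dataclass


@dataclass
class Spec:
    """One resource specification of a job (hypothetical model)."""
    name: str
    chunks: int
    ncpus: int
    mem_gb: int


def fits(spec, free_nodes):
    """Simplified check: place at most one chunk per node."""
    usable = [n for n in free_nodes
              if n["ncpus"] >= spec.ncpus and n["mem_gb"] >= spec.mem_gb]
    return len(usable) >= spec.chunks


def first_fit(specs, free_nodes):
    """Return the first specification that can start now, or None."""
    for spec in specs:  # assumed to be sorted by scheduling policy already
        if fits(spec, free_nodes):
            return spec
    return None


# Two specifications mirroring the shape of the qsub example:
# select=1:ncpus=16:mem=2gb and select=2:ncpus=8:mem=2gb.
specs = [Spec("A", chunks=1, ncpus=16, mem_gb=2),
         Spec("B", chunks=2, ncpus=8, mem_gb=2)]

# Illustrative free nodes: no node has 16 CPUs, so A cannot start.
free_nodes = [{"ncpus": 8, "mem_gb": 4}, {"ncpus": 8, "mem_gb": 4}]

chosen = first_fit(specs, free_nodes)
print(chosen.name if chosen else "nothing can run yet")
```

Here specification A is examined first but cannot be placed, so the job would start with specification B.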



link to forum discussion

Interface 1: New Job attribute called “job_set” - qsub option “-W job_set”


Interface 2: New qsub option “-L” (resource request).

                  qsub -A "abcd" -L select=1:ncpus=16:mem=2gb,nfilter="resources_available['os_ver']>=rhel6 and resources_available['color']=='blue'",walltime=02:00:00  -L select=2:ncpus=8:mem=2gb,nfilter="resources_available['os_ver']!=rhel7 and resources_available['color']!='black'",walltime=01:45:00 job.scr


Interface 3: Extend PBS to allow users to submit jobs with a node-filter (nfilter) resource.


Interface 4: New job substate “JOB_SUBSTATE_RUNNING_SET” (95)


Interface 5: New error code PBSE_MOVE_JOBSET


Interface 6: New error code PBSE_NO_JOBSET


Interface 7: New job comment for jobs in substate 95


Interface 8: New qselect option “--job_set”


Interface 9: Moving or peer scheduling of job_set jobs is not allowed.


Interface 10: When a running job of a job_set is requeued.


Interface 11: When a job of a job_set starts running


FUTURE ENHANCEMENTS

—————————————


Going forward, the same concept can be interpreted in terms of job arrays as well, so that job arrays become a special case of a job_set. A job array is essentially a job_set, with the difference that the user wants all of its subjobs to run instead of only one.

 

If we expose a way to tell the server whether only one job out of the set needs to run or all of them (like -R RUN_ONE|RUN_ALL), then the server can decide internally when to delete the job_set.

 

The same syntax can even be used to submit job arrays.

EXAMPLE 1:

qsub -R RUN_ALL -L select=1:ncpus=16:mem=2gb,nfilter="resources_available['os_ver']>=rhel6 and resources_available['color']=='blue'",walltime=02:00:00  -L select=2:ncpus=8:mem=2gb,nfilter="resources_available['os_ver']!=rhel7 and resources_available['color']!='black'",walltime=01:45:00 job.scr

 

Since job arrays mostly consist of identical resource specifications, users can also do something like this:

EXAMPLE 2:

qsub -R RUN_ALL -J 0-9 -l select=1:ncpus=16:mem=2gb,nfilter="resources_available['os_ver']>=rhel6 and resources_available['color']=='blue'",walltime=02:00:00 job.scr

 

If the job ID returned by such a submission (Example 1 or Example 2) is 123.server1, then the first job of the set can be accessed as 123.server1 or 123[0].server1, and the second job as 123[1].server1. Internally, the PBS server will map the specified subjob index to the actual job that is part of the same job_set.
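The index mapping could look roughly like the sketch below. The `resolve` function, the `job_sets` table, and the assumption that the second job of the set carries the ID 124.server1 are hypothetical; the proposal only states that the server maps a subjob index to an actual job of the same job_set.

```python
import re

# Matches "123.server1" as well as "123[1].server1".
_JOB_ID = re.compile(r"^(\d+)(?:\[(\d+)\])?\.(.+)$")


def resolve(job_id, job_sets):
    """Map a job_set ID with an optional [index] to an actual job ID."""
    match = _JOB_ID.match(job_id)
    if match is None:
        raise ValueError(f"malformed job id: {job_id!r}")
    seq, index, server = match.groups()
    members = job_sets[f"{seq}.{server}"]
    # No index means the first job of the set.
    return members[int(index) if index is not None else 0]


# Hypothetical server-side table: job_set ID -> actual job IDs, in
# submission order. The ID 124.server1 is an assumption for illustration.
job_sets = {"123.server1": ["123.server1", "124.server1"]}

print(resolve("123.server1", job_sets))
print(resolve("123[0].server1", job_sets))
print(resolve("123[1].server1", job_sets))
```

In this sketch the first two lookups both resolve to the first job of the set, and the indexed form `123[1].server1` resolves to the second one.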