This is a design proposal for PBS to start supporting use of conditional and logical operators in the resources requested by jobs.

link to forum discussion

Interface 1: Extend PBS to allow users to submit jobs requesting multiple select specifications using the logical OR ("||") operator.

  • Visibility: Public
  • Change Control: Stable
  • Details: 
    • Users can now submit jobs with multiple select specifications joined by the logical OR operator.
      • Example:  qsub -lselect="2:ncpus=4||4:ncpus=2" job.scr
    • To request multiple select specifications, separate the specifications with the "||" delimiter and enclose the select statement in single or double quotes. The opening quote can be placed either before or after the "select" keyword.
    • When a job is submitted with multiple select specifications, the PBS server honors the queued limits set on the server by taking, for each resource, the maximum amount requested across all of the job's select specifications.
      • Example: If the server has a limit set with qmgr -c "s s max_queued_res.ncpus=[u:user1=10]" and user "user1" has no jobs queued, then
        submitting a job with qsub -lselect="3:ncpus=3:mem=18gb||4:ncpus=3:mem=12gb" job.scr makes the server treat the job as asking for a maximum of 54gb of memory (3 x 18gb, from the first specification) and 12 ncpus (4 x 3, from the second specification).
        Because 12 exceeds the limit of 10, this job hits the queued limit for the resource 'ncpus'.
    • The PBS scheduler tries each select specification in order, from left to right, and runs the job as soon as it reaches
      a select specification that can be satisfied.
    • The PBS scheduler honors the run limits (soft or hard) based on the select specification that it is using to run the job.
    • Once the PBS scheduler decides which select specification to use to run the job, it sets the job's "selectedspec" attribute to that specification.
    • When a job with multiple select specifications is rerun using the "qrerun" command, its selectedspec attribute is unset.
    • The logical OR operator cannot be used to submit reservations.
    • If a job with multiple select specifications is a top job, the scheduler considers only the first select specification when adding the job to the calendar.
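
The left-to-right rule above can be sketched in Python. This is only an illustration, not the scheduler's code: the function names are hypothetical, every specification is assumed to start with a chunk count, and "can be satisfied" is reduced to a single check of total ncpus against a free pool.

```python
def split_select_specs(select):
    """Split an ORed select string into its individual specifications."""
    return [spec.strip() for spec in select.split("||")]


def requested_ncpus(spec):
    """Total ncpus asked for by one specification such as '2:ncpus=4'."""
    chunk_count, *resources = spec.split(":")
    for resource in resources:
        name, value = resource.split("=", 1)
        if name == "ncpus":
            return int(chunk_count) * int(value)
    return 0


def pick_select_spec(select, free_ncpus):
    """Return the first specification, left to right, that fits.

    Simplifying assumption: a specification "fits" when its total
    ncpus does not exceed free_ncpus. Returns None when nothing fits.
    """
    for spec in split_select_specs(select):
        if requested_ncpus(spec) <= free_ncpus:
            return spec
    return None
```

With 20 free ncpus, pick_select_spec("1:ncpus=23||2:ncpus=4", 20) skips the first specification and returns "2:ncpus=4". The returned value corresponds to what the proposal stores in "selectedspec".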


Interface 2: Extend PBS to allow users to submit jobs requesting non-consumable resources with conditional operators.

  • Visibility: Public
  • Change Control: Stable
  • Details:
    • Users can request non-consumable chunk-level resources with the conditional operators "<", ">", "<=", ">=" and "!=".
      • Example: qsub -lselect=1:ncpus=4:color!=green job.scr
    • If the non-consumable resource is a string, the PBS scheduler compares values using a string comparison function. PBS knows nothing about what the resource means or how it has been set up, so it applies alphanumeric string comparison to the resource values when evaluating conditional operators.
      Admins need to choose the values of these resources carefully so that conditional operators give meaningful comparisons.
      For example, a version resource with the string value "ver_12" compares greater than "ver_045". If all such values end in three numerical digits, "ver_012" compares less than "ver_045", which is probably what the user expects.
    • Users can also request non-consumable resources with conditional operators in reservations.
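
The string-comparison pitfall described above can be reproduced with a small Python sketch. The helper name node_matches and the resource name "version" are hypothetical, and Python's built-in string ordering stands in for whatever comparison function PBS would use, so treat this as an illustration of the behaviour rather than the actual matching code.

```python
import operator
import re

# Conditional operators from the proposal, plus plain "=".
COMPARATORS = {
    "<=": operator.le,
    ">=": operator.ge,
    "!=": operator.ne,
    "<": operator.lt,
    ">": operator.gt,
    "=": operator.eq,
}

# Two-character operators are listed first so "<=" is not read as "<".
REQUEST = re.compile(r"^(\w+)(<=|>=|!=|<|>|=)(.+)$")


def node_matches(node_value, request):
    """Check a node's string value against a request like 'color!=green'."""
    match = REQUEST.match(request)
    if match is None:
        raise ValueError("cannot parse request: " + request)
    _name, symbol, wanted = match.groups()
    # Plain string comparison: characters are compared one by one.
    return COMPARATORS[symbol](node_value, wanted)
```

Here node_matches("ver_12", "version>ver_045") is true because "1" sorts after "0" at the first differing character, while the zero-padded node_matches("ver_012", "version<ver_045") gives the ordering a user would expect.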


Interface 3: New log/error messages added to PBS

  • Visibility: Private
  • Change Control: Experimental
  • Details:
    • When a job is submitted with multiple select specifications, the scheduler tries each select specification in turn (from left to right). If the scheduler finds that a given select specification cannot run the job, it logs the following message at level DEBUG2 in the scheduler's log file:
      "No match for <select spec>, reason: <reason why job could not run>"
      Example:
      01/18/2017 10:10:35;0100;pbs_sched;Job;13.centos;No match for 1:ncpus=23, reason: Insufficient amount of resource: ncpus (R: 23 A: 20 T: 20)
      OR
      01/18/2017 10:18:16;0100;pbs_sched;Job;14.centos;No match for 2:ncpus=4, reason: would exceed user root's limit on resource ncpus in complex
    • When none of the given select specifications can be satisfied and the job cannot run, the job comment is set to the reason why the last (rightmost) select specification could not run.
      • In this case the scheduler also logs a message at INFORMATION level stating that none of the select specifications can be satisfied:
        "None of the select specification could be satisfied"
    • When a job is submitted using a conditional operator with a consumable resource, qsub shows the following error on the console:
      "qsub: Consumable resource can only be requested with = operator"
      Example:

      qsub -l select="1:ncpus=5:mem>21gb||2:ncpus=4:mem=100mb" -- /bin/sleep 1000
      qsub: Consumable resource can only be requested with = operator

    • If the PBS server fails to parse multiple select specifications, it logs the following error in the server's log file:
      "failed to parse ORed select spec: <select spec>"


Interface 4: New resource called "job_wide"

  • Visibility: Public
  • Change Control: Stable
  • Details:
    • A new resource called "job_wide" is added. It is used to specify all job-wide resources (resources that cannot be part of a select specification), such as walltime, min_walltime, max_walltime and place.
    • It is a string resource that a user, operator or manager can read and write.
    • This resource is not mandatory when multiple select specifications are given. If it is not specified, PBS treats the job-wide resources specified outside of select as valid for all select specifications.
    • PBS matches each job_wide resource specification with the select specification in the same position and uses that pair to run the job.
      Example: qsub -lselect="2:ncpus=3:mem=2gb||2:ncpus=4:mem=1gb"  -l job_wide="walltime=720||walltime=480" job.scr
      For this job, a walltime of 12 minutes (720 seconds) applies to the first select specification (2:ncpus=3:mem=2gb) and a walltime of 8 minutes (480 seconds) applies to the second select specification (2:ncpus=4:mem=1gb).
    • To specify more than one job_wide resource, use '+' as the delimiter between the resources.
      Example: qsub -l select="2:ncpus=3:mem=2gb||2:ncpus=4:mem=1gb"  -l job_wide="walltime=720+place=scatter||walltime=480+place=pack" job.scr
    • Multiple "ORed" job_wide resources can be specified only when the job also has multiple select specifications. If this condition is not met, the following error is shown on the console:
      "qsub: multiple job_wide resource can only be used when multiple select specifications are specified"
    • The number of "ORed" select specifications must match the number of "ORed" job_wide resources. If the counts differ, the job submission is rejected with the following error:
      "qsub: job_wide resources and select specifications do not match"
    • If a chunk-level resource is specified in the job_wide resource, the job submission is rejected with the following error:
      "qsub: Resource invalid in "job_wide" specification: <resource name>"

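The job_wide parsing and validation rules in Interface 4 can be sketched as follows. The function names are hypothetical, the set of chunk-level resources is an illustrative subset chosen for the example, and the error strings are copied from the messages listed above; the real checks would live in qsub and the server.

```python
# Illustrative subset only (an assumption); PBS has more chunk-level resources.
CHUNK_LEVEL_RESOURCES = {"ncpus", "mem"}


def parse_job_wide(job_wide):
    """Turn 'walltime=720+place=scatter||walltime=480' into a list of dicts."""
    alternatives = []
    for alternative in job_wide.split("||"):
        resources = {}
        for item in alternative.split("+"):
            name, value = item.split("=", 1)
            if name in CHUNK_LEVEL_RESOURCES:
                raise ValueError(
                    'Resource invalid in "job_wide" specification: ' + name)
            resources[name] = value
        alternatives.append(resources)
    return alternatives


def pair_with_select(select, job_wide):
    """Pair each select specification with the job_wide entry at the same position."""
    select_specs = select.split("||")
    alternatives = parse_job_wide(job_wide)
    if len(alternatives) > 1 and len(select_specs) == 1:
        raise ValueError(
            "multiple job_wide resource can only be used when "
            "multiple select specifications are specified")
    if len(alternatives) != len(select_specs):
        raise ValueError(
            "job_wide resources and select specifications do not match")
    return list(zip(select_specs, alternatives))
```

For the '+' example above, pair_with_select returns the first select specification paired with walltime=720 and place=scatter, and the second paired with walltime=480 and place=pack.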

Interface 5: New Job attribute called "sched_job_wide"

  • Visibility: Public
  • Change Control: Stable
  • Details:
    • A new job attribute "sched_job_wide" is added.
    • This attribute is of type string, and only a manager has the privilege to read it.
    • The PBS server sets this attribute only for jobs that specify the "job_wide" resource.
    • It consists of all job-wide resources specified with the job, along with all the default resources specified on the queue/server.
      • If a resource specified in job_wide is also set as a default on the queue/server, the value given in "job_wide" takes precedence.
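
The precedence rule for sched_job_wide amounts to a layered merge, sketched here in Python. The function name is hypothetical, and the proposal does not say how queue and server defaults rank against each other; this sketch assumes queue defaults override server defaults. Only "job_wide wins over defaults" comes from the text above.

```python
def build_sched_job_wide(job_wide, queue_defaults, server_defaults):
    """Merge job-wide resources with defaults; later updates win."""
    merged = dict(server_defaults)   # lowest precedence (assumed)
    merged.update(queue_defaults)    # assumed to override server defaults
    merged.update(job_wide)          # job_wide always takes precedence
    return merged
```

For example, a job with job_wide walltime=720 submitted where the server default walltime is 3600 and the queue default place is free ends up with walltime=720 and place=free.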


Interface 6: New job attribute "max_resc_req"

  • Visibility: Public
  • Change Control: Stable
  • Details:
    • When a job is submitted with multiple select specifications and/or job_wide resources, the server goes through each select specification and job_wide resource and picks the maximum amount requested for each resource.
    • This list of resource requests is then compared with the max_queued and resources_max limits to make sure that none of the select specifications or job_wide resources can exceed the limits set on the queue or server under any circumstances.
    • It is of type "resource", and only operators and managers have read privileges for it.
    • Example: for a job submitted as qsub -l select="1:ncpus=23:mem=2gb||2:ncpus=4:mem=100mb"  -l job_wide="walltime=00:12:00||walltime=00:08:00" job.scr
      the qstat output for max_resc_req is:

      qstat -f 22 | grep max_resc_req
      max_resc_req.mem = 2gb
      max_resc_req.ncpus = 23
      max_resc_req.nodect = 2
      max_resc_req.walltime = 00:12:00
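
A rough Python sketch of the per-resource maximum and the limit check follows. The function names are hypothetical, only integer resources and "mem" sizes with kb/mb/gb suffixes are handled, and memory is normalised to kb for comparison; the server's own resource handling is more general.

```python
UNITS_IN_KB = {"kb": 1, "mb": 1024, "gb": 1024 * 1024}


def to_kb(size):
    """Convert a size such as '18gb' to an integer number of kb."""
    return int(size[:-2]) * UNITS_IN_KB[size[-2:]]


def spec_totals(spec):
    """Total amount of each resource in one spec, e.g. '3:ncpus=3:mem=18gb'."""
    chunk_count, *resources = spec.split(":")
    totals = {}
    for resource in resources:
        name, value = resource.split("=", 1)
        amount = to_kb(value) if name == "mem" else int(value)
        totals[name] = int(chunk_count) * amount
    return totals


def max_resc_req(select):
    """Per-resource maximum across all ORed select specifications."""
    maxima = {}
    for spec in select.split("||"):
        for name, amount in spec_totals(spec).items():
            maxima[name] = max(maxima.get(name, 0), amount)
    return maxima


def exceeded_limits(select, limits):
    """Names of resources whose maximum request is above the given limit."""
    maxima = max_resc_req(select)
    return [name for name, limit in limits.items()
            if maxima.get(name, 0) > limit]
```

Using the Interface 1 example, max_resc_req("3:ncpus=3:mem=18gb||4:ncpus=3:mem=12gb") yields 12 ncpus and the kb equivalent of 54gb, and exceeded_limits with a limit of 10 ncpus reports "ncpus".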


Interface 7: New job attribute "min_resc_req"

  • Visibility: Public
  • Change Control: Stable
  • Details:
    • When a job is submitted with multiple select specifications and/or job_wide resources, the server goes through each select specification and job_wide resource and picks the minimum amount requested for each resource.
    • This list of resource requests is then compared with the resources_min limits to make sure that none of the select specifications or job_wide resources can fall below the limits set on the queue or server under any circumstances.
    • It is of type "resource", and only operators and managers have read privileges for it.
    • Example: for a job submitted as qsub -l select="1:ncpus=23:mem=2gb||2:ncpus=4:mem=100mb" job.scr
      the qstat output for min_resc_req is:

      qstat -f 22 | grep min_resc_req
      min_resc_req.mem = 200mb
      min_resc_req.ncpus = 8
      min_resc_req.nodect = 1
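
The min_resc_req values in this example can be reproduced with a short Python sketch. The function names are hypothetical, memory is normalised to mb, nodect is taken to be the chunk count of each specification, and a resource that appears in only some specifications is minimised over those specifications only; these are assumptions made for the illustration.

```python
UNITS_IN_MB = {"mb": 1, "gb": 1024}


def to_mb(size):
    """Convert a size such as '2gb' to an integer number of mb."""
    return int(size[:-2]) * UNITS_IN_MB[size[-2:]]


def spec_totals(spec):
    """Totals for one spec; nodect is assumed to equal the chunk count."""
    chunk_count, *resources = spec.split(":")
    totals = {"nodect": int(chunk_count)}
    for resource in resources:
        name, value = resource.split("=", 1)
        amount = to_mb(value) if name == "mem" else int(value)
        totals[name] = int(chunk_count) * amount
    return totals


def min_resc_req(select):
    """Per-resource minimum across all ORed select specifications."""
    minima = {}
    for spec in select.split("||"):
        for name, amount in spec_totals(spec).items():
            minima[name] = min(minima.get(name, amount), amount)
    return minima
```

For select="1:ncpus=23:mem=2gb||2:ncpus=4:mem=100mb" this gives mem of 200 (mb), ncpus of 8 and nodect of 1, matching the qstat output shown above.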


Interface 8: Limitation on running jobs with the "qrun -H" option.

  • Visibility: Public
  • Change Control: Stable
  • Details:
    • If a user tries to run a job using "qrun -H" and the job was submitted with multiple select specifications and/or multiple job_wide resources, the following error is shown on the console:
      "qrun: modify the job to specify the correct resource specification before running the job"
    • This is needed because with the "qrun -H" option the user specifies the exec_host list, and the server cannot know which of the multiple select/job_wide alternatives the user had in mind when choosing those hosts.