
This is a design proposal for PBS to start supporting use of conditional and logical operators in the resources requested by jobs.

link to forum discussion

Interface 1: Extend PBS to allow users to submit jobs requesting multiple select specifications using the logical OR ("||") operator.

  • Visibility: Public
  • Change Control: Stable
  • Details: 
    • Users can now submit jobs with multiple select specifications joined by the logical OR operator.
      • Example:  qsub -lselect="2:ncpus=4||4:ncpus=2" job.scr
    • To request multiple select specifications, separate the specifications with the "||" delimiter and enclose the select statement in single or double quotes. The opening quote may come either before or after the "select" keyword.
    • When a job is submitted with multiple select specifications, the PBS server enforces the queued limits set on the server using the maximum amount of each resource requested across all of the job's select specifications.
      • Example: If the server has a limit set with qmgr -c "s s max_queued_res.ncpus=[u:user1=10]" and user "user1" has no jobs queued, then
        submitting a job with qsub -lselect="3:ncpus=3:mem=18gb||4:ncpus=3:mem=12gb" job.scr makes the server treat the job as asking for a maximum of 54gb of memory (3 x 18gb, from the first specification) and 12 ncpus (4 x 3, from the second specification).
        This job therefore hits the queued limit of 10 for resource 'ncpus'.
    • The PBS scheduler tries each select specification in order, from left to right, and runs the job with the first
      select specification that can be satisfied.
    • The PBS scheduler honors the run limits (soft or hard) based on the select specification that it is using to run the job.
    • Once the PBS scheduler decides which select specification to use, it modifies the job's "selectedspec" attribute to reflect the selected specification.
    • When a job with multiple select specifications is rerun using the "qrerun" command, its selectedspec attribute is cleared.
    • The logical OR operator cannot be used to submit reservations.
    • If a job with multiple select specifications is a top job, the scheduler considers only the first select specification when adding the job to the calendar.
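
The queued-limit rule described above can be sketched as follows. This is only an illustration, not the server's implementation: the function names (parse_value, spec_totals, max_request, exceeded_limits), the supported size suffixes, and the use of plain dictionaries for limits are all assumptions made for this example.

```python
import re

# Assumption: a small subset of size suffixes, expressed in kilobytes.
_UNITS_KB = {"kb": 1, "mb": 1024, "gb": 1024 ** 2, "tb": 1024 ** 3}


def parse_value(text):
    """Return an int for plain numbers, or kilobytes for sizes such as '18gb'."""
    match = re.fullmatch(r"(\d+)(kb|mb|gb|tb)?", text.lower())
    if match is None:
        raise ValueError(f"unsupported resource value: {text!r}")
    return int(match.group(1)) * _UNITS_KB.get(match.group(2), 1)


def spec_totals(spec):
    """Total each resource over all chunks of one select specification."""
    totals = {}
    for chunk in spec.split("+"):
        parts = chunk.split(":")
        # A chunk may omit its count, in which case the count is 1.
        count = int(parts.pop(0)) if "=" not in parts[0] else 1
        for resource in parts:
            name, value = resource.split("=", 1)
            totals[name] = totals.get(name, 0) + count * parse_value(value)
    return totals


def max_request(select):
    """Per-resource maximum across all '||'-separated select specifications."""
    maxima = {}
    for spec in select.split("||"):
        for name, total in spec_totals(spec).items():
            maxima[name] = max(maxima.get(name, 0), total)
    return maxima


def exceeded_limits(select, limits, already_queued=None):
    """Names of resources whose maximum request would exceed a queued limit."""
    already_queued = already_queued or {}
    request = max_request(select)
    return sorted(
        name for name, limit in limits.items()
        if already_queued.get(name, 0) + request.get(name, 0) > limit
    )


select = "3:ncpus=3:mem=18gb||4:ncpus=3:mem=12gb"
print(max_request(select))                     # ncpus=12, mem=54gb (in kb)
print(exceeded_limits(select, {"ncpus": 10}))  # ['ncpus']
```

With the example from the text, the first specification contributes the memory maximum and the second contributes the ncpus maximum, so the job is checked against the limit as if it requested both.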


Interface 2: Extend PBS to allow users to submit jobs requesting non-consumable resources with conditional operators.

  • Visibility: Public
  • Change Control: Stable
  • Details:
    • Users can request non-consumable chunk-level resources with the conditional operators "<", ">", "<=", ">=" and "!=".
      • Example: qsub -lselect=1:ncpus=4:color!=green job.scr
    • If the non-consumable resource is a string, the PBS scheduler compares the values using a string comparison function.
    • Users can also request non-consumable resources with conditional operators in reservations.
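
A minimal sketch of how such a conditional term could be evaluated against a node's string resource is shown below. The names parse_condition and node_matches are hypothetical, the comparison is Python's ordinary lexicographic string comparison (standing in for whatever string comparison function the scheduler uses), and treating an unset resource as "no match" is an assumption, not something the proposal states.

```python
import operator
import re

# Conditional operators from the proposal, plus plain "=".
_COMPARE = {
    "<": operator.lt, ">": operator.gt, "<=": operator.le,
    ">=": operator.ge, "!=": operator.ne, "=": operator.eq,
}


def parse_condition(term):
    """Split a term such as 'color!=green' into ('color', '!=', 'green')."""
    # Two-character operators are listed first so '<=' is not read as '<'.
    match = re.fullmatch(r"([^<>=!]+)(<=|>=|!=|<|>|=)(.+)", term)
    if match is None:
        raise ValueError(f"cannot parse condition: {term!r}")
    return match.group(1), match.group(2), match.group(3)


def node_matches(term, node_resources):
    """Check one conditional term against a node's string resources."""
    name, symbol, wanted = parse_condition(term)
    if name not in node_resources:
        return False  # assumption: an unset resource never matches
    return _COMPARE[symbol](str(node_resources[name]), wanted)


print(node_matches("color!=green", {"color": "blue"}))   # True
print(node_matches("color!=green", {"color": "green"}))  # False
```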


Interface 3: New log/error messages added to PBS

  • Visibility: Public
  • Change Control: Stable
  • Details:
    • When a job is submitted with multiple select specifications, the scheduler tries to run the job with each select specification in turn (from left to right). If the scheduler finds that a given select specification cannot run the job, it logs the following message at level DEBUG3 in the scheduler's log file:
      "No match for <select spec>, reason: <reason why job could not run>"
      Example:
      01/18/2017 10:10:35;0100;pbs_sched;Job;13.centos;No match for 1:ncpus=23, reason: Insufficient amount of resource: ncpus (R: 23 A: 20 T: 20)
      OR
      01/18/2017 10:18:16;0100;pbs_sched;Job;14.centos;No match for 2:ncpus=4, reason: would exceed user root's limit on resource ncpus in complex
    • When a job cannot run because none of the given select specifications can be satisfied, the job comment is set to the reason why the last (rightmost) select specification could not run.
      • In this case the scheduler also logs an INFORMATION-level message stating that none of the select specifications can be satisfied:
        "None of the select specification could be satisfied"
    • When a job is submitted using a conditional operator with a consumable resource, qsub prints the following error on the console:
      "qsub: Consumable resource can only be requested with = operator"
      Example:

      qsub -l select="1:ncpus=5:mem>21gb||2:ncpus=4:mem=100mb" -- /bin/sleep 1000
      qsub: Consumable resource can only be requested with = operator

    • If the PBS server fails to parse a multiple select specification, it logs the following error in the server's log file:
      "failed to parse ORed select spec: <select spec>"

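The left-to-right trial and the scheduler messages described above might be sketched like this. It is an illustration under stated assumptions: choose_spec and make_ncpus_checker are hypothetical names, the feasibility check is a toy that only understands specifications of the form "N:ncpus=M", and log entries are returned as (level, message) tuples rather than written to a log file.

```python
def choose_spec(select, can_run):
    """Try each select specification from left to right.

    can_run(spec) is a caller-supplied check (hypothetical) that returns
    None when the specification can run, or a reason string when it cannot.
    Returns (chosen_spec, job_comment, log_entries).
    """
    log = []
    reason = None
    for spec in select.split("||"):
        reason = can_run(spec)
        if reason is None:
            return spec, None, log
        log.append(("DEBUG3", f"No match for {spec}, reason: {reason}"))
    # Nothing fit: the job comment is the reason for the rightmost spec.
    log.append(("INFORMATION",
                "None of the select specification could be satisfied"))
    return None, reason, log


def make_ncpus_checker(available, total):
    """Toy check that only understands specs of the form 'N:ncpus=M'."""
    def check(spec):
        count, resource = spec.split(":", 1)
        requested = int(count) * int(resource.split("=", 1)[1])
        if requested > available:
            return (f"Insufficient amount of resource: ncpus "
                    f"(R: {requested} A: {available} T: {total})")
        return None
    return check


chosen, comment, log = choose_spec("1:ncpus=23||2:ncpus=4",
                                   make_ncpus_checker(20, 20))
print(chosen)  # 2:ncpus=4
for level, message in log:
    print(level, message)
```
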
Interface 4: New job attribute "ATTR_RequestedSpec"

  • Visibility: Public
  • Change Control: Stable
  • Details:
    • When a multiselect job is queued, its "schedselect" attribute shows the select specification that the user gave at submission.
    • When the scheduler decides to run the job using one of the select specifications, it informs the server, and the server internally updates schedselect with the selected specification.
      • The scheduler does this for array subjobs as well, so subjobs of the same array parent may show different "schedselect" values.
    • The select specification given by the user must therefore be stored elsewhere, so that when the job gets requeued or rerun it still has all of its select specifications.
    • A new job attribute "requestedspec" is added, in which the server stores the requested select specification.
    • This attribute is set only when the server sees that the user gave multiple select specifications. For jobs with a single select specification it is not set.
    • This attribute is of type "string" and only a PBS manager has read privileges on it.
    • Example: for a job submitted with qsub -l select="1:ncpus=23:mem=2gb||2:ncpus=3:mem=100mb" job.scr
      qstat output for "schedselect" looks like this:

      qstat -f 22 | grep -e state -e schedselect -e requested
      job_state = Q
      schedselect = 1:ncpus=23:mem=2gb||2:ncpus=3:mem=100mb
      substate = 10 
      requestedspec = 1:ncpus=23:mem=2gb||2:ncpus=3:mem=100mb


      qstat -f 24 | grep -e state -e schedselect -e requested
      job_state = R
      schedselect = 2:ncpus=3:mem=100mb
      substate = 42
      requestedspec = 1:ncpus=23:mem=2gb||2:ncpus=3:mem=100mb


Interface 5: New resource called "job_wide"

  • Visibility: Public
  • Change Control: Stable
  • Details:
    • A new resource called "job_wide" is added. It is used to specify all job-wide resources (resources that cannot be part of a select specification), such as walltime, min_walltime, max_walltime and place.
    • It is a string type resource with flags set as READ_WRITE. This means a user, operator or manager can read or write this resource.
    • Specifying this resource with multiple select specifications is optional. If it is not specified, PBS treats job-wide resources specified outside of select as valid for all select specifications.
    • This resource can only be used with jobs that have multiple select specifications. If it is used without multiple select specifications, job submission is rejected with the following error:
      "qsub: job_wide resource can only be used when multiple select specifications are specified"
    • The number of "ORed" select specifications must match the number of "ORed" job_wide resources. If the two counts differ, job submission is rejected with the following error:
      "qsub: job_wide resources and select specifications do not match"
    • If a chunk-level resource is specified in the job_wide resource, job submission is rejected with the following error:
      "qsub: Resource invalid in "job_wide" specification: <resource name>"

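The three submission checks for "job_wide" could be sketched as below. This is a hedged illustration only: validate_job_wide is a hypothetical name, the set of job-wide resource names is limited to the ones mentioned in the text, and the comma used to separate several resources inside one job_wide alternative is an assumption, since the proposal only shows one resource per alternative.

```python
# Assumption: only the job-wide resources named in the text are recognised.
JOB_WIDE_RESOURCES = {"walltime", "min_walltime", "max_walltime", "place"}


def validate_job_wide(select, job_wide):
    """Return the qsub error message for a bad request, or None if it is fine."""
    if job_wide is None:
        return None  # job_wide is optional
    specs = select.split("||")
    if len(specs) < 2:
        return ("qsub: job_wide resource can only be used when "
                "multiple select specifications are specified")
    alternatives = job_wide.split("||")
    if len(alternatives) != len(specs):
        return "qsub: job_wide resources and select specifications do not match"
    for alternative in alternatives:
        # Assumption: resources inside one alternative are comma separated.
        for item in alternative.split(","):
            name = item.split("=", 1)[0]
            if name not in JOB_WIDE_RESOURCES:
                return f'qsub: Resource invalid in "job_wide" specification: {name}'
    return None


print(validate_job_wide("1:ncpus=23||2:ncpus=4",
                        "walltime=00:12:00||walltime=00:08:00"))  # None
print(validate_job_wide("1:ncpus=23||2:ncpus=4", "walltime=00:12:00"))
```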

Interface 6: New Job attribute called "sched_job_wide"

  • Visibility: Public
  • Change Control: Stable
  • Details:
    • A new job attribute "sched_job_wide" is added.
    • This attribute is of type string with flags set to "ATR_DFLAG_MGRD". This means only a manager has privileges to read this attribute.
    • This attribute is filled in by the PBS server only when it encounters a job that has a "job_wide" resource list.
    • It consists of all job-wide resources as specified with the job, along with all the default resources that are specified on the queue/server.
      • If a resource specified in job_wide is also specified as a default on the queue/server, the value given in "job_wide" takes precedence.
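
The precedence rule above can be illustrated with a small merge sketch. Everything here beyond "job_wide wins over defaults" is an assumption made for the example: the function name build_sched_job_wide, the comma separator inside one alternative, queue defaults overriding server defaults, and the "name=value" output layout are not taken from the proposal.

```python
def build_sched_job_wide(job_wide, queue_defaults=None, server_defaults=None):
    """Combine each job_wide alternative with queue/server defaults."""
    merged = []
    for alternative in job_wide.split("||"):
        resources = dict(server_defaults or {})
        # Assumption: queue defaults override server defaults.
        resources.update(queue_defaults or {})
        # Values from job_wide take precedence over any default.
        for item in alternative.split(","):
            name, value = item.split("=", 1)
            resources[name] = value
        merged.append(",".join(f"{k}={v}" for k, v in resources.items()))
    return "||".join(merged)


print(build_sched_job_wide(
    "walltime=00:12:00||walltime=00:08:00",
    queue_defaults={"walltime": "01:00:00", "place": "free"},
))
# walltime=00:12:00,place=free||walltime=00:08:00,place=free
```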


Interface 7: New job attribute "ATTR_l_max"

  • Visibility: Public
  • Change Control: Stable
  • Details:
    • When a job is submitted with multiple select specifications and/or job_wide resources, the server goes through each of these select specifications/job_wide resources and picks the maximum amount requested for each resource.
    • This list of resource requests is then compared with the max_queued or resources_max limits to make sure that none of these select specifications/job_wide resources can exceed the limits set on the queue or server under any circumstances.
    • "ATTR_l_max" is displayed as "max_resc_req" when a job with multiple select specifications and/or job_wide resources is queried.
    • It is of type "resource" and only operators and managers have read privileges on it.
    • Example: for a job submitted with qsub -l select="1:ncpus=23:mem=2gb||2:ncpus=4:mem=100mb"  -l job_wide="walltime=00:12:00||walltime=00:08:00" job.scr
      qstat output for max_resc_req looks like this:

      qstat -f 22 | grep max_resc_req
      max_resc_req.mem = 2gb
      max_resc_req.ncpus = 23
      max_resc_req.nodect = 2
      max_resc_req.walltime = 00:12:00


Interface 8: New job attribute "ATTR_l_min"

  • Visibility: Public
  • Change Control: Stable
  • Details:
    • When a job is submitted with multiple select specifications and/or job_wide resources, the server goes through each of these select specifications/job_wide resources and picks the minimum amount requested for each resource.
    • This list of resource requests is then compared with the resources_min limits to make sure that none of these select specifications/job_wide resources can violate the limits set on the queue or server under any circumstances.
    • "ATTR_l_min" is displayed as "min_resc_req" when a job with multiple select specifications and/or job_wide resources is queried.
    • It is of type "resource" and only operators and managers have read privileges on it.
    • Example: for a job submitted with qsub -l select="1:ncpus=23:mem=2gb||2:ncpus=4:mem=100mb" job.scr
      qstat output for min_resc_req looks like this:

      qstat -f 22 | grep min_resc_req
      min_resc_req.mem = 200mb
      min_resc_req.ncpus = 8
      min_resc_req.nodect = 1
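
The min_resc_req values in this example can be reproduced with the sketch below. It is illustrative only: the function names, the supported size suffixes, counting nodect as the sum of chunk counts, and ignoring a resource in specifications that do not mention it are assumptions for this example rather than documented server behaviour.

```python
import re

# Assumption: a small subset of size suffixes, expressed in kilobytes.
_UNITS_KB = {"kb": 1, "mb": 1024, "gb": 1024 ** 2, "tb": 1024 ** 3}


def to_number(text):
    """Return (amount, is_size); sizes are converted to kilobytes."""
    match = re.fullmatch(r"(\d+)(kb|mb|gb|tb)?", text.lower())
    if match is None:
        raise ValueError(f"unsupported resource value: {text!r}")
    unit = match.group(2)
    return int(match.group(1)) * _UNITS_KB.get(unit, 1), unit is not None


def format_size(kilobytes):
    """Render kilobytes using the largest unit that divides evenly."""
    for unit in ("tb", "gb", "mb"):
        if kilobytes >= _UNITS_KB[unit] and kilobytes % _UNITS_KB[unit] == 0:
            return f"{kilobytes // _UNITS_KB[unit]}{unit}"
    return f"{kilobytes}kb"


def spec_totals(spec):
    """Totals for one select specification, plus the names of size resources."""
    totals, sizes = {"nodect": 0}, set()
    for chunk in spec.split("+"):
        parts = chunk.split(":")
        count = int(parts.pop(0)) if "=" not in parts[0] else 1
        totals["nodect"] += count  # assumption: nodect is the chunk count
        for resource in parts:
            name, value = resource.split("=", 1)
            amount, is_size = to_number(value)
            totals[name] = totals.get(name, 0) + count * amount
            if is_size:
                sizes.add(name)
    return totals, sizes


def min_resc_req(select):
    """Per-resource minimum across all '||'-separated select specifications."""
    minima, sizes = {}, set()
    for spec in select.split("||"):
        totals, spec_sizes = spec_totals(spec)
        sizes |= spec_sizes
        for name, total in totals.items():
            minima[name] = min(minima.get(name, total), total)
    return {name: format_size(value) if name in sizes else str(value)
            for name, value in sorted(minima.items())}


for name, value in min_resc_req("1:ncpus=23:mem=2gb||2:ncpus=4:mem=100mb").items():
    print(f"min_resc_req.{name} = {value}")
```

The second specification totals 2 x 100mb = 200mb and 2 x 4 = 8 ncpus, which are both smaller than the totals of the first specification, while the first specification has the smaller chunk count.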


Interface 9: Limitation in running jobs with "qrun -H" option.

  • Visibility: Public
  • Change Control: Stable
  • Details:
    • If a user tries to run a job using "qrun -H" and the job was submitted with multiple select specifications and/or multiple job_wide resources, the following error is printed on the console:
      "qrun: modify the job to specify the correct resource specification before running the job"
    • This is needed because with "qrun -H" the user specifies the exec_host list, and the server cannot know which of the multiple select/job_wide alternatives the user had in mind for running the job.