Follows the PBS Pro Design Document Guidelines.
...
Links
- Link to Developer Forum
- PP-725
- Link to pull request: <Not Yet Implemented>
- Refs:
- Ref 1: PBS command "pbs_release_nodes", section 2.32 in the Reference Guide v19.2.3
- Ref 2: subsection "17.6.2.5 Releasing Vnodes" in the Reference Guide v19.2.3
...
This design enhances the "node ramp down" feature by introducing a new option, "-k <select>" ("k" for "keep"), to the PBS command "pbs_release_nodes". The option allows users or admins to retain the sister nodes/vnodes that satisfy the "select" argument while the remaining sister nodes/vnodes are released during a node ramp down operation.
...
- Change Control: Stable
- Synopsis: This new option to "pbs_release_nodes" specifies a select statement that is a subset of the job's select statement (as submitted, or as modified with qalter). It describes the nodes/vnodes that are to remain assigned to the job; the remaining sister nodes/vnodes are released and become available for scheduling other jobs. The resource list in each chunk of the sub select statement can be partial with respect to the full resource list in the corresponding chunk of the job's select statement.
- Permission: as described in Ref 1 above.
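The "-k" argument uses the ordinary select syntax: chunks joined by "+", each chunk an optional leading count followed by resource=value pairs joined by ":". The C sketch below splits such a string into chunks, which any implementation of the option has to do before matching vnodes. It is illustrative only: the chunk_t type, the parse_select() function and the fixed size limits are hypothetical and are not taken from the PBS sources.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_RES 8
#define MAX_LEN 64

/* Hypothetical in-memory form of one chunk "[N:]res=val[:res=val...]". */
typedef struct {
    int count;                   /* leading N; 1 when omitted */
    int nres;                    /* number of res=val pairs */
    char name[MAX_RES][MAX_LEN];
    char value[MAX_RES][MAX_LEN];
} chunk_t;

/* Splits "select=..." into chunks on '+' and each chunk into res=val pairs
 * on ':'. Returns the number of chunks, or -1 if the string is malformed.
 * Input longer than the local buffer is silently truncated. */
static int parse_select(const char *spec, chunk_t *chunks, int max_chunks)
{
    static const char prefix[] = "select=";
    const size_t plen = sizeof(prefix) - 1;
    char buf[512];
    char *chunk_str;
    int n = 0;

    if (strncmp(spec, prefix, plen) != 0)
        return -1;
    snprintf(buf, sizeof(buf), "%s", spec + plen);

    for (chunk_str = buf; chunk_str != NULL; n++) {
        char *next_chunk = strchr(chunk_str, '+');
        char *tok = chunk_str;
        int first = 1;

        if (next_chunk != NULL)
            *next_chunk++ = '\0';      /* cut off this chunk */
        if (n == max_chunks)
            return -1;
        chunks[n].count = 1;
        chunks[n].nres = 0;

        while (tok != NULL) {
            char *next_tok = strchr(tok, ':');
            char *eq;

            if (next_tok != NULL)
                *next_tok++ = '\0';    /* cut off this res=val token */
            eq = strchr(tok, '=');
            if (eq == NULL) {
                /* A bare integer is only valid as the leading chunk count. */
                char *end;
                long v = strtol(tok, &end, 10);
                if (!first || end == tok || *end != '\0' || v <= 0)
                    return -1;
                chunks[n].count = (int)v;
            } else {
                int k = chunks[n].nres;
                if (eq == tok || k == MAX_RES)
                    return -1;
                *eq = '\0';
                snprintf(chunks[n].name[k], MAX_LEN, "%s", tok);
                snprintf(chunks[n].value[k], MAX_LEN, "%s", eq + 1);
                chunks[n].nres++;
            }
            first = 0;
            tok = next_tok;
        }
        if (chunks[n].nres == 0)
            return -1;                 /* a chunk needs at least one resource */
        chunk_str = next_chunk;
    }
    return n;
}
```

A chunk without a leading count, as in "select=model=abc", gets a count of 1, matching the usual select convention.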
...
- Example of Usage:
Let's submit a job with the following select string:
$ qsub -l select=34:model=abc:ncpus=5+3:model=abc:bigmem=true:ncpus=1+32:model=def:ncpus=32 job.scr
120.pbssrv
...
$ qstat -f 120| egrep exec_vnode
exec_vnode = (nd_abc_1:ncpus=5)+(nd_abc_2:ncpus=5)+(nd_abc_3[0]:ncpus=5)+(nd_abc_3[1]:ncpus=15)+(nd_abc_4_bm:ncpus=1)+(nd_abc_5_bm:ncpus=1)+(nd_abc_6_bm:ncpus=1)+(nd_def_1:ncpus=32)+(nd_def_2:ncpus=32)
...
$ pbsnodes -av
nd_abc_1
Mom = nd_abc_1.pbspro.com
state = job-busy
jobs = 120.pbssrv/0
resources_available.model = abc
resources_available.ncpus = 105
resources_assigned.ncpus = 5

nd_abc_2
Mom = nd_abc_2.pbspro.com
state = job-busy
jobs = 120.pbssrv/0
resources_available.model = abc
resources_available.ncpus = 105
resources_assigned.ncpus = 5

nd_abc_3[0]
Mom = nd_abc_3.pbspro.com
state = job-busy
jobs = 120.pbssrv/0
resources_available.model = abc
resources_available.ncpus = 5
resources_assigned.ncpus = 5

nd_abc_3[1]
Mom = nd_abc_3.pbspro.com
state = job-busy
jobs = 120.pbssrv/0
resources_available.model = abc
resources_available.ncpus = 5
resources_assigned.ncpus = 15

nd_abc_4_bm
Mom = nd_abc_4_bm.pbspro.com
state = job-busy
jobs = 120.pbssrv/0
resources_available.bigmem = True
resources_available.model = abc
resources_available.ncpus = 1
resources_assigned.ncpus = 1

nd_abc_5_bm
Mom = nd_abc_5_bm.pbspro.com
state = job-busy
jobs = 120.pbssrv/0
resources_available.bigmem = True
resources_available.model = abc
resources_available.ncpus = 1
resources_assigned.ncpus = 1

nd_abc_6_bm
Mom = nd_abc_6_bm.pbspro.com
state = job-busy
jobs = 120.pbssrv/0
resources_available.bigmem = True
resources_available.model = abc
resources_available.ncpus = 1
resources_assigned.ncpus = 1

nd_def_1
Mom = nd_def_1.pbspro.com
state = job-busy
jobs = 120.pbssrv/0
resources_available.model = def
resources_available.ncpus = 32
resources_assigned.ncpus = 32

nd_def_2
Mom = nd_def_2.pbspro.com
state = job-busy
jobs = 120.pbssrv/0
resources_available.model = def
resources_available.ncpus = 32
resources_assigned.ncpus = 32
...
This will release the vnodes (nd_abc_3[0]:ncpus=5)+(nd_abc_3[1]:ncpus=15)+(nd_abc_6_bm:ncpus=1)+(nd_def_1:ncpus=32)+(nd_def_2:ncpus=32) from the job, while retaining the vnodes (nd_abc_1:ncpus=5)+(nd_abc_2:ncpus=5)+(nd_abc_4_bm:ncpus=1)+(nd_abc_5_bm:ncpus=1).
...
$ qstat -f 120| egrep exec_vnode
exec_vnode = (nd_abc_1:ncpus=5)+(nd_abc_2:ncpus=5)+(nd_abc_4_bm:ncpus=1)+(nd_abc_5_bm:ncpus=1)
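This document does not specify how the server picks the vnodes that satisfy the sub select. The C sketch below shows one simple possibility, a first-fit pass over the exec_vnode entries in order, applied to the example job above. Assumptions: each exec_vnode entry is treated as a whole chunk (in reality one chunk can span several vnodes), only model, bigmem and ncpus are modelled, and the sub select encoded in example_keep, select=2:model=abc:ncpus=5+2:model=abc:bigmem=true:ncpus=1, is an illustrative value chosen to produce the result shown, not necessarily the exact command used in the example. All type and function names are hypothetical.

```c
#include <string.h>

/* Simplified view of one exec_vnode entry of the example job. */
typedef struct {
    const char *name;
    const char *model;  /* resources_available.model */
    int bigmem;         /* 1 if resources_available.bigmem = True */
    int ncpus;          /* ncpus assigned to the job on this vnode */
    int kept;           /* output: 1 if the vnode stays with the job */
} vnode_t;

/* One chunk of a sub select. NULL model, bigmem == -1 and ncpus == 0 mean
 * "not specified", which is how a partial resource list is modelled. */
typedef struct {
    int count;
    const char *model;
    int bigmem;
    int ncpus;
} keep_chunk_t;

/* Data transcribed from the exec_vnode and pbsnodes output above. */
static vnode_t example_job[] = {
    {"nd_abc_1", "abc", 0, 5, 0},     {"nd_abc_2", "abc", 0, 5, 0},
    {"nd_abc_3[0]", "abc", 0, 5, 0},  {"nd_abc_3[1]", "abc", 0, 15, 0},
    {"nd_abc_4_bm", "abc", 1, 1, 0},  {"nd_abc_5_bm", "abc", 1, 1, 0},
    {"nd_abc_6_bm", "abc", 1, 1, 0},  {"nd_def_1", "def", 0, 32, 0},
    {"nd_def_2", "def", 0, 32, 0},
};

/* Illustrative: select=2:model=abc:ncpus=5+2:model=abc:bigmem=true:ncpus=1 */
static const keep_chunk_t example_keep[] = {
    {2, "abc", -1, 5},
    {2, "abc", 1, 1},
};

static int vnode_matches(const vnode_t *v, const keep_chunk_t *c)
{
    if (c->model != NULL && strcmp(v->model, c->model) != 0)
        return 0;
    if (c->bigmem != -1 && v->bigmem != c->bigmem)
        return 0;
    if (c->ncpus > 0 && v->ncpus < c->ncpus)
        return 0;
    return 1;
}

/* First fit: for each chunk, mark not-yet-kept vnodes in exec_vnode order
 * until the chunk count is met. Returns 0 on success, -1 if a chunk cannot
 * be satisfied. Vnodes left with kept == 0 are the ones to release. */
static int select_keep(vnode_t *vn, int nvn, const keep_chunk_t *ck, int nck)
{
    for (int i = 0; i < nvn; i++)
        vn[i].kept = 0;
    for (int c = 0; c < nck; c++) {
        int need = ck[c].count;
        for (int i = 0; i < nvn && need > 0; i++) {
            if (!vn[i].kept && vnode_matches(&vn[i], &ck[c])) {
                vn[i].kept = 1;
                need--;
            }
        }
        if (need > 0)
            return -1;
    }
    return 0;
}
```

After select_keep(example_job, 9, example_keep, 2) returns 0, the kept entries are nd_abc_1, nd_abc_2, nd_abc_4_bm and nd_abc_5_bm, matching the exec_vnode shown above. First fit is order dependent (an early chunk can consume a vnode a later chunk needed), so a real implementation may need a smarter search.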
- Using Partial Chunk Resource List:
The same result as in the previous example can be achieved with the shorter select string below, in which each chunk's resource list is partial with respect to the corresponding chunk of the select statement supplied to qsub.
$ pbs_release_nodes -j 120 -k select=2:model=abc+2:bigmem=true
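The synopsis allows a sub select chunk to list only some of the resources of the corresponding job chunk. One plausible reading of "partial", sketched below in C, is that every resource=value pair in the sub chunk must also appear in the job chunk. The function names are hypothetical, chunk counts are assumed to have been stripped beforehand, and values are compared as exact strings (so "true" and "True" would not match), which a real implementation would likely relax.

```c
#include <string.h>

/* Returns 1 if tok (len bytes) appears as a whole ':'-separated token
 * in list, else 0. */
static int has_token(const char *list, const char *tok, size_t len)
{
    const char *p = list;
    while (*p != '\0') {
        const char *end = strchr(p, ':');
        size_t n = end ? (size_t)(end - p) : strlen(p);
        if (n == len && strncmp(p, tok, len) == 0)
            return 1;
        if (end == NULL)
            break;
        p = end + 1;
    }
    return 0;
}

/* Hypothetical "partial chunk" test: every res=value pair of the sub chunk
 * must appear verbatim in the job's corresponding chunk. */
static int is_partial_chunk(const char *sub, const char *job)
{
    const char *p = sub;
    if (*p == '\0')
        return 0;   /* an empty chunk describes nothing to keep */
    for (;;) {
        const char *end = strchr(p, ':');
        size_t n = end ? (size_t)(end - p) : strlen(p);
        if (n == 0 || !has_token(job, p, n))
            return 0;
        if (end == NULL)
            return 1;
        p = end + 1;
    }
}
```

For instance, is_partial_chunk("bigmem=true", "model=abc:bigmem=true:ncpus=1") returns 1, while is_partial_chunk("model=abc:ncpus=6", "model=abc:ncpus=5") returns 0 because the ncpus value differs.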
- Errors and Return Codes:
- When the command with the new option executes successfully, it prints the following on the console and exits with code 0:
pbs_release_nodes: <sub select string>
- The "-k" option cannot be used in conjunction with the "-a" option. If both are given, pbs_release_nodes prints the following error along with the usage string:
pbs_release_nodes: -a and -k options cannot be used together
- The "-k" option cannot be used in conjunction with host/vnode list arguments (<vnode> [<vnode> [<vnode>] ...]). If both are given, pbs_release_nodes prints the following error along with the usage string:
pbs_release_nodes: cannot supply node list with -k option
- <TBD> When the select statement supplied is not a sub statement of the job's qsub select statement
- <TBD> List all other failure scenarios, their return codes and error messages
- When the argument string to the "-k" option does not start with "select=":
pbs_release_nodes: only a "select=" string is valid in -k option
- When the sub select statement supplied contains undefined resources
pbs_release_nodes: Unknown resource: <undefined res name>
- For all other failures, including the case where the sub select string cannot be satisfied, the following error is printed:
pbs_release_nodes: Server returned error 15010 for job
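The option-combination and "select=" prefix errors listed above are reported together with the usage string, which suggests they are caught on the client before the server is contacted. A minimal C sketch of those checks follows; the relnodes_opts_t structure, the check_keep_option() function and the order of the checks are assumptions, while the message texts are copied from the list above.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical summary of the parsed pbs_release_nodes command line. */
typedef struct {
    int all_flag;          /* 1 if -a was given */
    const char *keep;      /* argument of -k, or NULL if -k was not given */
    int n_vnode_args;      /* number of trailing <vnode> operands */
} relnodes_opts_t;

/* Returns the documented error text for an invalid use of -k, or NULL
 * when the options pass these client-side checks. */
static const char *check_keep_option(const relnodes_opts_t *o)
{
    if (o->keep == NULL)
        return NULL;       /* -k not used: nothing to check here */
    if (o->all_flag)
        return "pbs_release_nodes: -a and -k options cannot be used together";
    if (o->n_vnode_args > 0)
        return "pbs_release_nodes: cannot supply node list with -k option";
    if (strncmp(o->keep, "select=", strlen("select=")) != 0)
        return "pbs_release_nodes: only a \"select=\" string is valid in -k option";
    return NULL;
}
```

A caller would print the returned message followed by the usage string; this document does not state the exit code for these cases.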
- Accounting Logs:
- No new accounting log records are introduced. See Ref 2 above.
...
- The "select" string will be passed to "pbs_relnodesjob()" through its "extend" argument, which is of type "char *".
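The point above can be illustrated with a small C sketch. To keep it self-contained and runnable without libpbs, the real call is replaced by relnodesjob_stub(), a hypothetical stand-in assumed to take the same parameters as pbs_relnodesjob(int connect, char *jobid, char *node_list, char *extend). Passing NULL for node_list when "-k" is used is also an assumption; the document only states that "-k" cannot be combined with a node list.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for pbs_relnodesjob(); it only records the
 * node_list and extend arguments it receives. */
static char seen_nodes[256];
static char seen_extend[256];

static int relnodesjob_stub(int connect, char *jobid, char *node_list, char *extend)
{
    (void)connect;
    (void)jobid;
    snprintf(seen_nodes, sizeof(seen_nodes), "%s", node_list ? node_list : "");
    snprintf(seen_extend, sizeof(seen_extend), "%s", extend ? extend : "");
    return 0;
}

/* With -k, no node list is sent and the whole "select=..." string
 * travels in extend. */
static int release_keeping(int connect, char *jobid, char *keep_select)
{
    if (strncmp(keep_select, "select=", strlen("select=")) != 0)
        return -1;   /* mirrors the "only a select= string" check */
    return relnodesjob_stub(connect, jobid, NULL, keep_select);
}
```

Swapping relnodesjob_stub() for the real pbs_relnodesjob(), with a connection handle obtained from pbs_connect(), would give the actual client call, subject to the prototype assumption stated above.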
...
...