Objective
This document introduces the node ramp-down feature, which allows sister nodes that are no longer needed by a running job to be released early, making them available to other jobs.
Interface 1: New command: 'pbs_release_nodes'
- Visibility: Public
- Change Control: Stable
- Synopsis: Release a specified set of sister nodes or vnodes, or all sister nodes or vnodes assigned to the specified job. The nodes released will then be made available for scheduling other jobs.
- Permission: Only the job owner, admin, PBS manager, or PBS operator is allowed to perform the release nodes action.
- Details:
- Release a particular set of sister nodes from a job:
Syntax: pbs_release_nodes -j <job_identifier> <host1_or_vnode1> [<host2_or_vnode2> ...]
The 'host*_or_vnode*' arguments are any of the sister nodes/vnodes that appear in the exec_vnode attribute of the running job.
- Example:
% qsub job.scr
241.borg
% qstat -f 241 | egrep "exec|Resource_List|select"
...exec_host = borg[0]/0*0+federer/0*0+lendl/0*2
exec_vnode = (borg[0]:mem=1048576kb:ncpus=1+borg[1]:mem=1048576kb:ncpus=1+borg[2]:ncpus=1)+(federer:mem=1048576kb:ncpus=1+federer[0]:mem=1048576kb:ncpus=1+federer[1]:ncpus=1)+(lendl:ncpus=2:mem=2097152kb)
Resource_List.mem = 6gb
Resource_List.ncpus = 8
Resource_List.nodect = 3
Resource_List.place = scatter
Resource_List.select = ncpus=3:mem=2gb+ncpus=3:mem=2gb+ncpus=2:mem=2gb
schedselect = 1:ncpus=3:mem=2gb+1:ncpus=3:mem=2gb+1:ncpus=2:mem=2gb
% pbs_release_nodes -j 241 federer[1] lendl
% qstat -f 241 | egrep "exec|Resource_List|select"
exec_host = borg[0]/0*0+federer/0*0 <- no lendl
exec_vnode = (borg[0]:mem=1048576kb:ncpus=1+borg[1]:mem=1048576kb:ncpus=1+borg[2]:ncpus=1)+(federer:mem=1048576kb:ncpus=1+federer[0]:mem=1048576kb:ncpus=1) <- federer[1] and lendl removed
Resource_List.mem = 4194304kb <- minus 2gb (from lendl)
Resource_List.ncpus = 5 <- minus 3 cpus (1 from federer[1] and 2 from lendl)
Resource_List.nodect = 2 <- minus 1 chunk (when lendl was taken out, its entire chunk assignment disappeared)
Resource_List.place = scatter
schedselect = 1:mem=2097152kb:ncpus=3+1:mem=2097152kb:ncpus=2
Error: pbs_release_nodes will report an error if any of the specified nodes are managed by the mother superior (MS) MoM, i.e. they are on the job's primary execution host.
Example:
% pbs_release_nodes -j 241 borg[0]
pbs_release_nodes: Can't free 'borg[0]' since it's on an MS host
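The recomputed Resource_List.* values shown above follow directly from the pruned exec_vnode. A minimal sketch of that bookkeeping (this is not PBS source code; the parsing and helper names are illustrative, and memory values are assumed to be in kb):

```python
def parse_exec_vnode(exec_vnode):
    """Parse an exec_vnode string such as
    '(borg[0]:mem=1048576kb:ncpus=1)+(lendl:ncpus=2:mem=2097152kb)'
    into a list of chunks, each a list of (vnode, {resource: value}) pairs."""
    chunks = []
    for chunk in exec_vnode.split(')+('):
        vnodes = []
        for piece in chunk.strip('()').split('+'):
            name, *res = piece.split(':')
            vnodes.append((name, dict(r.split('=', 1) for r in res)))
        chunks.append(vnodes)
    return chunks

def release(chunks, to_release):
    """Drop the named vnodes; a chunk disappears entirely once all of its
    vnodes are released, which is why nodect dropped by 1 when lendl left."""
    kept = []
    for vnodes in chunks:
        remaining = [(n, r) for n, r in vnodes if n not in to_release]
        if remaining:
            kept.append(remaining)
    return kept

def totals(chunks):
    """Recompute nodect, ncpus and mem (assumed to be in kb) from the
    remaining chunks, mirroring the updated Resource_List values."""
    ncpus, mem_kb = 0, 0
    for vnodes in chunks:
        for _, res in vnodes:
            ncpus += int(res.get('ncpus', 0))
            mem_kb += int(res['mem'].rstrip('kb')) if 'mem' in res else 0
    return {'nodect': len(chunks), 'ncpus': ncpus, 'mem': '%dkb' % mem_kb}
```

Releasing federer[1] and lendl from the example above removes lendl's whole chunk (nodect 3 -> 2) while federer's chunk merely shrinks, which reproduces the mem/ncpus/nodect deltas annotated in the qstat output.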
- Release all sister nodes from a job:
- Syntax: pbs_release_nodes -j <job_identifier> -a
- Example:
% pbs_release_nodes -j 241 -a
% qstat -f 241 | egrep "exec|Resource_List|select"
exec_host = borg[0]/0*0
exec_vnode = (borg[0]:mem=1048576kb:ncpus=1+borg[1]:mem=1048576kb:ncpus=1+borg[2]:ncpus=1)
Resource_List.mem = 2097152kb <- minus 2gb (from federer)
Resource_List.ncpus = 3
Resource_List.nodect = 1
Resource_List.place = scatter
schedselect = 1:mem=2097152kb:ncpus=3
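The '-a' form keeps only the vnodes on the mother superior's host and releases everything else. A sketch of that selection, assuming vnode names take the form '<host>' or '<host>[<idx>]' (function names are hypothetical, not PBS internals):

```python
def host_of(vnode):
    # 'borg[1]' -> 'borg'; a natural vnode name like 'federer' maps to itself
    return vnode.split('[', 1)[0]

def release_all_sisters(chunks, ms_host):
    """chunks is a list of chunks, each a list of vnode names.  Keep only
    vnodes hosted on the mother superior's host; drop chunks left empty."""
    kept = [[v for v in chunk if host_of(v) == ms_host] for chunk in chunks]
    return [chunk for chunk in kept if chunk]
```

Applied to the example job, only the borg[*] vnodes survive, which matches the exec_vnode shown after 'pbs_release_nodes -j 241 -a'.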
Interface 2: New job attribute 'release_nodes_on_stageout'
- Visibility: Public
- Change Control: Stable
- Value: 'true' or 'false'
- Synopsis: When set to 'true', the equivalent of 'pbs_release_nodes -a' is performed when the stageout operation begins, releasing all the sister vnodes.
Example:
% qsub -W stageout=my_stageout@federer:my_stageout.out -W release_nodes_on_stageout=true job.scr
Interface 3: New server accounting record: 'u' for update record
- Visibility: Public
- Change Control: Stable
- Synopsis: For every release nodes action, a 'u' (for update) record is written to the accounting logs. It reflects the previous values in the exec_* and Resource_List.* items, and the updated values in the next_exec_* and next_Resource_List.* items.
- Example:
% qsub -l select=3:ncpus=1:mem=1gb job.scr
242.borg
% qstat -f 242 | egrep "exec|Resource_List|select"
exec_host = borg/0+federer/0+lendl/0
exec_vnode = (borg[0]:ncpus=1:mem=1048576kb)+(federer:ncpus=1:mem=1048576kb)+(lendl:ncpus=1:mem=1048576kb)
Resource_List.mem = 3gb
Resource_List.ncpus = 3
Resource_List.nodect = 3
Resource_List.place = scatter
Resource_List.select = 3:ncpus=1:mem=1gb
schedselect = 3:ncpus=1:mem=1gb
% pbs_release_nodes -j 242 lendl
Accounting logs show:
# tail -f /var/spool/PBS/server_priv/accounting/20170123
01/23/2017 18:53:24;u;242.borg;user=bayucan group=users project=_pbs_project_default jobname=STDIN queue=workq ctime=1485215572 qtime=1485215572 etime=1485215572 start=1485215572 exec_host=borg/0+federer/0+lendl/0 exec_vnode=(borg[0]:ncpus=1:mem=1048576kb)+(federer:ncpus=1:mem=1048576kb)+(lendl:ncpus=1:mem=1048576kb) Resource_List.mem=3gb Resource_List.ncpus=3 Resource_List.nodect=3 Resource_List.place=scatter Resource_List.select=3:ncpus=1:mem=1gb next_exec_host=borg/0+federer/0 next_exec_vnode=(borg[0]:ncpus=1:mem=1048576kb)+(federer:ncpus=1:mem=1048576kb) next_Resource_List.mem=2097152kb next_Resource_List.ncpus=2 next_Resource_List.nodect=2 next_Resource_List.place=scatter next_Resource_List.select=1:ncpus=1:mem=1048576kb+1:ncpus=1:mem=1048576kb session=7503 run_count=1 resources_used.cpupercent=0 resources_used.cput=00:00:00 resources_used.mem=4288kb resources_used.ncpus=3 resources_used.vmem=42928kb resources_used.walltime=00:00:26
Another pbs_release_nodes call yields:
% pbs_release_nodes -j 242 federer
# tail -f /var/spool/PBS/server_priv/accounting/20170123
01/23/2017 18:59:35;u;242.borg;user=bayucan group=users project=_pbs_project_default jobname=STDIN queue=workq ctime=1485215949 qtime=1485215949 etime=1485215949 start=1485215949 exec_host=borg/0+federer/0 exec_vnode=(borg[0]:ncpus=1:mem=1048576kb)+(federer:ncpus=1:mem=1048576kb) Resource_List.mem=2097152kb Resource_List.ncpus=2 Resource_List.nodect=2 Resource_List.place=scatter Resource_List.select=1:ncpus=1:mem=1048576kb+1:ncpus=1:mem=1048576kb next_exec_host=borg/0 next_exec_vnode=(borg[0]:ncpus=1:mem=1048576kb) next_Resource_List.mem=1048576kb next_Resource_List.ncpus=1 next_Resource_List.nodect=1 next_Resource_List.place=scatter next_Resource_List.select=1:ncpus=1:mem=1048576kb session=7773 run_count=1 resources_used.cpupercent=0 resources_used.cput=00:00:00 resources_used.mem=4300kb resources_used.ncpus=3 resources_used.vmem=42928kb resources_used.walltime=00:00:00
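A 'u' record is a timestamp, record type, and job id followed by space-separated key=value pairs, so it can be picked apart with ordinary string splitting. A hedged sketch of such a consumer (it assumes attribute values contain no embedded spaces, which holds for the exec_* and Resource_List.* items shown above):

```python
def parse_accounting_record(line):
    """Split one PBS accounting line of the form
    'MM/DD/YYYY HH:MM:SS;<type>;<id>;k1=v1 k2=v2 ...'
    into (timestamp, record_type, entity_id, attributes)."""
    timestamp, rtype, entity, rest = line.split(';', 3)
    # split('=', 1) preserves '=' inside values such as exec_vnode=(...:ncpus=1)
    attrs = dict(kv.split('=', 1) for kv in rest.split())
    return timestamp, rtype, entity, attrs
```

With this, a log consumer can compare each Resource_List.* attribute with its next_Resource_List.* counterpart to see exactly what a release nodes action changed.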
Interface 4: New job attributes 'Resource_List_orig', 'schedselect_orig', 'resources_used_acct', 'exec_host_acct', and 'exec_vnode_acct'
- Visibility: Public
- Change Control: Experimental
- Synopsis: To support releasing nodes early while leaving a trace of the action in the accounting logs, the *_orig and *_acct internal job attributes are used to save interim data. These attributes could go away in a future release of PBS.
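The save-interim-data pattern behind these attributes can be illustrated as follows; the dict-of-attributes job representation and the function name are hypothetical, not PBS internals:

```python
def save_orig_then_update(job, updates):
    """Snapshot pattern: before the first release-nodes update touches an
    attribute, save its original value under '<name>_orig' so both the old
    and the new values remain available when accounting records are written."""
    for name, new_value in updates.items():
        job.setdefault(name + '_orig', job[name])  # only the first update snapshots
        job[name] = new_value
    return job
```

Repeated releases keep updating the live value while the *_orig copy stays pinned to the pre-release state, which is what lets a single job carry both its original and its current resource assignment.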