Follows the PBS Pro Design Document Guidelines.

Links

Overview

This design enhances the "node ramp down" feature by adding a new option, "-k <select>" ("k" for "keep"), to the PBS command "pbs_release_nodes". During a node ramp down operation, the option lets users or admins retain the sister nodes/vnodes that satisfy the "select" argument, while the remaining sister nodes/vnodes are released.

Technical Details

Interface 1:  -k <select statement>

Details:

pbs_release_nodes [-j <job ID>] <vnode> [<vnode> [<vnode>] ...]
pbs_release_nodes [-j <job ID>] -a
pbs_release_nodes [-j <job ID>] -k <select statement>
pbs_release_nodes --version
 

Example: consider a job submitted with the following select statement:

$ qsub -l select=4:model=abc:ncpus=5+3:model=abc:bigmem=true:ncpus=1+2:model=def:ncpus=32 job.scr
120.pbssrv

Listing the vnodes assigned to the job might show:

$ qstat -f 120 | egrep exec_vnode
exec_vnode = (nd_abc_1:ncpus=5)+(nd_abc_2:ncpus=5)+(nd_abc_3[0]:ncpus=5)+(nd_abc_3[1]:ncpus=5)+(nd_abc_4_bm:ncpus=1)+(nd_abc_5_bm:ncpus=1)+(nd_abc_6_bm:ncpus=1)+(nd_def_1:ncpus=32)+(nd_def_2:ncpus=32)

Here the first chunk, "(nd_abc_1:ncpus=5)", represents the mother superior node, while each of the remaining chunks represents a sister node.
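
The split between mother superior and sister chunks can be illustrated with a short Python sketch. It is not part of PBS; the helper name split_exec_vnode is hypothetical, and the sketch assumes chunks are parenthesised and joined by "+", as in the qstat output above.

```python
def split_exec_vnode(exec_vnode):
    """Split an exec_vnode string into its chunks, preserving order."""
    # Chunks look like "(vnode:res=val)" and are joined by "+", so splitting
    # on ")+(" leaves any "+" inside a single chunk untouched.
    return [chunk.strip("()") for chunk in exec_vnode.split(")+(")]


exec_vnode = (
    "(nd_abc_1:ncpus=5)+(nd_abc_2:ncpus=5)+(nd_abc_3[0]:ncpus=5)"
    "+(nd_abc_3[1]:ncpus=5)+(nd_abc_4_bm:ncpus=1)+(nd_abc_5_bm:ncpus=1)"
    "+(nd_abc_6_bm:ncpus=1)+(nd_def_1:ncpus=32)+(nd_def_2:ncpus=32)"
)

chunks = split_exec_vnode(exec_vnode)
# The first chunk belongs to the mother superior; the rest are sister chunks.
mother_superior, sisters = chunks[0], chunks[1:]
print("mother superior:", mother_superior)
print("sister chunks:", sisters)
```

Only the sister chunks are candidates for release or retention in the examples that follow.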

The node statuses at this point are:

$ pbsnodes -av
nd_abc_1
    Mom = nd_abc_1.pbspro.com
    state = job-busy
    jobs = 120.pbssrv/0
    resources_available.model = abc
    resources_available.ncpus = 5
    resources_assigned.ncpus = 5

nd_abc_2
    Mom = nd_abc_2.pbspro.com
    state = job-busy
    jobs = 120.pbssrv/0
    resources_available.model = abc
    resources_available.ncpus = 5
    resources_assigned.ncpus = 5

nd_abc_3[0]
    Mom = nd_abc_3.pbspro.com
    state = job-busy
    jobs = 120.pbssrv/0
    resources_available.model = abc
    resources_available.ncpus = 5
    resources_assigned.ncpus = 5

nd_abc_3[1]
    Mom = nd_abc_3.pbspro.com
    state = job-busy
    jobs = 120.pbssrv/0
    resources_available.model = abc
    resources_available.ncpus = 5
    resources_assigned.ncpus = 5

nd_abc_4_bm
    Mom = nd_abc_4_bm.pbspro.com
    state = job-busy
    jobs = 120.pbssrv/0
    resources_available.bigmem = True
    resources_available.model = abc
    resources_available.ncpus = 1
    resources_assigned.ncpus = 1

nd_abc_5_bm
    Mom = nd_abc_5_bm.pbspro.com
    state = job-busy
    jobs = 120.pbssrv/0
    resources_available.bigmem = True
    resources_available.model = abc
    resources_available.ncpus = 1
    resources_assigned.ncpus = 1

nd_abc_6_bm
    Mom = nd_abc_6_bm.pbspro.com
    state = job-busy
    jobs = 120.pbssrv/0
    resources_available.bigmem = True
    resources_available.model = abc
    resources_available.ncpus = 1
    resources_assigned.ncpus = 1

nd_def_1
    Mom = nd_def_1.pbspro.com
    state = job-busy
    jobs = 120.pbssrv/0
    resources_available.model = def
    resources_available.ncpus = 32
    resources_assigned.ncpus = 32

nd_def_2
    Mom = nd_def_2.pbspro.com
    state = job-busy
    jobs = 120.pbssrv/0
    resources_available.model = def
    resources_available.ncpus = 32
    resources_assigned.ncpus = 32
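
To reason about which vnodes satisfy a select statement, a listing like the one above can be loaded into a dictionary keyed by vnode name. The following Python sketch is illustrative only: parse_pbsnodes is a hypothetical helper, and it assumes every attribute sits on a single indented "name = value" line, as in this listing.

```python
def parse_pbsnodes(text):
    """Turn 'pbsnodes -av' style text into {vnode: {attribute: value}}."""
    nodes, current = {}, None
    for line in text.splitlines():
        if not line.strip():
            continue                      # blank line between vnodes
        if not line[0].isspace():
            # An unindented line starts a new vnode record.
            current = nodes.setdefault(line.strip(), {})
        elif current is not None and " = " in line:
            key, value = line.strip().split(" = ", 1)
            current[key] = value
    return nodes


# Two records copied from the listing above.
sample = """\
nd_abc_3[1]
    Mom = nd_abc_3.pbspro.com
    state = job-busy
    resources_available.model = abc
    resources_available.ncpus = 5

nd_abc_4_bm
    Mom = nd_abc_4_bm.pbspro.com
    state = job-busy
    resources_available.bigmem = True
    resources_available.model = abc
    resources_available.ncpus = 1
"""

nodes = parse_pbsnodes(sample)
for name, attrs in nodes.items():
    print(name, attrs.get("resources_available.model"),
          attrs.get("resources_available.bigmem"))
```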

Now suppose we run pbs_release_nodes with the new "-k" option, giving it a select argument that is a sub-statement of the select string passed to qsub -l:

$ pbs_release_nodes -j 120 -k select=model=abc:ncpus=5+2:model=abc:bigmem=true:ncpus=1

This releases the nodes (nd_abc_3[0]:ncpus=5)+(nd_abc_3[1]:ncpus=5)+(nd_abc_6_bm:ncpus=1)+(nd_def_1:ncpus=32)+(nd_def_2:ncpus=32) from the job while retaining the nodes (nd_abc_1:ncpus=5)+(nd_abc_2:ncpus=5)+(nd_abc_4_bm:ncpus=1)+(nd_abc_5_bm:ncpus=1). The retained set consists of the mother superior node nd_abc_1 plus the sister nodes that satisfy the select argument.

After the release, the job has the following vnodes associated with it:

$ qstat -f 120 | egrep exec_vnode
exec_vnode = (nd_abc_1:ncpus=5)+(nd_abc_2:ncpus=5)+(nd_abc_4_bm:ncpus=1)+(nd_abc_5_bm:ncpus=1)

The same result as in the previous example can be achieved with the shorter select string below, whose resource list is only a partial one with respect to the original select supplied to qsub:

$ pbs_release_nodes -j 120 -k select=model=abc+2:bigmem=true
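
One way to see why both select strings keep the same sister nodes is a first-fit sketch in Python. This is an assumption about the matching behaviour, not the PBS implementation: the helpers parse_select, satisfies and split_sisters are hypothetical, sister vnodes are visited in exec_vnode order, and resource values are compared by case-insensitive string equality.

```python
def parse_select(select):
    """Parse '[N:]res=val[:res=val...]+...' into [count, {resource: value}] pairs."""
    specs = []
    for part in select.split("+"):
        fields = part.split(":")
        # A leading integer is the chunk count; it defaults to 1 when omitted.
        count = int(fields.pop(0)) if fields[0].isdigit() else 1
        specs.append([count, dict(field.split("=", 1) for field in fields)])
    return specs


def satisfies(resources, wanted):
    # Simplification: every requested resource must match as a string.
    return all(str(resources.get(name, "")).lower() == value.lower()
               for name, value in wanted.items())


def split_sisters(sisters, keep_select):
    """Return (kept, released) vnode names using first-fit matching."""
    specs = parse_select(keep_select)
    kept, released = [], []
    for name, resources in sisters:
        for spec in specs:
            if spec[0] > 0 and satisfies(resources, spec[1]):
                spec[0] -= 1          # one chunk of this spec is now filled
                kept.append(name)
                break
        else:
            released.append(name)     # no unfilled spec matched this vnode
    return kept, released


# Sister vnodes of job 120 in exec_vnode order (mother superior excluded).
plain_abc = {"model": "abc", "ncpus": "5"}
bigmem_abc = {"model": "abc", "bigmem": "True", "ncpus": "1"}
model_def = {"model": "def", "ncpus": "32"}
sisters = [
    ("nd_abc_2", plain_abc), ("nd_abc_3[0]", plain_abc), ("nd_abc_3[1]", plain_abc),
    ("nd_abc_4_bm", bigmem_abc), ("nd_abc_5_bm", bigmem_abc), ("nd_abc_6_bm", bigmem_abc),
    ("nd_def_1", model_def), ("nd_def_2", model_def),
]

full = split_sisters(sisters, "model=abc:ncpus=5+2:model=abc:bigmem=true:ncpus=1")
short = split_sisters(sisters, "model=abc+2:bigmem=true")
print("kept:", full[0])
print("released:", full[1])
print("shorter select gives the same split:", full == short)
```

Under these assumptions the outcome depends on the order in which sister vnodes are visited: "model=abc" is filled by nd_abc_2 before any bigmem vnode is considered, which is why the partial resource list is sufficient in this example.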


API level details:




