
Objective

This document introduces the node ramp-down feature, which releases no-longer-needed sister nodes early from running jobs.

Interface 1: New command: 'pbs_release_nodes'

  • Visibility: Public
  • Change Control: Stable
  • Synopsis: Release a specified set of sister nodes or vnodes, or all sister nodes or vnodes assigned to the specified job. The nodes released will then be made available for scheduling other jobs.
  • Permission: Only the job owner, an administrator, a PBS Manager, or a PBS Operator is allowed to perform the release-nodes action.
  • Details:
    1.  Release a particular set of sister nodes from a job:
      1. Syntax:   pbs_release_nodes -j <job_identifier> <host1_or_vnode1> [<host2_or_vnode2>] [<host3_or_vnode3>] ...

        • The 'host*_or_vnode*' argument is any of the sister nodes/vnodes that appear in the exec_vnode attribute of a running job.

        • Example:
          % qsub job.scr
          241.borg
          % qstat -f 241 | egrep "exec|Resource_List|select"
          ...

          exec_host = borg[0]/0*0+federer/0*0+lendl/0*2
          exec_vnode = (borg[0]:mem=1048576kb:ncpus=1+borg[1]:mem=1048576kb:ncpus=1+borg[2]:ncpus=1)+(federer:mem=1048576kb:ncpus=1+federer[0]:mem=1048576kb:ncpus=1+federer[1]:ncpus=1)+(lendl:ncpus=2:mem=2097152kb)
          Resource_List.mem = 6gb
          Resource_List.ncpus = 8
          Resource_List.nodect = 3
          Resource_List.place = scatter
          Resource_List.select = ncpus=3:mem=2gb+ncpus=3:mem=2gb+ncpus=2:mem=2gb
          schedselect = 1:ncpus=3:mem=2gb+1:ncpus=3:mem=2gb+1:ncpus=2:mem=2gb


          %  pbs_release_nodes -j 241 federer[1] lendl

          % qstat -f 241 | egrep "exec|Resource_List|select"

          exec_host = borg[0]/0*0+federer/0*0 <- no lendl
          exec_vnode = (borg[0]:mem=1048576kb:ncpus=1+borg[1]:mem=1048576kb:ncpus=1+borg[2]:ncpus=1)+(federer:mem=1048576kb:ncpus=1+federer[0]:mem=1048576kb:ncpus=1) <- federer[1] and lendl removed.

          Resource_List.mem = 4194304kb <- minus 2gb (from lendl)
          Resource_List.ncpus = 5 <- minus 3 cpus (1 from federer[1] and 2 from lendl)
          Resource_List.nodect = 2 <- minus 1 chunk (when lendl was taken out, its entire chunk assignment disappeared; see the parsing sketch at the end of this interface)
          Resource_List.place = scatter
          schedselect = 1:mem=2097152kb:ncpus=3+1:mem=2097152kb:ncpus=2

      2. Error: pbs_release_nodes reports an error if any of the specified nodes is managed by the mother superior (MS) MoM, that is, by the job's primary execution host.

        • Example:

          % pbs_release_nodes -j 241 borg[0]

          pbs_release_nodes: Can't free 'borg[0]' since it's on an MS host

    2. Release all sister nodes from a job:
      1. Syntax:   pbs_release_nodes -j <job_identifier> -a
        • Example:
          % pbs_release_nodes -j 241 -a
          % qstat -f 241 | egrep "exec|Resource_List|select"

          exec_host = borg[0]/0*0
          exec_vnode = (borg[0]:mem=1048576kb:ncpus=1+borg[1]:mem=1048576kb:ncpus=1+borg[2]:ncpus=1)
          Resource_List.mem = 2097152kb <- minus 2gb (from federer's remaining vnodes; only the mother superior's chunk is left)
          Resource_List.ncpus =  3
          Resource_List.nodect = 1
          Resource_List.place = scatter
          schedselect = 1:mem=2097152kb:ncpus=3
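
For reference, here is a rough sketch (not part of PBS) of how an exec_vnode string can be parsed and how releasing vnodes shrinks the assignment. The parsing rules are inferred from the exec_vnode values shown above, and the function names are illustrative only:

    # Rough sketch: parse an exec_vnode string into its parenthesized
    # chunks, then drop released vnodes.  A chunk whose vnodes are all
    # released disappears entirely, which is why nodect drops by one
    # when lendl is taken out.
    def parse_exec_vnode(s):
        # "(a:r=v+b:r=v)+(c:r=v)" -> [[(vnode, {res: val}), ...], ...]
        chunks = []
        for chunk in s.strip("()").split(")+("):
            specs = []
            for spec in chunk.split("+"):
                name, *resources = spec.split(":")
                specs.append((name, dict(r.split("=", 1) for r in resources)))
            chunks.append(specs)
        return chunks

    def release(chunks, to_release):
        drop = set(to_release)
        kept = []
        for specs in chunks:
            remaining = [(v, r) for v, r in specs if v not in drop]
            if remaining:
                kept.append(remaining)
        return kept

    ev = ("(borg[0]:mem=1048576kb:ncpus=1+borg[1]:mem=1048576kb:ncpus=1"
          "+borg[2]:ncpus=1)+(federer:mem=1048576kb:ncpus=1"
          "+federer[0]:mem=1048576kb:ncpus=1+federer[1]:ncpus=1)"
          "+(lendl:ncpus=2:mem=2097152kb)")
    after = release(parse_exec_vnode(ev), ["federer[1]", "lendl"])
    print(len(after))                                         # nodect: 2
    print(sum(int(r["ncpus"]) for c in after for _, r in c))  # ncpus: 5

This matches the Resource_List.nodect = 2 and Resource_List.ncpus = 5 values shown in the first example.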

Interface 2: New job attribute 'release_nodes_on_stageout'

  • Visibility: Public
  • Change Control: Stable
  • Value: 'true' or 'false'
  • Synopsis: When set to 'true', the server does the equivalent of 'pbs_release_nodes -a', releasing all sister vnodes when the stageout operation begins (a hook-based sketch follows the example below).
  • Example:
    %  qsub -W stageout=my_stageout@federer:my_stageout.out -W release_nodes_on_stageout=true job.scr
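
For illustration only, the attribute could also be set from a hook rather than at submission time. This is a minimal, hypothetical sketch assuming the PBS Python hook API; the hook logic is an assumption, not part of this design:

    # Hypothetical queuejob hook: turn on early node release for any job
    # that requests stageout.  Assumes the PBS Python hook API; only the
    # attribute name comes from this interface.
    import pbs

    e = pbs.event()
    j = e.job
    if j.stageout:                          # job has a stageout request
        j.release_nodes_on_stageout = True
    e.accept()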

Interface 3: New server accounting record: 'u' for update record

  • Visibility: Public
  • Change Control: Stable
  • Synopsis: For every release-nodes action, an accounting log record of type 'u' (for update) is written. It records the previous values in the exec_* and Resource_List.* items, and the updated values in the next_exec_* and next_Resource_List.* items (a record-parsing sketch follows the example below).

  • Example:

    % qsub -l select=3:ncpus=1:mem=1gb job.scr
    242.borg
    % qstat -f 242 | egrep "exec|Resource_List|select"

    exec_host = borg/0+federer/0+lendl/0
    exec_vnode = (borg[0]:ncpus=1:mem=1048576kb)+(federer:ncpus=1:mem=1048576kb)+(lendl:ncpus=1:mem=1048576kb)
    Resource_List.mem = 3gb
    Resource_List.ncpus = 3
    Resource_List.nodect = 3
    Resource_List.place = scatter
    Resource_List.select = 3:ncpus=1:mem=1gb
    schedselect = 3:ncpus=1:mem=1gb

    % pbs_release_nodes -j 242 lendl

Accounting logs show:

    # tail -f /var/spool/PBS/server_priv/accounting/20170123

01/23/2017 18:53:24;u;242.borg;user=bayucan group=users project=_pbs_project_default jobname=STDIN queue=workq ctime=1485215572 qtime=1485215572 etime=1485215572 start=1485215572 exec_host=borg/0+federer/0+lendl/0 exec_vnode=(borg[0]:ncpus=1:mem=1048576kb)+(federer:ncpus=1:mem=1048576kb)+(lendl:ncpus=1:mem=1048576kb) Resource_List.mem=3gb Resource_List.ncpus=3 Resource_List.nodect=3 Resource_List.place=scatter Resource_List.select=3:ncpus=1:mem=1gb next_exec_host=borg/0+federer/0 next_exec_vnode=(borg[0]:ncpus=1:mem=1048576kb)+(federer:ncpus=1:mem=1048576kb) next_Resource_List.mem=2097152kb next_Resource_List.ncpus=2 next_Resource_List.nodect=2 next_Resource_List.place=scatter next_Resource_List.select=1:ncpus=1:mem=1048576kb+1:ncpus=1:mem=1048576kb session=7503 run_count=1 resources_used.cpupercent=0 resources_used.cput=00:00:00 resources_used.mem=4288kb resources_used.ncpus=3 resources_used.vmem=42928kb resources_used.walltime=00:00:26

A subsequent pbs_release_nodes call yields:

    % pbs_release_nodes -j 242 federer

    # tail -f /var/spool/PBS/server_priv/accounting/20170123

01/23/2017 18:59:35;u;242.borg;user=bayucan group=users project=_pbs_project_default jobname=STDIN queue=workq ctime=1485215949 qtime=1485215949 etime=1485215949 start=1485215949 exec_host=borg/0+federer/0 exec_vnode=(borg[0]:ncpus=1:mem=1048576kb)+(federer:ncpus=1:mem=1048576kb) Resource_List.mem=2097152kb Resource_List.ncpus=2 Resource_List.nodect=2 Resource_List.place=scatter Resource_List.select=1:ncpus=1:mem=1048576kb+1:ncpus=1:mem=1048576kb next_exec_host=borg/0 next_exec_vnode=(borg[0]:ncpus=1:mem=1048576kb) next_Resource_List.mem=1048576kb next_Resource_List.ncpus=1 next_Resource_List.nodect=1 next_Resource_List.place=scatter next_Resource_List.select=1:ncpus=1:mem=1048576kb session=7773 run_count=1 resources_used.cpupercent=0 resources_used.cput=00:00:00 resources_used.mem=4300kb resources_used.ncpus=3 resources_used.vmem=42928kb resources_used.walltime=00:00:00

01/23/2017 19:00:00;L;license;floating license hour:3 day:3 month:3 max:10
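
Since the 'u' record follows the usual accounting-log layout (timestamp;record type;job id;space-separated key=value pairs), the before and after values can be pulled apart with a few lines of scripting. A rough sketch, assuming that layout (the file path and printed fields are illustrative):

    # Rough sketch: split each 'u' accounting record into its old values
    # and its next_* (post-release) values.
    def parse_update_record(line):
        timestamp, rectype, jobid, message = line.split(";", 3)
        old, new = {}, {}
        for token in message.split():
            key, _, value = token.partition("=")
            if key.startswith("next_"):
                new[key[len("next_"):]] = value
            else:
                old[key] = value
        return jobid, old, new

    with open("/var/spool/PBS/server_priv/accounting/20170123") as f:
        for line in f:
            if line.split(";", 2)[1] == "u":   # skip non-'u' records
                jobid, old, new = parse_update_record(line.strip())
                print(jobid, old["Resource_List.nodect"], "->",
                      new["Resource_List.nodect"])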

Interface 4: New job attributes 'Resource_List_orig', 'schedselect_orig', 'resources_used_acct', 'exec_host_acct', and 'exec_vnode_acct'

  • Visibility: Public
  • Change Control: Experimental
  • Synopsis: To support releasing nodes early while leaving a trace of the action in the accounting logs, the *_orig and *_acct internal job attributes are used to save interim data. These attributes could go away in a future release of PBS.

