Warning: Still under construction

Objective:

To integrate PBS with Linux cgroup capabilities

Interface 1: cgroup_prefix configuration variable (No ptl test yet)

  • Visibility: Public
  • Change Control: Experimental
  • Synopsis: Adding cgroup_prefix as a valid cgroup hook configuration variable
  • Details: cgroup_prefix is defined in the cgroup config file. It lets the admin name the directory under which all of the cgroup directories for PBS jobs are placed (for example, /sys/fs/cgroup/cpuset/<cgroup_prefix>).
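
As a rough illustration of how the prefix maps to a directory, the sketch below joins a subsystem name and the prefix onto a cgroup root. The function name job_cgroup_parent, the default root, and the prefix value "pbspro" are hypothetical and not taken from the hook source.

```python
import posixpath

# Assumed cgroup v1 mount point, matching the path shown in the Details text.
CGROUP_ROOT = "/sys/fs/cgroup"


def job_cgroup_parent(subsystem, cgroup_prefix, root=CGROUP_ROOT):
    """Return the directory that would hold the per-job cgroup directories."""
    return posixpath.join(root, subsystem, cgroup_prefix)


# "pbspro" is only an example prefix value.
print(job_cgroup_parent("cpuset", "pbspro"))  # prints /sys/fs/cgroup/cpuset/pbspro
```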

Interface 2: periodic_resc_update configuration variable (No ptl test yet)

  • Visibility: Public
  • Change Control: Experimental
  • Synopsis: Adding periodic_resc_update as a valid cgroup hook configuration variable
  • Details: periodic_resc_update is defined in the cgroup config file. It lets the admin enable the cgroup hook to update the resources_used values for cput, mem, and vmem. Valid values are true/false. The default value is true.

Interface 3: exclude_hosts configuration variable (No ptl test yet)

  • Visibility: Public
  • Change Control: Experimental
  • Synopsis: Adding exclude_hosts as a valid cgroup hook configuration variable
  • Details: exclude_hosts is defined in the cgroup config file. It lets the admin exclude certain hosts from running the cgroup hook. Valid values are any host names managed by the PBS server.

Interface 4: exclude_vntypes configuration variable

  • Visibility: Public
  • Change Control: Experimental
  • Synopsis: Adding exclude_vntypes as a valid cgroup hook configuration variable
  • Details: exclude_vntypes is defined in the cgroup config file. It lets the admin exclude certain vntypes from running the cgroup hook. Valid values are any string that the admin places on the first line of a file named vntype located in PBS_HOME/mom_priv.
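
The vntype lookup described above could be sketched as follows. This is a minimal illustration, not the hook's implementation: the function names read_vntype and vntype_excluded are hypothetical, and treating a missing file as "no vntype" is an assumption.

```python
import os


def read_vntype(pbs_home):
    """Return the first line of PBS_HOME/mom_priv/vntype, or None if it cannot be read."""
    path = os.path.join(pbs_home, "mom_priv", "vntype")
    try:
        with open(path) as handle:
            return handle.readline().strip()
    except OSError:
        # Assumption: a host without a vntype file has no vntype.
        return None


def vntype_excluded(pbs_home, exclude_vntypes):
    """Return True when this host's vntype appears in exclude_vntypes."""
    vntype = read_vntype(pbs_home)
    return vntype is not None and vntype in exclude_vntypes
```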

Interface 5: run_only_on_hosts configuration variable (No ptl test yet)

  • Visibility: Public
  • Change Control: Experimental
  • Synopsis: Adding run_only_on_hosts as a valid cgroup hook configuration variable
  • Details: run_only_on_hosts is defined in the cgroup config file. It lets the admin restrict the cgroup hook to run only on a certain set of hosts. Valid values are any host names managed by the PBS server.
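
One way the two host lists might combine is sketched below. The page does not state how exclude_hosts and run_only_on_hosts interact, so two assumptions are made here: an empty run_only_on_hosts list means no restriction, and exclude_hosts wins when a host appears in both lists. The function name hook_runs_on_host is hypothetical.

```python
def hook_runs_on_host(hostname, config):
    """Decide whether the hook should act on hostname, given the parsed config dict."""
    run_only = config.get("run_only_on_hosts", [])
    # Assumption: an empty run_only_on_hosts list places no restriction on hosts.
    if run_only and hostname not in run_only:
        return False
    # Assumption: exclude_hosts takes precedence over run_only_on_hosts.
    return hostname not in config.get("exclude_hosts", [])


example = {"run_only_on_hosts": ["node01", "node02"], "exclude_hosts": ["node02"]}
for host in ("node01", "node02", "node03"):
    print(host, hook_runs_on_host(host, example))
```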

Interface 6: vnode_per_numa_node configuration variable (No ptl test yet)

  • Visibility: Public
  • Change Control: Experimental
  • Synopsis: Adding vnode_per_numa_node as a valid cgroup hook configuration variable
  • Details: vnode_per_numa_node is defined in the cgroup config file. It lets the admin create an individual vnode for each NUMA node. On a two-socket system, the hook creates two additional vnodes and assigns the resources of each NUMA node to its vnode. It also sets the resources managed by the parent vnode to zero. Valid values are true/false.
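
The resource split can be pictured with the sketch below. It is only an illustration: the function name numa_vnodes, the "host[index]" vnode naming, and the ncpus/mem_kb keys are all assumptions, not the hook's actual data layout.

```python
def numa_vnodes(host, numa_nodes):
    """Build one vnode per NUMA node and zero out the parent vnode's resources.

    numa_nodes is a list of dicts such as {"ncpus": 8, "mem_kb": 1024}.
    """
    # The parent vnode keeps its name but manages no resources.
    vnodes = {host: {"ncpus": 0, "mem_kb": 0}}
    for index, node in enumerate(numa_nodes):
        # Hypothetical naming scheme for the per-NUMA-node vnodes.
        vnodes[f"{host}[{index}]"] = {"ncpus": node["ncpus"], "mem_kb": node["mem_kb"]}
    return vnodes


# A two-socket host yields the parent vnode plus two additional vnodes.
two_socket = [{"ncpus": 8, "mem_kb": 33554432}, {"ncpus": 8, "mem_kb": 33554432}]
for name, resources in numa_vnodes("node01", two_socket).items():
    print(name, resources)
```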

Interface 7: online_offlined_nodes configuration variable

  • Visibility: Public
  • Change Control: Experimental
  • Synopsis: Adding online_offlined_nodes as a valid cgroup hook configuration variable
  • Details: online_offlined_nodes is defined in the cgroup config file. It allows the cgroup hook to bring back online the nodes that it previously offlined because orphaned cgroups had not been cleaned up. Valid values are true/false.

Interface 8: cgroup configuration variable (No ptl test yet)

  • Visibility: Public
  • Change Control: Experimental
  • Synopsis: Adding cgroup as a valid cgroup hook configuration variable
  • Details: cgroup is defined in the cgroup config file. It allows the admin to specify which subsystems PBS will use for the job. Valid values are cpuacct, cpuset, devices, hugetlb (where supported), memory, and memsw.

Interface 8-1: cpuacct configuration variable (No ptl test yet)

  • Visibility: Public
  • Change Control: Experimental
  • Synopsis: Adding cpuacct as a valid key in the cgroup section in the cgroup hook configuration variable
  • Details: cpuacct is defined in the cgroup variable in the config file. It allows the admin to use the cpuacct subsystem, which tracks the cput of all of the pids assigned to the cgroup. Valid keys in the cpuacct subsystem are enabled (valid options true/false), exclude_hosts (see interface 3), exclude_vntypes (see interface 4).

Interface 8-2: cpuset configuration variable

  • Visibility: Public
  • Change Control: Experimental
  • Synopsis: Adding cpuset as a valid key in the cgroup section in the cgroup hook configuration variable
  • Details: cpuset is defined in the cgroup variable in the config file. It allows the admin to use the cpuset subsystem, which assigns the cores and memory socket(s) for use by all of the pids assigned to the cgroup. Valid keys in the cpuset subsystem are enabled (valid options true/false), exclude_hosts (see interface 3), exclude_vntypes (see interface 4).

Interface 8-3: devices configuration variable (No ptl tests yet)

  • Visibility: Public
  • Change Control: Experimental
  • Synopsis: Adding devices as a valid key in the cgroup section in the cgroup hook configuration variable
  • Details: devices is defined in the cgroup variable in the config file. It allows the admin to use the devices subsystem, which assigns the devices for use by all of the pids assigned to the cgroup. Valid keys in the devices subsystem are enabled (valid options true/false), exclude_hosts (see interface 3), exclude_vntypes (see interface 4), allow (list of devices to allow access to).

Interface 8-3a: allow configuration variable (No ptl tests yet)

  • Visibility: Public
  • Change Control: Experimental
  • Synopsis: Adding allow as a valid devices key in the cgroup section in the cgroup hook configuration variable
  • Details: allow is defined in the devices section of the cgroup section in the config file. It lists the devices that all of the pids assigned to the cgroup may access. An entry can be written in any of the following forms:
    • "b *:* rwm" (This exact string is written to the allow file.)
    • ["mic/scif","rwm"] (The hook looks up the major and minor numbers of the mic/scif device and grants it rwm access. For example, if listing /dev/mic reported "crw-rw-rw- 1 root root 244, 1 Mar 30 14:50 scif", the line added to the allow file would be "c 244:1 rwm".)
    • ["nvidiactl","rwm", "*"] (The hook looks up the major number of the nvidiactl device and grants rwm access to every minor number. For example, if listing /dev/nvidiactl reported "crw-rw-rw- 1 root root 284, 1 Mar 30 14:50 nvidiactl", the line added to the allow file would be "c 284:* rwm".)
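
The three entry forms above could be translated into allow-file lines roughly as follows. This is a sketch under stated assumptions rather than the hook's code: the names device_numbers and allow_line are hypothetical, devices are assumed to live under /dev, and only a literal "*" in the third position is treated as a minor-number wildcard.

```python
import os
import stat


def device_numbers(path):
    """Return (type_char, major, minor) for the device node at path."""
    info = os.stat(path)
    kind = "b" if stat.S_ISBLK(info.st_mode) else "c"
    return kind, os.major(info.st_rdev), os.minor(info.st_rdev)


def allow_line(entry, lookup=device_numbers, dev_root="/dev"):
    """Turn one entry from the allow list into a line for the allow file."""
    if isinstance(entry, str):
        # A plain string is used exactly as written.
        return entry
    name, access = entry[0], entry[1]
    kind, major, minor = lookup(os.path.join(dev_root, name))
    if len(entry) > 2 and entry[2] == "*":
        # Assumption: a third element of "*" replaces the minor number.
        minor = "*"
    return f"{kind} {major}:{minor} {access}"
```

Passing lookup as a parameter keeps the translation testable on machines that do not have the devices installed.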

Interface 8-4: hugetlb configuration variable (No ptl tests yet)

  • Visibility: Public
  • Change Control: Experimental
  • Synopsis: Adding hugetlb as a valid key in the cgroup section in the cgroup hook configuration variable
  • Details: hugetlb is defined in the cgroup variable in the config file. It allows the admin to use the hugetlb subsystem, which allows access to the hugetlb memory. Valid keys in the hugetlb subsystem are enabled (valid options true/false), exclude_hosts (see interface 3), exclude_vntypes (see interface 4).

Interface 8-5: memory configuration variable

  • Visibility: Public
  • Change Control: Experimental
  • Synopsis: Adding memory as a valid key in the cgroup section in the cgroup hook configuration variable
  • Details: memory is defined in the cgroup variable in the config file. It allows the admin to use the memory subsystem, which lets the admin monitor and limit the memory used by all of the pids assigned to the cgroup. Valid keys in the memory subsystem are enabled (valid options true/false), default (memory assigned if the job did not request any), reserve_memory (memory to reserve for processes outside of PBS), exclude_hosts (see interface 3), exclude_vntypes (see interface 4).

Interface 8-6: memsw configuration variable (No ptl test yet)

  • Visibility: Public
  • Change Control: Experimental
  • Synopsis: Adding memsw as a valid key in the cgroup section in the cgroup hook configuration variable
  • Details: memsw is defined in the cgroup variable in the config file. It allows the admin to use the memsw subsystem, which lets the admin monitor and limit the swap used by all of the pids assigned to the cgroup. Valid keys in the memsw subsystem are enabled (valid options true/false), default (memory assigned if the job did not request any), reserve_memory (memory to reserve for processes outside of PBS), exclude_hosts (see interface 3), exclude_vntypes (see interface 4).
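
To show how the variables described in the interfaces above might fit together, the sketch below builds a config as a Python dict and serializes it to JSON. The key names come from this page; every value (the "pbspro" prefix, the memory sizes, the empty lists) is illustrative only, and the exact layout of the shipped config file is not confirmed here.

```python
import json

# Keys shared by every subsystem section, per the interfaces above.
common = {"enabled": True, "exclude_hosts": [], "exclude_vntypes": []}

config = {
    "cgroup_prefix": "pbspro",  # example value
    "periodic_resc_update": True,
    "exclude_hosts": [],
    "exclude_vntypes": [],
    "run_only_on_hosts": [],
    "vnode_per_numa_node": False,
    "online_offlined_nodes": True,
    "cgroup": {
        "cpuacct": dict(common),
        "cpuset": dict(common),
        # allow entries use the forms described in interface 8-3a.
        "devices": dict(common, allow=["b *:* rwm", ["nvidiactl", "rwm", "*"]]),
        "hugetlb": dict(common, enabled=False),
        # The sizes below are placeholders, not recommended settings.
        "memory": dict(common, default="256MB", reserve_memory="64MB"),
        "memsw": dict(common, default="256MB", reserve_memory="64MB"),
    },
}

print(json.dumps(config, indent=4))
```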

Setup:

  • Run the following commands in qmgr
    • create hook cgroups
    • set hook cgroups event = "execjob_begin,execjob_launch,execjob_attach,execjob_epilogue,execjob_end,exechost_startup,exechost_periodic"
    • set hook cgroups freq = 120
    • set hook cgroups fail_action = offline_vnodes
    • import hook cgroups application/x-python default cgroups.py
    • import hook cgroups application/x-config default cgroups.json