Comment: Updated server attribute power_provisioning details.

Forum discussion link: http://community.pbspro.org/t/external-design-document-for-pp-824-cray-ramp-rate-limiting/693

...

  • Interface 1: Changes to Power hook.
    • Change control: Stable
    • Synopsis: Power hook PBS_power
    • Details: Enabling or disabling the pbshook PBS_power enables or disables the power management feature.
      • Setting the hook attribute "enabled" through qmgr enables or disables both the hook and the power management feature.
      • The hook uses the server periodic event to ramp nodes up or down and to power them on or off.
      • Example:
        • qmgr -c "s pbshook PBS_power enabled=true"
        • qmgr -c "s pbshook PBS_power enabled=false"
      • This change overrides the existing way of enabling the power feature, i.e. setting the server attribute power_provisioning.
        • The server attribute power_provisioning is now read-only; it is enabled or disabled when the power hook is enabled or disabled.
  • Interface 2: Hook configuration file 
    • Change Control: Stable
    • Synopsis: JSON hook config file
    • Details: The configuration file allows the administrator to set global parameters, which are used to modify the behavior of the PBS_power hook across all nodes in the PBS Pro complex. The file must conform to JSON syntax. A sample configuration file is displayed and described below:

      {
        "power_ramp_rate_enable": "True",
        "power_on_off_enable": "False",
        "node_idle_limit": "1000",
        "min_node_down_delay": "600",
        "max_jobs_analyze_limit": "80"
      }

    • Parameters:

      • power_ramp_rate_enable (default value: False): When enabled, PBS performs ramp rate limiting across a PBS cluster running on a Cray CLE 6.0 platform. Ramped-up nodes are kept at sleep state C1; ramped-down nodes are put into sleep state C6.
      • power_on_off_enable (default value: False): When enabled, PBS powers nodes on and off, but only nodes whose node attribute poweroff_eligible is true.
      • node_idle_limit (default value: 1800): How long a node must be left idle before it is considered for power-off or ramp-down.
      • min_node_down_delay (default value: 1800): The time limit before a powered-off node can be considered for being brought up.
      • max_jobs_analyze_limit (default value: 100): The maximum number of queued jobs that are analyzed for power-on/ramp-up. Only jobs that have an estimated start_time and exec_vnode set on them are considered. For these attributes to be set, strict_ordering must be true and jobs must be submitted with a walltime.
      • max_concurrent_nodes (default value: 5): How many nodes can be powered on/off or ramped up/down at a time.
        • For ramp rate limiting, the hook sleeps X seconds (where 1<=X<=10) between each level of sleep state while stepping the sleep states up or down.
        • If a node supports 5 levels of sleep states, the hook can wait up to 50 seconds for a single node in the worst case. When increasing this value, also consider increasing the PBS_power hook frequency and alarm time so that hook instances do not overlap or time out.
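The following Python sketch illustrates one way a hook could read such a configuration file, fall back to the default values listed above, and estimate a worst-case run time. It is not the PBS_power implementation: the helper names `load_power_config` and `worst_case_ramp_seconds` are hypothetical, the key name `max_jobs_analyze_limit` and the default of 5 for `max_concurrent_nodes` are assumptions, and the time bound assumes nodes are stepped one after another.

```python
import json

# Defaults from the parameter list. The key name max_jobs_analyze_limit
# and the value 5 for max_concurrent_nodes are assumptions.
DEFAULTS = {
    "power_ramp_rate_enable": False,
    "power_on_off_enable": False,
    "node_idle_limit": 1800,
    "min_node_down_delay": 1800,
    "max_jobs_analyze_limit": 100,
    "max_concurrent_nodes": 5,
}


def _coerce(value, default):
    """Convert a JSON value to the type of its default."""
    if isinstance(default, bool):
        if isinstance(value, bool):
            return value
        # The sample file stores booleans as strings such as "True".
        return str(value).strip().lower() == "true"
    return int(value)


def load_power_config(text):
    """Hypothetical helper: parse JSON text and fill in defaults."""
    raw = json.loads(text)
    unknown = set(raw) - set(DEFAULTS)
    if unknown:
        raise ValueError("unknown parameter(s): " + ", ".join(sorted(unknown)))
    return {key: _coerce(raw.get(key, default), default)
            for key, default in DEFAULTS.items()}


def worst_case_ramp_seconds(config, sleep_levels, max_step_sleep=10):
    """Upper bound for one hook run, assuming nodes are handled sequentially."""
    return config["max_concurrent_nodes"] * sleep_levels * max_step_sleep


sample = '{"power_ramp_rate_enable": "True", "node_idle_limit": "1000"}'
cfg = load_power_config(sample)
print(cfg["power_ramp_rate_enable"], cfg["node_idle_limit"])
print(worst_case_ramp_seconds(cfg, sleep_levels=5))
```

Under these assumptions, 5 nodes with 5 sleep levels each give a bound of 250 seconds, which is the kind of figure to compare against the hook frequency and alarm time.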

  • Interface 3: New node attribute: poweroff_eligible
    • Change control: Stable
    • Synopsis: Node attribute for power control.
    • Details: This new node attribute controls whether a node is allowed to be powered off.
      • PBS type: Boolean
      • Python type: Boolean
      • Default value: False
      • Manager has set permission. All have read permission.
      • To modify the default value use qmgr:
        • qmgr -c "set node <node_name> poweroff_eligible=True"
  • Interface 4: New node attribute: last_state_change_time
    • Change control: Stable
    • Synopsis: Read only node attribute to capture timestamp.
    • Details: This new node attribute is updated with a timestamp whenever the node changes from its current state to a new state.
      • Managers and Operators have read permission.
      • The node status command pbsnodes converts the internal date format (seconds since epoch) to a human-readable format and displays the value of this attribute in "MON DD YY HH:MM:SS" format.
      • PBS type: long
      • Python type: int
  • Interface 5: New node attribute: last_used_time
    • Change Control: Stable
    • Synopsis: Read only node attribute to capture timestamp.
    • Details: This new node attribute is updated with a timestamp at the end of any job or reservation.
      • If a node is released early from a running job, this timestamp is also updated.
      • The node status command pbsnodes converts the internal date format (seconds since epoch) to a human-readable format and displays the value of this attribute in "MON DD YY HH:MM:SS" format.
      • The attribute is reset when the node is ramped up.
      • Managers and Operators have read permission.
      • For vnodes, this attribute is first set to the current timestamp when they are created or when the nodes are rebooted.
      • This attribute can now be used in sched_config as a node_sort_key, which sorts the nodes by their last used time.
      • PBS type: long
      • Python type: int
      • Example:
        • node_sort_key: "last_used_time HIGH"
        • node_sort_key: "last_used_time LOW"
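The Python sketch below illustrates the two behaviours described for the timestamp attributes: rendering seconds since the epoch in "MON DD YY HH:MM:SS" form, and ordering nodes by last_used_time. The function names are hypothetical. It assumes that MON means the abbreviated month name, that HIGH sorts the largest (most recent) value first, and it uses UTC so the output is reproducible, whereas pbsnodes may display local time.

```python
import time


def format_node_timestamp(epoch_seconds):
    """Render seconds since the epoch as 'MON DD YY HH:MM:SS' (UTC, assumed)."""
    return time.strftime("%b %d %y %H:%M:%S", time.gmtime(epoch_seconds))


def sort_by_last_used(nodes, order="HIGH"):
    """Return node names sorted by last_used_time.

    HIGH is assumed to mean descending order, LOW ascending.
    """
    if order not in ("HIGH", "LOW"):
        raise ValueError("order must be HIGH or LOW")
    return sorted(nodes, key=nodes.get, reverse=(order == "HIGH"))


# Hypothetical node names and last_used_time values.
last_used = {"node1": 1500000000, "node2": 1500003600, "node3": 1500001800}
print(format_node_timestamp(last_used["node1"]))
print(sort_by_last_used(last_used, "HIGH"))
print(sort_by_last_used(last_used, "LOW"))
```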
  • Interface 6: New node state: sleep
    • Change Control: Stable
    • Synopsis: New node state indicating that the node was put down by PBS.
    • Details: This new node state is set when nodes are ramped down or powered off by PBS through the power ramp rate limiting or power on/off feature.
      • A server periodic hook (the pbshook PBS_power, provided as part of the PBS package) runs every freq seconds, takes a list of vnodes to ramp down or power off, and marks them with the new sleep node state.
      • At most max_concurrent_nodes nodes are ramped down or powered off every freq seconds, where freq is the server periodic hook frequency.
      • The scheduler can now consider nodes in the sleep state when running jobs.
      • The server periodic hook can ramp up or power on nodes in the sleep state as required. The requirement is calculated by analyzing the jobs' estimated start times and exec_vnode lists.
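The selection logic described above can be sketched in Python as follows. This is an illustration only, not the PBS_power hook: the function names, the dictionary layout for nodes, and the choice to handle the longest-idle nodes first are all hypothetical, and times are assumed to be in seconds.

```python
def pick_nodes_to_sleep(nodes, now, node_idle_limit, max_concurrent_nodes):
    """Return up to max_concurrent_nodes free nodes idle for at least the limit.

    `nodes` maps a node name to a dict with 'state', 'last_used_time' and
    'last_state_change_time' (seconds since the epoch); hypothetical layout.
    """
    idle = [
        (info["last_used_time"], name)
        for name, info in nodes.items()
        if info["state"] == "free" and now - info["last_used_time"] >= node_idle_limit
    ]
    idle.sort()  # longest-idle nodes first (a design choice of this sketch)
    return [name for _, name in idle[:max_concurrent_nodes]]


def pick_nodes_to_wake(nodes, needed_vnodes, now, min_node_down_delay,
                       max_concurrent_nodes):
    """Return sleeping nodes that upcoming jobs need and that were down long enough.

    `needed_vnodes` stands for the vnode names gathered from the exec_vnode
    lists of the analyzed jobs.
    """
    ready = [
        name
        for name in sorted(needed_vnodes)
        if name in nodes
        and nodes[name]["state"] == "sleep"
        and now - nodes[name]["last_state_change_time"] >= min_node_down_delay
    ]
    return ready[:max_concurrent_nodes]


now = 10000
nodes = {
    "n1": {"state": "free", "last_used_time": 1000, "last_state_change_time": 1000},
    "n2": {"state": "free", "last_used_time": 9500, "last_state_change_time": 9500},
    "n3": {"state": "sleep", "last_used_time": 2000, "last_state_change_time": 4000},
    "n4": {"state": "free", "last_used_time": 500, "last_state_change_time": 500},
}
print(pick_nodes_to_sleep(nodes, now, 1800, 5))
print(pick_nodes_to_wake(nodes, {"n2", "n3"}, now, 1800, 5))
```

Capping both lists at max_concurrent_nodes mirrors the per-cycle limit; a real hook would then call the platform's power interface for each selected node.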

...