Comment: Updated server attribute power_provisioning details.

Forum discussion link: http://community.pbspro.org/t/external-design-document-for-pp-824-cray-ramp-rate-limiting/693

Interface design:

  • Interface 1: Changes to Power hook
    • Change control: Stable
    • Synopsis: Power hook PBS_power
    • Details: Enabling or disabling the pbshook PBS_power enables or disables the power management feature.
      • Changing the hook attribute "enabled" through qmgr enables or disables both the hook and the power management feature.
      • The hook uses the server periodic event to ramp nodes up and down and to power nodes on and off.
      • Example:
        • qmgr -c "s pbshook PBS_power enabled=true"
        • qmgr -c "s pbshook PBS_power enabled=false"
      • This change overrides the existing way of enabling the power feature, i.e. setting the server attribute power_provisioning.
        • The server attribute power_provisioning is now read-only. It is enabled or disabled when the power hook is enabled or disabled.
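Administrators who script this toggle could use the following minimal standalone Python 3 sketch. It builds the qmgr command shown above and optionally runs it. It assumes qmgr is on PATH and the caller has PBS manager privilege. The helper names power_hook_command and set_power_hook are hypothetical and are not part of PBS.

```python
import subprocess
from typing import List


def power_hook_command(enabled: bool) -> List[str]:
    """Build the qmgr command that enables or disables the PBS_power hook.

    "set" is spelled out here; the examples above use its abbreviation "s".
    """
    value = "true" if enabled else "false"
    return ["qmgr", "-c", f"set pbshook PBS_power enabled={value}"]


def set_power_hook(enabled: bool, dry_run: bool = True) -> List[str]:
    """Return the command and execute it only when dry_run is False."""
    cmd = power_hook_command(enabled)
    if not dry_run:
        # Needs PBS manager privilege; raises CalledProcessError on failure.
        subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    print(set_power_hook(True))
```

By default the helper only returns the command. Pass dry_run=False to execute it. Under this design, power_provisioning should then follow the hook state, because it is now read-only.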
  • Interface 2: Hook configuration file 
    • Change Control: Stable
    • Synopsis: JSON hook config file
    • Details: The configuration file allows the administrator to set global parameters, which are used to modify the behavior of the PBS_power hook across all nodes in the PBS Pro complex. The file must conform to JSON syntax. A sample configuration file is displayed and described below:

      {
        "power_ramp_rate_enable": "True",
        "power_on_off_enable": "False",
        "node_idle_limit": "1000",
        "min_node_down_delay": "600",
        "max_jobs_analyze_limit": "80"
      }

    • Parameters (name, default value, description):
      • power_ramp_rate_enable (default: False): When enabled, PBS performs ramp rate limiting across the PBS cluster running on a Cray CLE 6.0 platform. Ramped-up nodes are kept at sleep state C1; ramped-down nodes are put into sleep state C6.
      • power_on_off_enable (default: False): When enabled, PBS powers nodes on and off, but only nodes whose node attribute poweroff_eligible is True.
      • node_idle_limit (default: 1800): How long a node must be left idle before it is considered for powering down or ramping down.
      • min_node_down_delay (default: 1800): How long a powered-off node must stay down before it can be considered for being brought back up.
      • max_jobs_analyze_limit (default: 100): The maximum number of jobs analyzed for power-on/ramp-up. Only jobs that have an estimated start_time and exec_vnode set on them are considered. For these to be set, strict_ordering must be set to True in sched_config and jobs must be submitted with a walltime.
      • max_concurrent_nodes (default: 5): How many nodes can be powered on/off or ramped up/down at a time.
        • For ramp rate limiting, the hook sleeps X seconds (1 <= X <= 10) between each sleep-state level while stepping a node up or down.
        • If a node supports 5 sleep-state levels, the hook can wait up to 50 seconds (5 x 10 s) for a single node in the worst case. When increasing this value, also consider increasing the PBS_power hook frequency (freq) and alarm time (alarm) so that hook instances do not overlap or time out.
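The following minimal standalone Python 3 sketch shows how these parameters combine. It merges a PBS_power JSON config over the defaults listed above. It also converts the string values used in the sample ("True", "1000") to bool and int. It is not the hook's actual parser. The names load_power_config and worst_case_ramp_seconds are hypothetical, and so is the rejection of unknown keys.

```python
import json
from typing import Any, Dict

# Defaults copied from the parameter list above.
DEFAULTS: Dict[str, Any] = {
    "power_ramp_rate_enable": False,
    "power_on_off_enable": False,
    "node_idle_limit": 1800,
    "min_node_down_delay": 1800,
    "max_jobs_analyze_limit": 100,
    "max_concurrent_nodes": 5,
}


def _to_bool(value: Any) -> bool:
    """Accept JSON booleans or strings such as "True"/"False"."""
    if isinstance(value, bool):
        return value
    text = str(value).strip().lower()
    if text in ("true", "1", "yes"):
        return True
    if text in ("false", "0", "no"):
        return False
    raise ValueError(f"not a boolean: {value!r}")


def load_power_config(text: str) -> Dict[str, Any]:
    """Merge a PBS_power JSON config over the documented defaults."""
    raw = json.loads(text)
    unknown = set(raw) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    cfg = dict(DEFAULTS)
    for key, value in raw.items():
        if isinstance(DEFAULTS[key], bool):
            cfg[key] = _to_bool(value)
        else:
            cfg[key] = int(value)  # the sample stores numbers as strings
    return cfg


def worst_case_ramp_seconds(sleep_levels: int, max_step_sleep: int = 10) -> int:
    """Worst-case wait for one node: up to max_step_sleep s per sleep level."""
    return sleep_levels * max_step_sleep


if __name__ == "__main__":
    sample = ('{"power_ramp_rate_enable": "True", "power_on_off_enable": "False",'
              ' "node_idle_limit": "1000", "min_node_down_delay": "600",'
              ' "max_jobs_analyze_limit": "80"}')
    print(load_power_config(sample))
    print(worst_case_ramp_seconds(5))
```

For the sample file, max_concurrent_nodes falls back to 5 because the file omits it. worst_case_ramp_seconds(5) reproduces the 50-second figure above. Comparing that figure against the hook's freq and alarm is left to the administrator.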

  • Interface 3: New node attribute: poweroff_eligible
    • Change control: Stable
    • Synopsis: Node attribute for power control.
    • Details: This new node attribute controls whether a node may be powered off.
      • PBS type: Boolean
      • Python type: bool
      • Default value: False
      • Manager has set permission. All have read permission.
      • To modify the default value, use qmgr:
        • qmgr -c "set node <node_name> poweroff_eligible = True"
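This sketch shows how poweroff_eligible interacts with the power_on_off_enable configuration parameter: a node is a power-off candidate only when both are true. It models nodes as a hypothetical dict of node name to attribute dict. In a real hook these values would come from the PBS server.

```python
from typing import List, Mapping


def power_off_candidates(nodes: Mapping[str, Mapping[str, object]],
                         power_on_off_enable: bool) -> List[str]:
    """Return the sorted names of nodes PBS may power off.

    A node qualifies only if the power on/off feature is enabled and its
    poweroff_eligible attribute is True. A missing attribute counts as
    False, the documented default.
    """
    if not power_on_off_enable:
        return []
    return sorted(name for name, attrs in nodes.items()
                  if attrs.get("poweroff_eligible", False) is True)
```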
  • Interface 4: DELETED
  • Interface 5: New node attribute: last_used_time
    • Change Control: Stable
    • Synopsis: Read-only node attribute that captures a timestamp.
    • Details: This new node attribute is updated with a timestamp at the end of any job or reservation.
      • If the node is released early from a running job, this timestamp is updated.
      • The node status command pbsnodes converts the internal value (seconds since epoch) to human-readable form and displays this attribute in "MON DD YY HH:MM:SS" format.
      • The attribute is reset when the node is ramped up.
      • Managers and Operators have read permission.
      • For new vnodes, this attribute is set for the first time to the current timestamp when the vnodes are created or rebooted.
      • This attribute can now be used as a node_sort_key in sched_config, which sorts nodes by their last used time.
      • PBS type: long
      • Python type: int
      • Example:
        • node_sort_key: "last_used_time HIGH"
        • node_sort_key: "last_used_time LOW"
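This sketch covers the two behaviors described above. It renders last_used_time in the "MON DD YY HH:MM:SS" form, and it orders nodes the way node_sort_key "last_used_time HIGH" or "LOW" would. The strftime pattern is an assumed mapping of that display format. The ordering assumes the usual node_sort_key convention that HIGH sorts larger values first.

```python
import time
from typing import Dict, List


def format_last_used(epoch_seconds: int, utc: bool = False) -> str:
    """Render seconds since the epoch as "MON DD YY HH:MM:SS"."""
    t = time.gmtime(epoch_seconds) if utc else time.localtime(epoch_seconds)
    return time.strftime("%b %d %y %H:%M:%S", t)


def sort_by_last_used(last_used: Dict[str, int], order: str = "HIGH") -> List[str]:
    """Order node names by last_used_time.

    HIGH puts the most recently used nodes first; LOW puts the least
    recently used (longest idle) nodes first.
    """
    if order not in ("HIGH", "LOW"):
        raise ValueError("order must be HIGH or LOW")
    return sorted(last_used, key=lambda n: last_used[n], reverse=(order == "HIGH"))
```

One plausible reading of the design is that HIGH favors nodes that were just used. That would leave long-idle nodes untouched long enough to reach node_idle_limit and be ramped down. The design does not state this as a recommendation.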
  • Interface 6: New node state: sleep
    • Change Control: Stable
    • Synopsis: New node state indicating that a node has been put down by PBS.
    • Details: This new node state is set when nodes are ramped down or powered off by PBS via the power ramp rate limiting or power on/off feature.
      • A server periodic hook (pbshook PBS_power, provided as part of the PBS package) runs every $freq seconds. It takes the list of vnodes to ramp down or power off and marks them with the new sleep node state.
      • At most max_concurrent_nodes nodes are ramped down or powered off every $freq seconds, freq being the server periodic hook frequency.
      • The scheduler can now consider nodes in the sleep state for running jobs.
      • The server periodic hook can ramp up or power on nodes in the sleep state as required. The requirement is calculated by analyzing the jobs' estimated start time and exec_vnode list.
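The following sketch shows the selection logic described above, under several assumptions:
- Ramp-down candidates are free nodes that have been idle for at least node_idle_limit seconds, longest idle first.
- Ramp-up candidates are sleep nodes named in the estimated exec_vnode of the earliest-starting jobs, up to max_jobs_analyze_limit jobs.
- Both lists are capped at max_concurrent_nodes.

The node and job dict layout, the ordering rules and the treatment of node_idle_limit as seconds are all assumptions. The sketch does not model min_node_down_delay.

```python
from typing import Dict, List, Set

# Hypothetical layouts: nodes maps name -> {"state": str, "last_used_time": int};
# each job is a dict with estimated "start_time" (epoch s) and "exec_vnode".


def vnodes_in_exec_vnode(exec_vnode: str) -> Set[str]:
    """Return vnode names from a string like "(n1:ncpus=2+n2:ncpus=2)+(n3:mem=1gb)"."""
    names: Set[str] = set()
    for piece in exec_vnode.replace("(", "").replace(")", "").split("+"):
        name = piece.split(":", 1)[0].strip()
        if name:
            names.add(name)
    return names


def nodes_to_ramp_down(nodes: Dict[str, dict], now: int,
                       node_idle_limit: int, max_concurrent_nodes: int) -> List[str]:
    """Free nodes idle for at least node_idle_limit, longest idle first."""
    idle = [name for name, a in nodes.items()
            if a["state"] == "free" and now - a["last_used_time"] >= node_idle_limit]
    idle.sort(key=lambda name: nodes[name]["last_used_time"])
    return idle[:max_concurrent_nodes]


def nodes_to_ramp_up(nodes: Dict[str, dict], jobs: List[dict],
                     max_jobs_analyze_limit: int,
                     max_concurrent_nodes: int) -> List[str]:
    """Sleeping nodes named in the earliest-starting jobs' exec_vnode."""
    planned = [j for j in jobs
               if j.get("start_time") is not None and j.get("exec_vnode")]
    planned.sort(key=lambda j: j["start_time"])
    wanted: List[str] = []
    for job in planned[:max_jobs_analyze_limit]:
        for name in sorted(vnodes_in_exec_vnode(job["exec_vnode"])):
            if nodes.get(name, {}).get("state") == "sleep" and name not in wanted:
                wanted.append(name)
    return wanted[:max_concurrent_nodes]
```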
  • Interface 7: Log/Error messages.
    • Change control: Unstable
    • Synopsis: New log/error messages.
    • Details: The following new log and error messages are introduced by the power ramp rate limiting and power on/off features.

      • Scenario 1: Nodes are being ramped down
        In server logs:
          Job;power_ramp_down;launch: /opt/cray/capmc/default/bin/capmc set_sleep_state_limit --nids 24-25 --limit 4
          Job;power_ramp_down;launch: finished
        Log level: LOG_INFO
      • Scenario 2: Nodes are being ramped up
        In server logs:
          Job;power_ramp_up;launch: /opt/cray/capmc/default/bin/capmc set_sleep_state_limit --nids 24-25 --limit 0
          Job;power_ramp_up;launch: finished
        Log level: LOG_INFO
      • Scenario 3: Server periodic hook output
        In server logs:
          power_ramp_limit: nodes to ramp up: <node_list>
          power_ramp_limit: nodes to ramp down: <node_list>
        Log level: LOG_INFO
      • Scenario 4: Nodes are being powered off
        In server logs:
          03/29/2016 02:05:59;0008;Server@sdb;Job;node_power_off;launch: /opt/cray/capmc/default/bin/capmc node_off --nids 24-25
          03/29/2016 02:06:01;0008;Server@sdb;Job;node_power_off;launch: finished
        Log level: LOG_INFO
      • Scenario 5: Nodes are being powered on
        In server logs:
          03/29/2016 02:05:59;0008;Server@sdb;Job;node_power_on;launch: /opt/cray/capmc/default/bin/capmc node_on --nids 24-25
          03/29/2016 02:06:01;0008;Server@sdb;Job;node_power_on;launch: finished
        Log level: LOG_INFO
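For log monitoring, the following sketch parses the full-format server log records shown in scenarios 4 and 5. It reads each record as date;event code;server@host;object type;object name;message and expands the capmc --nids argument into integer node IDs. The field layout comes from those sample lines. The comma-separated nid syntax handled below goes beyond the 24-25 form shown and is an assumption. All function names are hypothetical.

```python
import re
from typing import List, NamedTuple, Optional

_NIDS = re.compile(r"--nids\s+(\S+)")


class PowerEvent(NamedTuple):
    timestamp: str   # e.g. "03/29/2016 02:05:59"
    action: str      # object name, e.g. "node_power_off"
    command: str     # text after "launch: "
    nids: List[int]  # expanded --nids argument


def expand_nids(spec: str) -> List[int]:
    """Expand "24-25" or "1,3,5-7" into a list of integers."""
    out: List[int] = []
    for part in spec.split(","):
        if "-" in part:
            lo, hi = part.split("-", 1)
            out.extend(range(int(lo), int(hi) + 1))
        else:
            out.append(int(part))
    return out


def compress_nids(nids: List[int]) -> str:
    """Inverse of expand_nids: [25, 24] -> "24-25"."""
    ranges: List[List[int]] = []
    for n in sorted(set(nids)):
        if ranges and n == ranges[-1][1] + 1:
            ranges[-1][1] = n
        else:
            ranges.append([n, n])
    return ",".join(str(a) if a == b else f"{a}-{b}" for a, b in ranges)


def parse_power_line(line: str) -> Optional[PowerEvent]:
    """Return a PowerEvent for a "launch: <command>" record, else None."""
    fields = line.split(";", 5)
    if len(fields) != 6:
        return None
    timestamp, _code, _server, _objtype, action, message = fields
    if not message.startswith("launch: ") or message == "launch: finished":
        return None
    command = message[len("launch: "):]
    match = _NIDS.search(command)
    nids = expand_nids(match.group(1)) if match else []
    return PowerEvent(timestamp, action, command, nids)
```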