  • Interface: Server attribute: SVR_ATR_power_ramprate_enable
    • Change Control: Stable
    • Synopsis: power_ramprate_enable
    • Details: Server attribute. When set to True, PBS can use the power ramp rate limiting feature on the Cray platform.
      • Setting this attribute also enables the attributes SVR_ATR_node_idle_time and SVR_ATR_max_ramprate_limit.
      • Once set, unsetting the SVR_ATR_power_ramprate_enable attribute does not unset SVR_ATR_node_idle_time or SVR_ATR_max_ramprate_limit.
      • Format: Boolean
      • Default: unset
      • Managers have read and write permission; all others have read permission.
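The relationship between power_ramprate_enable and its two dependent attributes can be sketched as a small Python model. This is an illustrative sketch, not PBS server code: the `ServerAttrs` class and its methods are hypothetical, and it assumes that "enabling" the dependent attributes means they take their documented defaults (1800 seconds and 5 nodes) unless a manager has already set them.

```python
from dataclasses import dataclass
from typing import Optional

# Documented defaults for the dependent attributes.
DEFAULT_NODE_IDLE_TIME = 1800  # seconds
DEFAULT_MAX_RAMPRATE_LIMIT = 5  # nodes


@dataclass
class ServerAttrs:
    """Hypothetical model of the three server attributes; not PBS server code."""

    power_ramprate_enable: Optional[bool] = None  # None models "unset"
    node_idle_time: Optional[int] = None
    max_ramprate_limit: Optional[int] = None

    def set_power_ramprate_enable(self, value: bool) -> None:
        self.power_ramprate_enable = value
        if value:
            # Assumption: "enabling" the dependent attributes means they take
            # their documented defaults unless a value was already set.
            if self.node_idle_time is None:
                self.node_idle_time = DEFAULT_NODE_IDLE_TIME
            if self.max_ramprate_limit is None:
                self.max_ramprate_limit = DEFAULT_MAX_RAMPRATE_LIMIT

    def unset_power_ramprate_enable(self) -> None:
        # Unsetting leaves node_idle_time and max_ramprate_limit untouched.
        self.power_ramprate_enable = None
```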
  • Interface: New server attribute: SVR_ATR_node_idle_time
    • Change Control: Stable
    • Synopsis: node_idle_time
    • Details: This new server attribute defines the minimum time a node must be idle before it is considered for power ramp down.
      • Enabled when the server attribute power_ramprate_enable is set.
      • The default value is 1800 seconds.
      • Managers and Operators have set permission. All users have read permission.
      • To modify the default value, use qmgr:
        • qmgr -c "set server node_idle_time = <new_value>"
        • <new_value> is the time in seconds and should be greater than zero.
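A minimal sketch of how node_idle_time might be checked, assuming idle time is measured from the node's last_busy_time attribute (described below) and that a node qualifies once the elapsed time reaches, rather than exceeds, node_idle_time. Both function names are hypothetical.

```python
import time
from typing import Optional

DEFAULT_NODE_IDLE_TIME = 1800  # seconds, the documented default


def validate_node_idle_time(value: int) -> int:
    """Reject a node_idle_time that is not greater than zero."""
    if value <= 0:
        raise ValueError("node_idle_time must be greater than zero seconds")
    return value


def idle_long_enough(last_busy_time: float,
                     node_idle_time: int = DEFAULT_NODE_IDLE_TIME,
                     now: Optional[float] = None) -> bool:
    """Return True if at least node_idle_time seconds have passed since
    last_busy_time (assumed to be epoch seconds)."""
    if now is None:
        now = time.time()
    return now - last_busy_time >= node_idle_time
```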
  • Interface: New server attribute: SVR_ATR_max_ramprate_limit
    • Change Control: Stable
    • Synopsis: max_ramprate_limit
    • Details: This new server attribute defines the maximum number of nodes that are allowed to drop to the C-6 state.
      • Enabled when the server attribute power_ramprate_enable is set.
      • The default value is 5.
      • Managers and Operators have set permission. All users have read permission.
      • To modify the default value, use qmgr:
        • qmgr -c "set server max_ramprate_limit = <new_value>"
        • <new_value> is a number of nodes and should be greater than zero.
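The cap imposed by max_ramprate_limit can be sketched as a trim on the list of ramp-down candidates. The function name is hypothetical, and counting nodes that are already sleeping against the limit is an assumption; this document does not say whether the limit is per cycle or a total.

```python
from typing import List

DEFAULT_MAX_RAMPRATE_LIMIT = 5  # documented default


def cap_ramp_down(candidates: List[str], already_sleeping: int,
                  limit: int = DEFAULT_MAX_RAMPRATE_LIMIT) -> List[str]:
    """Trim the ramp-down candidates so that no more than `limit` nodes
    are ramped down at once, counting nodes that are already sleeping
    (an assumption about how the limit is applied)."""
    if limit <= 0:
        raise ValueError("max_ramprate_limit must be greater than zero")
    room = max(0, limit - already_sleeping)
    return candidates[:room]
```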
  • Interface: New node attribute: ND_ATR_power_ramprate_enable
    • Change Control: Stable
    • Synopsis: power_ramprate_enable
    • Details: This new node attribute controls whether a node is allowed to participate in ramp rate limiting.
      • The default value is False.
      • Managers have set permission. All users have read permission.
      • To modify the default value, use qmgr:
        • qmgr -c "set node <node_name> power_ramprate_enable = True"
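Because the per-node default is False, only nodes that have been explicitly enabled take part. A hypothetical Python filter over node attributes, with nodes represented as plain dictionaries rather than PBS objects:

```python
from typing import Dict, List


def ramprate_enabled_nodes(nodes: Dict[str, Dict[str, object]]) -> List[str]:
    """Return, sorted, the names of nodes allowed to take part in ramp rate limiting.

    A node that has no power_ramprate_enable value falls back to the
    documented default, False, and is excluded.
    """
    return sorted(name for name, attrs in nodes.items()
                  if attrs.get("power_ramprate_enable", False) is True)
```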
  • Interface: New node attribute: ND_ATR_last_busy_time
    • Change Control: Stable
    • Synopsis: last_busy_time
    • Details: This new node attribute is updated with a timestamp at the end of each job or reservation on the node.
      • The attribute is reset when the node is powered on.
      • Managers and Operators have read permission.
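A sketch of the last_busy_time life cycle. `NodeRecord` and its methods are hypothetical, and modelling "reset" as clearing the value (rather than, say, storing the power-on time) is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NodeRecord:
    """Hypothetical holder for the last_busy_time node attribute."""

    name: str
    last_busy_time: Optional[float] = None  # epoch seconds; None models "reset"

    def on_job_or_resv_end(self, end_time: float) -> None:
        # Stamp the node when a job or reservation on it ends.
        self.last_busy_time = end_time

    def on_power_on(self) -> None:
        # Assumption: "reset" clears the value rather than setting it to the power-on time.
        self.last_busy_time = None
```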
  • Interface: New node state: sleeping
    • Change Control: Stable
    • Synopsis: sleeping
    • Details: This new node state is set when nodes are ramped down by PBS through power ramp rate limiting. The scheduler can still schedule jobs and reservations on nodes in the sleeping state. If such a node is selected by the scheduler, the server ramps it up when it is required to run a job or reservation.
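The sleeping state can be sketched as a small transition table plus the step that ramps up sleeping nodes selected by the scheduler. The state names other than `sleeping`, the event names and both functions are hypothetical; in particular, the document does not say which state a node enters after ramp up, so `free` is assumed.

```python
from typing import Dict, List

# Hypothetical transition table. The state a node returns to after ramp up
# ("free" here) is an assumption; this document does not name it.
TRANSITIONS = {
    ("free", "ramp_down"): "sleeping",
    ("sleeping", "ramp_up"): "free",
}


def next_state(state: str, event: str) -> str:
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"event {event!r} is not valid in state {state!r}") from None


def prepare_selected_nodes(selected: List[str], states: Dict[str, str]) -> List[str]:
    """Ramp up every sleeping node the scheduler selected for a job or
    reservation; return the names that were ramped up."""
    ramped = []
    for node in selected:
        if states[node] == "sleeping":
            states[node] = next_state(states[node], "ramp_up")
            ramped.append(node)
    return ramped
```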
  • Interface: New server periodic pbshook PBS_power
    • Change Control: Stable
    • Synopsis: PBS_power server periodic pbshook
    • Details: A new server periodic event is added to the PBS_power pbshook. It is enabled when the server attribute power_ramprate_enable is set to True.
      • The hook takes a list of vnodes and power ramps those nodes down.
      • The hook looks for nodes that are in the sleeping state but are assigned to upcoming jobs or reservations. Once identified, such nodes are ramped up.
      • The hook interfaces with vendor power APIs through the generic PMI interface to ramp the nodes down.
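One cycle of the periodic event can be sketched without the PBS hook runtime, so the example runs standalone instead of using the `pbs` module. It identifies nodes to ramp up and down and logs them with the message text from the log table in this section. The input sets, the function name and the comma-separated node list are assumptions, and a real cycle would also apply the max_ramprate_limit cap to the ramp-down list.

```python
import logging
from typing import Dict, List, Set, Tuple

log = logging.getLogger("power_ramp_limit")


def plan_cycle(states: Dict[str, str], upcoming: Set[str],
               idle_ok: Set[str]) -> Tuple[List[str], List[str]]:
    """One hypothetical run of the periodic event.

    states:   node name mapped to its current state, e.g. "free" or "sleeping"
    upcoming: nodes assigned to upcoming jobs or reservations
    idle_ok:  nodes that are ramp-rate enabled and idle long enough
    """
    # Sleeping nodes that upcoming work needs must be ramped up.
    ramp_up = sorted(n for n, s in states.items()
                     if s == "sleeping" and n in upcoming)
    # Free, idle, unassigned nodes are candidates for ramp down.
    ramp_down = sorted(n for n, s in states.items()
                       if s == "free" and n in idle_ok and n not in upcoming)
    log.info("power_ramp_limit: Identified nodes for ramp up: %s", ",".join(ramp_up))
    log.info("power_ramp_limit: Identified nodes for ramp down: %s", ",".join(ramp_down))
    return ramp_up, ramp_down
```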
  • Interface: Log/Error messages.
    • Change Control: Stable
    • Synopsis: New log/error messages.
    • Details: The new log and error messages introduced by the power ramp rate limiting feature are listed below.


      1. Scenario: Enable the power_ramprate_enable server attribute
         Log/error message (server logs, log level LOG_INFO):
           attributes set: power_ramprate_enable = 1

      2. Scenario: Nodes are being ramped down
         Log/error message (server logs, log level LOG_INFO):
           Cray: init
           Cray: connect
           Cray: ramping down the node
           Job;power_ramp_down;launch: /opt/cray/capmc/default/bin/capmc set_sleep_state_limit --nids 24-25 --limit 4
           Job;power_ramp_down;launch: finished
           Cray: disconnect

      3. Scenario: Nodes are being ramped up
         Log/error message (server logs, log level LOG_INFO):
           Cray: init
           Cray: connect
           Cray: ramping up the node
           Job;power_ramp_up;launch: /opt/cray/capmc/default/bin/capmc set_sleep_state_limit --nids 24-25 --limit 0
           Job;power_ramp_up;launch: finished
           Cray: disconnect

      4. Scenario: Server periodic hook output
         Log/error message (server logs, log level LOG_INFO):
           power_ramp_limit: Identified nodes for ramp up: <node_list>
           power_ramp_limit: Identified nodes for ramp down: <node_list>
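The capmc command lines in scenarios 2 and 3 can be reproduced with a short helper. This is a sketch, not the hook's code: the function names are hypothetical, the limit values 4 (ramp down) and 0 (ramp up) are copied from the sample logs rather than from a documented rule, and the comma-separated range syntax for more than one contiguous NID block is an assumption about capmc's `--nids` argument.

```python
from typing import List

CAPMC = "/opt/cray/capmc/default/bin/capmc"  # path taken from the sample logs


def nid_ranges(nids: List[int]) -> str:
    """Compress NIDs into comma-separated ranges, e.g. [24, 25] -> "24-25"."""
    if not nids:
        raise ValueError("no NIDs given")
    nids = sorted(set(nids))
    parts = []
    start = prev = nids[0]
    for n in nids[1:]:
        if n == prev + 1:
            prev = n  # still inside a contiguous run
            continue
        parts.append(f"{start}-{prev}" if start != prev else str(start))
        start = prev = n
    parts.append(f"{start}-{prev}" if start != prev else str(start))
    return ",".join(parts)


def sleep_state_cmd(nids: List[int], ramp_down: bool) -> List[str]:
    # Limits 4 (ramp down) and 0 (ramp up) are copied from the sample log lines.
    limit = 4 if ramp_down else 0
    return [CAPMC, "set_sleep_state_limit", "--nids", nid_ranges(nids),
            "--limit", str(limit)]
```

The helper only builds the argument list; launching it (for example with `subprocess.run`) is left to the caller.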