EDD PP-706: Automatically create KNL specific information

PBSPro Community Discussion: http://community.pbspro.org/t/pp-706-automatically-create-knl-specific-information/678


  • Interface 1: vnode attribute: resources_available.vntype
    • Visibility: Public
    • Change Control: Stable
    • Details:
      • The vntype attribute for vnodes corresponding to Cray KNL nodes will have the same value as for non-KNL compute nodes, i.e. "cray_compute".
      • Node selection for bootable processor nodes will be based on aoe.

  • Interface 2: vnode attribute: resources_available.PBScrayseg
    • Visibility: Public
    • Change Control: Stable
    • Details:
      • KNL vnodes, corresponding to KNL nodes returned as part of the System (BASIL 1.7) Query XML response, will have PBScrayseg set to 0 when vnode_per_numa_node is true. There will be only one KNL vnode per KNL node, regardless of the number of segments/NUMA nodes per KNL node specified in the XML response.
      • This is a change from existing behavior.

  • Interface 3: vnode attribute: current_aoe
    • Visibility: Public
    • Change Control: Stable
    • Details:
      • KNL vnodes, corresponding to KNL nodes returned as part of the System (BASIL 1.7) Query XML response, will have current_aoe set to the numa_cfg value concatenated with the hbm_cache_pct value, joined by an underscore, e.g. a2a_0.
      • Each KNL vnode’s current_aoe attribute shows that vnode’s current AOE.
      • The valid values are: a2a, snc2, snc4, hemi, quad for numa_cfg and 0, 25, 50, 100 for hbm_cache_pct.
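The naming rule above can be illustrated with a minimal Python sketch. It is not the PBS implementation: the function names make_aoe and all_aoes are hypothetical, and the underscore separator is inferred from the a2a_0 example.

```python
# Valid values as listed in Interface 3.
VALID_NUMA_CFG = ("a2a", "snc2", "snc4", "hemi", "quad")
VALID_HBM_CACHE_PCT = (0, 25, 50, 100)


def make_aoe(numa_cfg, hbm_cache_pct):
    """Build an AOE name such as 'a2a_0' from its two parts.

    hbm_cache_pct may be an int or a numeric string, since XML
    attribute values arrive as text.
    """
    pct = int(hbm_cache_pct)
    if numa_cfg not in VALID_NUMA_CFG:
        raise ValueError("invalid numa_cfg: %r" % (numa_cfg,))
    if pct not in VALID_HBM_CACHE_PCT:
        raise ValueError("invalid hbm_cache_pct: %r" % (hbm_cache_pct,))
    return "%s_%d" % (numa_cfg, pct)


def all_aoes():
    """Enumerate every combination of the valid values."""
    return [make_aoe(n, p) for n in VALID_NUMA_CFG for p in VALID_HBM_CACHE_PCT]
```

With five numa_cfg values and four hbm_cache_pct values, this naming scheme allows 20 distinct AOE names.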

  • Interface 4: vnode attribute: resources_available.hbmem
    • Visibility: Public
    • Change Control: Stable
    • Details:
      • KNL vnodes, corresponding to KNL nodes returned as part of the System (BASIL 1.7) Query XML response, will have hbmem set to the hbm_size_mb value.
      • It is a host level consumable resource.


  • Interface 5: System Query (BASIL 1.7)
    • Visibility: PBS Private
    • Change Control: Stable
    • Details:
      • The System Query (BASIL 1.7) reports inventory information in a much more compact form than the Inventory Query. Attribute value pairs in the XML response apply to a group of Nodes.
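To illustrate what "attribute value pairs apply to a group of Nodes" means in practice, here is a rough Python sketch that expands one grouped entry into per-node records. The XML layout in SAMPLE_RESPONSE is an assumption made for illustration, not a copy of the BASIL 1.7 schema. Only the attribute names numa_cfg, hbm_size_mb and hbm_cache_pct come from this document. Treating an empty numa_cfg as "not a KNL node" is also an assumption, and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical response: each <Nodes> element carries attributes shared
# by every node ID in its rangelist text.
SAMPLE_RESPONSE = """\
<BasilResponse protocol="1.7">
  <ResponseData status="SUCCESS" method="QUERY">
    <System>
      <Nodes role="batch" state="up" numa_cfg="" hbm_size_mb="" hbm_cache_pct="">20-21</Nodes>
      <Nodes role="batch" state="up" numa_cfg="a2a" hbm_size_mb="16384" hbm_cache_pct="0">40-42,50</Nodes>
    </System>
  </ResponseData>
</BasilResponse>
"""


def expand_rangelist(text):
    """Expand a rangelist such as '40-42,50' into [40, 41, 42, 50]."""
    ids = []
    for part in text.strip().split(","):
        low, sep, high = part.strip().partition("-")
        first = int(low)  # raises ValueError on malformed input
        last = int(high) if sep else first
        if last < first:
            raise ValueError("bad rangelist: %r" % (text,))
        ids.extend(range(first, last + 1))
    return ids


def knl_nodes(xml_text):
    """Map each KNL node ID to the attributes of its group."""
    nodes = {}
    for group in ET.fromstring(xml_text).iter("Nodes"):
        numa_cfg = group.get("numa_cfg", "")
        if not numa_cfg:
            continue  # assumption: non-KNL groups leave numa_cfg empty
        shared = {
            "numa_cfg": numa_cfg,
            "hbm_size_mb": int(group.get("hbm_size_mb")),
            "hbm_cache_pct": int(group.get("hbm_cache_pct")),
        }
        for node_id in expand_rangelist(group.text or ""):
            nodes[node_id] = dict(shared)
    return nodes
```

Calling knl_nodes(SAMPLE_RESPONSE) yields one record per KNL node ID, which matches the one-vnode-per-KNL-node rule in Interface 2.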

  • Interface 6: sched config
    • Visibility: Public
    • Change Control: Stable
    • Details:
      • A new default resource, 'hbmem', will be added to the scheduler config for the Cray platform.
      • This will allow the scheduler to schedule jobs based on hbmem resource requests.

  • Interface 7: Log/Error messages
    • Visibility: Public
    • Change Control: Stable
    • Details: The following table lists the log/error messages introduced in the KNL (BASIL 1.7) project.

      No. | Level           | Log/Error message                                                      | Visibility | Classification
      ----+-----------------+------------------------------------------------------------------------+------------+---------------
      1   | PBSEVENT_SYSTEM | In MoM logs: Memory allocation for XML request buffer failed.          | Public     | Stable
      2   | PBSEVENT_DEBUG4 | In MoM logs: This Cray system supports the BASIL 1.7 protocol.         | Public     | Stable
      3   | PBSEVENT_DEBUG4 | In MoM logs: This Cray system does not support the BASIL 1.7 protocol. | Public     | Unstable
      4   | PBSEVENT_SYSTEM | In MoM logs: ALPS System Query request failed.                         | Public     | Stable
      5   | PBSEVENT_DEBUG  | In MoM logs: Creation of Cray KNL vnodes failed with name <vnode name> | Public     | Stable
      6   | PBSEVENT_DEBUG3 | In MoM logs: No KNL nodes.                                             | Public     | Stable
      7   | PBSEVENT_ERROR  | In MoM logs: Bad KNL Rangelist: <rangelist>                            | Public     | Unstable
      8   | PBSEVENT_ERROR  | In MoM logs: malloc failure                                            | Public     | Unstable
      9   | PBSEVENT_ERROR  | In MoM logs: realloc failure                                           | Public     | Unstable


  • Interface 8: PBS hook PBS_xeon_phi_provision
    • Visibility: Public
    • Change Control: Stable
    • Synopsis: Xeon Phi provisioning
    • Details:
      • This provisioning hook script runs on the server to provision the node with the requested aoe.
      • If the requested aoe is already set as current_aoe on the available node, the provisioning hook will not be invoked.
      • The hook will be invoked whenever the aoe resource is requested in the job.
      • The timeout for the hook is 1800 seconds.
      • Reference to more detail on the interface.
        • The enabled attribute of the PBS_xeon_phi_provision hook is a boolean. It is unset by default, visible to all, and changeable by a manager.
        • Use qmgr to set the enabled attribute of PBS_xeon_phi_provision to true or false. For example:
          qmgr -c "set pbshook PBS_xeon_phi_provision enabled = true"
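The invocation rule in the Details for Interface 8 (provision only when a job requests an aoe that differs from the node's current_aoe) can be sketched as a small Python predicate. This is an illustration of the rule as stated here, not code from the hook itself. The function name needs_provisioning is hypothetical.

```python
def needs_provisioning(requested_aoe, current_aoe):
    """Return True if a node must be provisioned for this request.

    requested_aoe: aoe requested by the job, or None if none was requested.
    current_aoe:   the vnode's current_aoe, or None if it is unset.
    """
    if requested_aoe is None:
        # No aoe requested, so provisioning is not involved.
        return False
    # Provision only when the node is not already in the requested AOE.
    return requested_aoe != current_aoe
```

For example, a job requesting snc4_50 on a node whose current_aoe is a2a_0 would need provisioning, while a job requesting a2a_0 on that same node would not.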