Design document


Architecture Design

Synopsis

As part of the BASIL 1.7 project, we will be making a System Query to get KNL Node information. One vnode per KNL node will be created using this information.

In the current system, we make an Inventory (BASIL 1.4) Query to get Inventory information. Vnodes are created for compute nodes based on this information.

We are currently at BASIL 1.4. For BASIL 1.5 and 1.6, changes have not been implemented in PBS. This project aims to support the BASIL 1.7 System Query for KNL nodes only.

At some point in the future, we may migrate from the existing Inventory (BASIL 1.4) Query and implement the Inventory (BASIL 1.7) Query.

Definitions:

  • A Node may have one or more Microprocessors/Sockets.
  • Each Socket may have one or more Segments (also known as NUMA Nodes).
  • Each Segment may have one or more CPUs.

In the current system, for non-KNL nodes returned as part of the Inventory (BASIL 1.4) Query, we create a vnode per Segment (vnode_per_numa_node=True). The PBScrayseg attribute of the created vnode reflects the segment ordinal, e.g. for ordinal=0, PBScrayseg=0; for ordinal=1, PBScrayseg=1; and so on.

In the current system, for KNL nodes returned as part of the Inventory (BASIL 1.4) Query, we create 1 vnode per Node (the segment ordinal for KNL Nodes is set to 0 & PBScrayseg = 0).

The System (BASIL 1.7) Query returns grouped information (attributes) that apply to a range of Nodes.

The numa_nodes attribute will reflect the number of NUMA Nodes/Segments this KNL Node has. Regardless, we will be creating 1 vnode per KNL Node.

Additional attributes such as numa_cfg, hbm_cache_pct and hbm_size_mb will also be considered when creating KNL vnodes.

 

Current behavior

PBS makes an INVENTORY Query request (using BASIL 1.4).

The Query response (from ALPS) is an XML representation of Compute Nodes.

Flow of control

New behavior

PBS will make a SYSTEM Query request (using BASIL 1.7) to collect information on KNL Nodes.

The Query response from ALPS will be an XML representation of KNL Nodes.

This XML Response will be parsed & appropriate structures populated.

PBS then makes an INVENTORY Query request (using BASIL 1.4).

The Query response (from ALPS) is an XML representation of Compute Nodes.

Non-KNL vnodes will be created using this information. KNL nodes, which also appear in the Inventory response, will be filtered out using the System (BASIL 1.7) information on KNL nodes fetched earlier.

Subsequently, KNL vnodes (using the earlier fetched System BASIL 1.7 information) will be created.

Flow of control


Filtering KNL Node IDs during non-KNL vnode creation.

In functions called from alps_system_KNL(), KNL Node IDs are extracted from each Node group in the System BASIL 1.7 XML response and accumulated in a buffer for later use.

KNL Node IDs in this buffer will then be excluded from vnode creation in inventory_to_vnodes() (which creates non-KNL vnodes only, using information from the Inventory 1.4 response which includes KNL Node IDs).

Subsequently, KNL vnodes are created in system_to_vnodes_KNL().
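
The accumulate-then-exclude idea described above can be sketched as follows. This is a minimal illustration, not the PBS implementation: the type knl_nid_buf_t and the knl_nid_buf_* helpers are hypothetical names, and the real buffer may well hold the Node IDs in a different form (for example, as character data).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical growable buffer of KNL Node IDs (illustrative only). */
typedef struct {
    long   *nids;   /* Node IDs collected from the System Query response */
    size_t  count;  /* number of IDs stored */
    size_t  cap;    /* allocated capacity */
} knl_nid_buf_t;

/* Append one Node ID; returns 0 on success, -1 on allocation failure. */
int knl_nid_buf_add(knl_nid_buf_t *buf, long nid)
{
    if (buf->count == buf->cap) {
        size_t new_cap = buf->cap ? buf->cap * 2 : 16;
        long *tmp = realloc(buf->nids, new_cap * sizeof(*tmp));

        if (tmp == NULL)
            return -1;
        buf->nids = tmp;
        buf->cap = new_cap;
    }
    buf->nids[buf->count++] = nid;
    return 0;
}

/* True if 'nid' was recorded as a KNL node and so must be skipped
 * when non-KNL vnodes are created from the Inventory response. */
bool knl_nid_buf_contains(const knl_nid_buf_t *buf, long nid)
{
    for (size_t i = 0; i < buf->count; i++)
        if (buf->nids[i] == nid)
            return true;
    return false;
}

/* Release the buffer once both kinds of vnodes have been created. */
void knl_nid_buf_free(knl_nid_buf_t *buf)
{
    free(buf->nids);
    buf->nids = NULL;
    buf->count = 0;
    buf->cap = 0;
}
```

A linear search is used here for brevity; if the number of KNL nodes is large, a sorted array or bitmap would be a reasonable alternative.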


basil.h header file.

    • This is a Cray-supplied header file.
    • It has macro definitions corresponding to the XML Response Elements & their Attributes.
      • Examples of XML Elements (for the SYSTEM Query):
        • BASIL_ELM_SYSTEM        "System"
        • BASIL_ELM_NODES         "Nodes"
      • Examples of XML Element Attributes (for the SYSTEM Query):
        • BASIL_ATR_NUMA_NODES    "numa_nodes"
        • BASIL_ATR_COMPUTE_UNITS "compute_units"
        • BASIL_ATR_CPUS_PER_CU   "cpus_per_cu"
        • BASIL_ATR_HBMSIZE       "hbm_size_mb"
        • BASIL_ATR_HBM_CFG       "hbm_cache_pct"
    • It also has definitions of structures that we populate during XML Parsing.
      • The Structure used to store information per <Nodes> element (parsed from the SYSTEM Query XML Response) is: basil_system_element_t.
    • This file has information pertaining to BASIL 1.5, 1.6 & 1.7.
    • Since this project is concerned with only one BASIL 1.7 feature, the System Query, the relevant Macro/Structure definitions will be selectively taken from the Cray-supplied file & incorporated into the existing PBS basil.h. A few new definitions will be added as needed.
    • It was decided not to use the Cray-supplied header file as-is, since Cray has made changes to it that would, if the file were used in its entirety, break existing Inventory (BASIL 1.4) functionality.
    • Moreover, this file also has BASIL 1.5 & 1.6 definitions that are not currently supported in PBS Cray code; hence importing those additions into the existing basil.h could lead to confusion.
    • Comments documenting some of the above information will be included in the latest basil.h header file to be used in this project.

The following Table shows how the System Query attributes (in the XML Response) map into the basil.h structure (basil_system_element_t) that gets populated with this parsed XML information.


| XML attribute name | Corresponding structure element name (in basil.h) | Expected values | Comments |
|---|---|---|---|
| role | role | batch, interactive | This attribute is used for KNL node determination. The structure element "role" will be set to "UNKNOWN" when unexpected attribute values are encountered in the XML response. |
| state | state | up, down, unavailable, routing, suspect, admin | This attribute is used for KNL node determination. The structure element "state" will be set to "UNKNOWN" when unexpected attribute values are encountered in the XML response. |
| speed | speed | Value cannot be an empty string, cannot be negative, cannot be "0". | |
| numa_nodes | numa_nodes | Value cannot be an empty string, cannot be negative, cannot be "0". | This attribute is ignored during KNL vnode creation. |
| dies | n_dies | Value cannot be an empty string, cannot be negative, can be "0". | This attribute is ignored during KNL vnode creation. |
| compute_units | compute_units | Value cannot be an empty string, cannot be negative, can be "0". | This attribute will be displayed in 'resources_available.nppus'. |
| cpus_per_cu | cpus_per_cu | Value cannot be an empty string, cannot be negative, cannot be "0". | This will be displayed in 'resources_available.vps_per_ppu' (the product of compute_units & cpus_per_cu will be displayed in 'resources_available.ncpus'). |
| page_size_kb | avlmem | Value of attribute page_size_kb cannot be an empty string, cannot be negative, cannot be "0". avlmem holds the product of page_size_kb & page_count. | This represents conventional DRAM memory (will be displayed as 'resources_available.mem'). |
| | pgszl2 | pgszl2 holds X, where 2^X is page_size_kb in Bytes. | |
| page_count | Refer to the avlmem note above (under "Expected values") | Value cannot be an empty string, cannot be negative, can be "0". | |
| accels | accel_name | Not every Node group in the System 1.7 XML response may have this attribute. When it is present, the attribute value cannot be an empty string. | If this attribute is present in the XML response, we capture the attribute value during XML parsing. However, this attribute is ignored during subsequent KNL vnode creation, i.e. KNL vnodes will be created without this attribute. KNL nodes cannot have GPUs. |
| accel_state | accel_state | Not every Node group in the System 1.7 XML response may have this attribute. When it is present, the attribute value should be "up" or "down". | If this attribute is present in the XML response, we capture the attribute value during XML parsing and set the structure element "accel_state" to "UNKNOWN" when unexpected values are encountered. However, this attribute is ignored during subsequent KNL vnode creation, i.e. KNL vnodes will be created without this attribute. |
| numa_cfg | numa_cfg | a2a, snc2, snc4, hemi, quad. This attribute will always have a value (non-empty string) for KNL Nodes. The value will be an empty string for non-KNL Nodes. | |
| hbm_size_mb | hbmsize | Value of hbm_size_mb cannot be negative. This attribute will always have a value (non-empty string) for KNL Nodes. This will be an empty string for non-KNL Nodes. | This represents High Bandwidth MCDRAM memory (in MB) (will be displayed as 'resources_available.hbmem'). |
| hbm_cache_pct | hbm_cfg | Value of hbm_cache_pct will be 0, 25, 50, 100. This attribute will always have a value (non-empty string) for KNL Nodes. This will be an empty string for non-KNL Nodes. | |
| None | nidlist | The Rangelist of Node IDs. | The XML response does not have a specific attribute name corresponding to the "nidlist" structure element. During XML parsing, the Rangelist of Node IDs (in the incoming XML) is assigned to the "nidlist" structure element. This is repeated for every Node group in the XML response. |
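
The derived values mentioned in the table (ncpus, avlmem and pgszl2) amount to simple arithmetic, sketched below. The structure knl_group_counts_t and the knl_* function names are hypothetical and exist only for this illustration; the field types in the real basil_system_element_t may differ. The sketch assumes that avlmem is kept in KB, which follows from multiplying a page size in KB by a page count, and that the page size is a power of two.

```c
/* Hypothetical, reduced view of the numeric fields named in the table. */
typedef struct {
    long compute_units;  /* shown as resources_available.nppus */
    long cpus_per_cu;    /* shown as resources_available.vps_per_ppu */
    long page_size_kb;   /* page size in KB */
    long page_count;     /* number of pages */
} knl_group_counts_t;

/* resources_available.ncpus: compute_units multiplied by cpus_per_cu. */
long knl_ncpus(const knl_group_counts_t *g)
{
    return g->compute_units * g->cpus_per_cu;
}

/* avlmem: page_size_kb multiplied by page_count (assumed unit: KB). */
long long knl_avlmem_kb(const knl_group_counts_t *g)
{
    return (long long)g->page_size_kb * (long long)g->page_count;
}

/* pgszl2: X such that 2^X equals the page size in bytes.
 * Returns -1 if the page size is not a positive power of two. */
int knl_pgszl2(long page_size_kb)
{
    unsigned long long bytes;
    int x = 0;

    if (page_size_kb <= 0)
        return -1;
    bytes = (unsigned long long)page_size_kb * 1024ULL;
    if ((bytes & (bytes - 1)) != 0)
        return -1;
    while (bytes > 1) {
        bytes >>= 1;
        x++;
    }
    return x;
}
```

For instance, with illustrative values of 68 compute units and 4 CPUs per compute unit, knl_ncpus() yields 272, and a page_size_kb of 4 (4096 bytes) gives a pgszl2 of 12.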


Handling unexpected attribute values.

In some cases (mentioned in the table above), structure elements are set to "UNKNOWN" when unexpected values are encountered.

For all other attributes listed in the table above, we set the 'error class' in the XML Parser's user data structure to "PERMANENT" and return to the XML response handling function, where a message detailing the error condition is printed.
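
The two policies above can be sketched as a pair of small helpers. This is an illustration under stated assumptions, not the code in alps.c: the enum names (err_class_t, role_t and their members) and the function names are hypothetical, and the allow_zero flag simply models the "can be 0" versus "cannot be 0" distinction made in the table.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical error classes; only "PERMANENT" is named in this document. */
typedef enum { ERR_NONE = 0, ERR_PERMANENT } err_class_t;

/* Hypothetical enumeration for the "role" structure element. */
typedef enum { ROLE_UNKNOWN = 0, ROLE_BATCH, ROLE_INTERACTIVE } role_t;

/* Policy 1: an unexpected "role" value maps to UNKNOWN, not to an error. */
role_t parse_role(const char *val)
{
    if (val != NULL) {
        if (strcmp(val, "batch") == 0)
            return ROLE_BATCH;
        if (strcmp(val, "interactive") == 0)
            return ROLE_INTERACTIVE;
    }
    return ROLE_UNKNOWN;
}

/* Policy 2: a numeric attribute that is empty, negative, malformed, or
 * "0" where zero is not allowed sets the error class to PERMANENT.
 * Returns the parsed value, or -1 after setting *err. */
long parse_count(const char *val, int allow_zero, err_class_t *err)
{
    char *end = NULL;
    long n;

    if (val == NULL || *val == '\0') {
        *err = ERR_PERMANENT;
        return -1;
    }
    errno = 0;
    n = strtol(val, &end, 10);
    if (errno != 0 || *end != '\0' || n < 0 || (n == 0 && !allow_zero)) {
        *err = ERR_PERMANENT;
        return -1;
    }
    return n;
}
```

The caller would inspect the error class after parsing and print the message describing the error condition, as described above.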

 

Rangelist of Nodes.

The basil_system_element_t structure has an element called "nidlist" which will point to a list of KNL nodes.

This list of nodes is a part of the System Query XML Response. Each <Nodes> Element will contain this data (as a character string).

An example is "12,13-15,22,23". This implies that the XML Attributes that are a part of this <Nodes> Element all apply to the nodes numbered "12", "13", "14", "15", "22", "23".

This grouping of XML data greatly reduces the size of the returned XML data. Currently, with the Inventory (BASIL 1.4) Query, the XML Response contains information separately per Node, leading to a large amount of XML data.
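
To make the Rangelist format concrete, the following sketch expands a string such as "12,13-15,22,23" into individual Node IDs. The function name expand_nidlist() and its interface are hypothetical; the project's own parsing is done in functions such as parse_nidlist_char_data() and process_nodelist_KNL(), whose details are not shown in this document.

```c
#include <errno.h>
#include <stdlib.h>

/* Expand a Rangelist such as "12,13-15,22,23" into out[].
 * Returns the number of Node IDs written, or -1 if the list is
 * malformed or does not fit in max_out entries. */
int expand_nidlist(const char *list, long *out, int max_out)
{
    const char *p = list;
    int n = 0;

    if (list == NULL || *list == '\0')
        return -1;

    while (*p != '\0') {
        char *end;
        long first, last, nid;

        /* Parse the first (or only) Node ID of this item. */
        errno = 0;
        first = strtol(p, &end, 10);
        if (end == p || errno != 0 || first < 0)
            return -1;
        last = first;
        p = end;

        /* An optional "-last" turns the item into an inclusive range. */
        if (*p == '-') {
            p++;
            errno = 0;
            last = strtol(p, &end, 10);
            if (end == p || errno != 0 || last < first)
                return -1;
            p = end;
        }

        /* Emit every Node ID from first to last, inclusive. */
        nid = first;
        for (;;) {
            if (n >= max_out)
                return -1;
            out[n++] = nid;
            if (nid == last)
                break;
            nid++;
        }

        /* Items are separated by commas; a trailing comma is an error. */
        if (*p == ',') {
            p++;
            if (*p == '\0')
                return -1;
        } else if (*p != '\0') {
            return -1;
        }
    }
    return n;
}
```

Called on the example above, the function writes six Node IDs: 12, 13, 14, 15, 22 and 23.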


How to determine whether a range list of nodes (in the System Query XML Response) is KNL or not.

All KNL nodes will have non-empty "numa_cfg", "hbm_size_mb" and "hbm_cache_pct" attributes. Non-KNL nodes will have empty ("") values corresponding to these attributes.

We are only creating vnodes for KNL nodes that have the "role" attribute set to "batch" & the "state" attribute set to "up".

We ignore all Node groups (in the System Query XML Response) that do not meet the above criteria, when processing the System (BASIL 1.7) Query XML Response.
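
The selection criteria above can be expressed as a single predicate. The sketch below is illustrative only: node_group_attrs_t and is_usable_knl_group() are hypothetical names, and the attributes are held as plain strings for simplicity, whereas the real structure may store role and state in another form.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical view of the attributes of one Node group, as strings. */
typedef struct {
    const char *role;
    const char *state;
    const char *numa_cfg;
    const char *hbm_size_mb;
    const char *hbm_cache_pct;
} node_group_attrs_t;

static bool non_empty(const char *s)
{
    return s != NULL && s[0] != '\0';
}

/* True only for a KNL Node group that should get vnodes:
 * all three KNL attributes are non-empty, role is "batch",
 * and state is "up". */
bool is_usable_knl_group(const node_group_attrs_t *g)
{
    if (!non_empty(g->numa_cfg) ||
        !non_empty(g->hbm_size_mb) ||
        !non_empty(g->hbm_cache_pct))
        return false;   /* not a KNL Node group */
    if (g->role == NULL || strcmp(g->role, "batch") != 0)
        return false;
    if (g->state == NULL || strcmp(g->state, "up") != 0)
        return false;
    return true;
}
```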

 

The following new functions will be added:

    • alps_system_KNL(), new_alps_req_KNL(), system_start(), node_group_start(), parse_nidlist_char_data().

The following existing functions will be modified:

    • response_start(), response_data_start(), allow_char_data(), free_basil_response_data().

The following new functions will be added:

    • alps_engine_query_KNL(), exclude_from_KNL_processing(), system_to_vnodes_KNL(), create_vnodes_KNL(), process_nodelist_KNL(), store_nids(), free_basil_elements_KNL().

The following functions will be modified:

    • alps_system_KNL(), parse_nidlist_char_data(), response_data_start().

Callback functions used during XML Parsing.

system_start() and node_group_start() are the new callback functions registered to handle the 'system' & 'nodes' Elements in the System Query (BASIL 1.7) XML Response.

 

Data structures to be modified.

    • handler[]. It is used to register the XML Parser Element Handlers.
    • ud_t. It is the user data structure for the 'Expat' XML Parser.
      In order to be able to pass information between different function handlers, a data structure (ud_t) is defined to hold the shared variables.
      We tell 'expat' (using the XML_SetUserData() function) to pass a pointer to this structure to the handlers. This is typically the first argument received by most handler functions.
    • Additional information about the ud_t data structure. Please refer to alps_request_parent() for the following.
      • basil_response_t *brp; => a pointer to the 'basil_response_t' structure (refer to basil.h for details).
      • ud_t ud; => this is a structure variable with local scope.
      • ud.brp = brp; => 'brp' is malloc'd space and ud.brp is then set to point to the memory associated with the local pointer 'brp'.
      • XML_SetUserData(parser, (void *)&ud); => sets the 'user data' structure (ud) that gets passed to handler functions (e.g. System & Nodes Element start/end handler functions).
      • As/when Handler functions are invoked, this memory location gets populated with parsed XML information.
      • The XML Element handler functions are passed the 'ud' data structure as a parameter.
      • After all the function handlers have been called, the parsed data will be populated in user structures (mentioned below) appropriately.
      • The brp pointer is then returned from alps_request_parent() to alps_request(), & one of the following paths is then taken, depending on how we arrived at alps_request_parent():
        • Control goes back to alps_inventory(), which then calls inventory_to_vnodes(brp), or
        • Control goes back to alps_system_KNL(), which then calls system_to_vnodes_KNL(brp_knl).
      • The XML Response data parsed above, populated in user data structures, is then used to create vnodes.
    • The 'ud_t' has been defined (at the top of alps.c) as a structure that contains the following elements (among others) that are defined in basil.h.
      • basil_response_t
      • inventory_data_t
        • It contains basil_node_t, basil_node_socket_t, basil_node_memory_t and other structures that capture information conveyed in the parsed BASIL 1.4 Inventory XML response.
      • system_data_t
        • It contains the basil_system_element_t structure that captures information conveyed in the parsed BASIL 1.7 System XML response.
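
The interplay between the handler table and the user data structure can be modelled without the XML parser itself. The sketch below is a simplified, Expat-free stand-in: ud_sketch_t, handler_sketch[] and dispatch_start() are hypothetical names, and the real handler[] and ud_t in alps.c carry more state. It only shows the pattern in which each element name is mapped to a start handler and every handler receives the same user data pointer as its first argument.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, reduced stand-in for ud_t: state shared by handlers. */
typedef struct {
    int system_seen;   /* set once a System element has started */
    int node_groups;   /* number of Nodes elements seen so far */
} ud_sketch_t;

/* Start handlers receive the user data pointer as first argument. */
typedef void (*start_fn_t)(void *user_data, const char **atts);

static void system_start_sketch(void *user_data, const char **atts)
{
    ud_sketch_t *ud = user_data;

    (void)atts;        /* attributes are not needed in this sketch */
    ud->system_seen = 1;
}

static void node_group_start_sketch(void *user_data, const char **atts)
{
    ud_sketch_t *ud = user_data;

    (void)atts;
    ud->node_groups++;
}

/* Hypothetical handler table: element name mapped to start handler. */
typedef struct {
    const char *element;
    start_fn_t  start;
} handler_entry_t;

static const handler_entry_t handler_sketch[] = {
    { "System", system_start_sketch },
    { "Nodes",  node_group_start_sketch },
};

/* Look up the element and invoke its start handler with the shared
 * user data. Returns 0 if a handler was found, -1 otherwise. */
int dispatch_start(void *user_data, const char *element, const char **atts)
{
    size_t count = sizeof(handler_sketch) / sizeof(handler_sketch[0]);

    for (size_t i = 0; i < count; i++) {
        if (strcmp(handler_sketch[i].element, element) == 0) {
            handler_sketch[i].start(user_data, atts);
            return 0;
        }
    }
    return -1;
}
```

In the real code the parser performs the dispatch, and the user data pointer is the one registered with XML_SetUserData().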