Architecture Design

Synopsis

As part of the BASIL 1.7 project, we will be making a System Query to get KNL Node information. One vnode per KNL node will be created using this information.

In the current system, we make an Inventory (BASIL 1.4) Query to get Inventory information. Vnodes are created for compute nodes based on this information.

We are currently at BASIL 1.4. For BASIL 1.5 and 1.6, changes have not been implemented in PBS. This project aims to support the BASIL 1.7 System Query for KNL nodes only.

At some point in the future, we may migrate from the existing Inventory (BASIL 1.4) Query and implement the Inventory (BASIL 1.7) Query.

Definitions:

In the current system, for non-KNL nodes returned as part of the Inventory (BASIL 1.4) Query, we create one vnode per Segment (vnode_per_numa_node=True). The PBScrayseg attribute of the created vnode reflects the segment ordinal, e.g.

for ordinal=0, PBScrayseg=0; for ordinal=1, PBScrayseg=1; and so on.

In the current system, for KNL nodes returned as part of the Inventory (BASIL 1.4) Query, we create 1 vnode per Node (the segment ordinal for KNL Nodes is set to 0 & PBScrayseg = 0).

The System (BASIL 1.7) Query returns grouped information (attributes) that apply to a range of Nodes.

The numa_nodes attribute will reflect the number of NUMA Nodes/Segments this KNL Node has. Regardless, we will be creating 1 vnode per KNL Node.

Additional attributes such as numa_cfg, hbm_cache_pct and hbm_size_mb will also be considered when creating KNL vnodes.

 

Current behavior

PBS makes an INVENTORY Query request (using BASIL 1.4).

The Query response (from ALPS) is an XML representation of Compute Nodes.

Flow of control

New behavior

PBS will make a SYSTEM Query request (using BASIL 1.7) to collect information on KNL Nodes.

The Query response from ALPS will be an XML representation of KNL Nodes.

This XML Response will be parsed and the appropriate structures populated.

PBS then makes an INVENTORY Query request (using BASIL 1.4).

The Query response (from ALPS) is an XML representation of Compute Nodes.

Non-KNL vnodes will be created using this information. KNL nodes will be filtered out of this step using the previously fetched System (BASIL 1.7) information on KNL nodes.

Subsequently, KNL vnodes will be created using the previously fetched System (BASIL 1.7) information.

Flow of control


Filtering KNL Node IDs during non-KNL vnode creation.

In functions called from alps_system_KNL(), KNL Node IDs are extracted from each Node group in the System BASIL 1.7 XML response and accumulated in a buffer for later use.

KNL Node IDs in this buffer will then be excluded from vnode creation in inventory_to_vnodes() (which creates non-KNL vnodes only, using information from the Inventory 1.4 response which includes KNL Node IDs).

Subsequently, KNL vnodes are created in system_to_vnodes_KNL().
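
The filtering step described above can be pictured with a minimal sketch. The buffer type and the helper names (knl_nid_buf, knl_buf_add, knl_buf_contains) are hypothetical and chosen only for illustration; they are not the actual PBS data structures. The idea is that IDs are appended while the System response is parsed, and the inventory loop then skips any Node ID found in the buffer.

```c
#include <stdlib.h>

/* Hypothetical growable buffer of KNL node IDs (illustration only). */
typedef struct {
    long  *ids;    /* accumulated node IDs */
    size_t count;  /* number of IDs stored */
    size_t cap;    /* allocated capacity */
} knl_nid_buf;

/* Append one node ID; returns 0 on success, -1 on allocation failure. */
int knl_buf_add(knl_nid_buf *buf, long nid)
{
    if (buf->count == buf->cap) {
        size_t ncap = buf->cap ? buf->cap * 2 : 16;
        long *tmp = realloc(buf->ids, ncap * sizeof(*tmp));
        if (tmp == NULL)
            return -1;
        buf->ids = tmp;
        buf->cap = ncap;
    }
    buf->ids[buf->count++] = nid;
    return 0;
}

/* Return 1 if nid was recorded as a KNL node, else 0 (linear scan). */
int knl_buf_contains(const knl_nid_buf *buf, long nid)
{
    size_t i;

    for (i = 0; i < buf->count; i++)
        if (buf->ids[i] == nid)
            return 1;
    return 0;
}
```

In this sketch, the non-KNL vnode creation loop would call knl_buf_contains() for each Node ID in the Inventory response and skip the node when it returns 1. A linear scan is used for simplicity; a sorted array or bitmap may be preferable on large systems.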


basil.h header file.

The following Table shows how the System Query attributes (in the XML Response) map into the basil.h structure (basil_system_element_t) that gets populated with this parsed XML information.


| XML attribute name | Corresponding Structure element name (in basil.h) | Expected Values | Comments |
|---|---|---|---|
| role | role | batch, interactive | This attribute is used for KNL node determination. The structure element "role" will be set to "UNKNOWN" when unexpected attribute values are encountered in the XML response. |
| state | state | up, down, unavailable, routing, suspect, admin | This attribute is used for KNL node determination. The structure element "state" will be set to "UNKNOWN" when unexpected attribute values are encountered in the XML response. |
| speed | speed | Value cannot be an empty string, cannot be negative, cannot be "0". | |
| numa_nodes | numa_nodes | Value cannot be an empty string, cannot be negative, cannot be "0". | This attribute is ignored during KNL vnode creation. |
| dies | n_dies | Value cannot be an empty string, cannot be negative, can be "0". | This attribute is ignored during KNL vnode creation. |
| compute_units | compute_units | Value cannot be an empty string, cannot be negative, can be "0". | This attribute will be displayed in 'resources_available.nppus'. |
| cpus_per_cu | cpus_per_cu | Value cannot be an empty string, cannot be negative, cannot be "0". | This will be displayed in 'resources_available.vps_per_ppu' (the product of compute_units & cpus_per_cu will be displayed in 'resources_available.ncpus'). |
| page_size_kb | avlmem | Value of attribute page_size_kb cannot be an empty string, cannot be negative, cannot be "0". avlmem holds the product of page_size_kb & page_count. | This represents conventional DRAM memory (will be displayed as 'resources_available.mem'). |
| | pgszl2 | pgszl2 holds X, where 2^X is page_size_kb in Bytes. | |
| page_count | Refer to avlmem note above (under "Values") | Value cannot be an empty string, cannot be negative, can be "0". | |
| accels | accel_name | Not every Node group in the System 1.7 XML response may have this attribute. When it is present, the attribute value cannot be an empty string. | If this attribute is present in the XML response, we capture the attribute value during XML parsing. However, this attribute is ignored during subsequent KNL vnode creation i.e. KNL vnodes will be created without this attribute. KNL nodes cannot have GPUs. |
| accel_state | accel_state | Not every Node group in the System 1.7 XML response may have this attribute. When it is present, the attribute value should be "up" or "down". | If this attribute is present in the XML response, we capture the attribute value during XML parsing and set the structure element "accel_state" to "UNKNOWN" when unexpected values are encountered. However, this attribute is ignored during subsequent KNL vnode creation i.e. KNL vnodes will be created without this attribute. |
| numa_cfg | numa_cfg | a2a, snc2, snc4, hemi, quad. This attribute will always have a value (non-empty string) for KNL Nodes. The value will be an empty string for non-KNL Nodes. | |
| hbm_size_mb | hbmsize | Value of hbm_size_mb cannot be negative. This attribute will always have a value (non-empty string) for KNL Nodes. This will be an empty string for non-KNL Nodes. | This represents High Bandwidth MCDRAM memory (in MB) (will be displayed as 'resources_available.hbmem'). |
| hbm_cache_pct | hbm_cfg | Value of hbm_cache_pct will be 0, 25, 50, 100. This attribute will always have a value (non-empty string) for KNL Nodes. This will be an empty string for non-KNL Nodes. | |
| None | nidlist | The Rangelist of Node IDs. | The XML response does not have a specific attribute name corresponding to the "nidlist" structure element. During XML parsing, the Rangelist of Node IDs (in the incoming XML) is assigned to the "nidlist" structure element. This is repeated for every Node group in the XML response. |
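
The derived quantities mentioned in the table (avlmem, pgszl2 and the ncpus product) can be illustrated with a small sketch. The function names below are hypothetical and the integer types are assumptions; the table only states which values are multiplied and that pgszl2 is the base-2 logarithm of the page size in bytes.

```c
#include <stdint.h>

/* Hypothetical helper: X such that 2^X equals the page size in bytes.
 * For example, page_size_kb = 4 gives 4096 bytes, so X = 12.
 * Returns -1 if the size is zero or not a power of two. */
int page_size_log2(uint64_t page_size_kb)
{
    uint64_t bytes = page_size_kb * 1024;
    int shift = 0;

    if (bytes == 0 || (bytes & (bytes - 1)) != 0)
        return -1;
    while ((bytes >>= 1) != 0)
        shift++;
    return shift;
}

/* avlmem: product of page_size_kb and page_count (so the result is in KB). */
uint64_t avail_mem_kb(uint64_t page_size_kb, uint64_t page_count)
{
    return page_size_kb * page_count;
}

/* Value shown as resources_available.ncpus: compute_units * cpus_per_cu. */
long total_ncpus(long compute_units, long cpus_per_cu)
{
    return compute_units * cpus_per_cu;
}
```

For instance, under these assumptions a Node group with compute_units=68 and cpus_per_cu=4 would yield 272 for the ncpus product.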



Handling unexpected attribute values.

In some cases (mentioned in the table above), structure elements are set to "UNKNOWN" when unexpected values are encountered.

For all other attributes listed in the table above, when an unexpected value is encountered we set the 'error class' in the XML Parser's user data structure to "PERMANENT" and return to the XML response handling function, where a message detailing the error condition is printed.
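
The two kinds of handling described above could look roughly like the following sketch. The enum, the function names and the allow_zero flag are assumptions made for illustration (the real structure may store "UNKNOWN" differently). The first helper maps an unexpected "role" value to an UNKNOWN marker; the second rejects the values that the table disallows for numeric attributes, leaving it to the caller to set the error class to "PERMANENT".

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical representation of the "role" structure element. */
typedef enum { ROLE_UNKNOWN = 0, ROLE_BATCH, ROLE_INTERACTIVE } demo_role;

/* Map the XML "role" value; anything unexpected becomes ROLE_UNKNOWN. */
demo_role parse_role(const char *value)
{
    if (value != NULL) {
        if (strcmp(value, "batch") == 0)
            return ROLE_BATCH;
        if (strcmp(value, "interactive") == 0)
            return ROLE_INTERACTIVE;
    }
    return ROLE_UNKNOWN;
}

/* Parse a numeric attribute value. Rejects empty strings, negative values,
 * trailing characters, out-of-range input and, unless allow_zero is set, zero.
 * Returns 0 and stores the value in *out on success, -1 otherwise. */
int parse_count_attr(const char *value, int allow_zero, long *out)
{
    char *end = NULL;
    long n;

    if (value == NULL || *value == '\0')
        return -1;
    errno = 0;
    n = strtol(value, &end, 10);
    if (errno != 0 || end == value || *end != '\0')
        return -1;
    if (n < 0 || (n == 0 && !allow_zero))
        return -1;
    *out = n;
    return 0;
}
```

With this sketch, an attribute such as cpus_per_cu would be parsed with allow_zero=0, while compute_units or page_count would use allow_zero=1, matching the "Expected Values" column.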

 

Rangelist of Nodes.

The basil_system_element_t structure has an element called "nidlist" which will point to a list of KNL nodes.

This list of nodes is a part of the System Query XML Response. Each <Nodes> Element will contain this data (as a character string).

An example is "12,13-15,22,23". This implies that the XML Attributes that are a part of this <Nodes> Element all apply to the nodes numbered "12", "13", "14", "15", "22", "23".

This grouping of XML data greatly reduces the size of the returned XML data. Currently, with the Inventory (BASIL 1.7) Query, the XML Response contains information separately per Node, leading to a large amount of XML data.
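
To make the rangelist format concrete, here is a minimal sketch that expands a string such as "12,13-15,22,23" into individual Node IDs. The function name expand_nidlist and its error conventions are hypothetical; how the nidlist string is actually consumed is not specified in this document.

```c
#include <errno.h>
#include <stdlib.h>

/* Expand a rangelist such as "12,13-15,22,23" into individual node IDs.
 * Writes at most max IDs to out. Returns the number of IDs written,
 * or -1 on malformed input or if out is too small. */
int expand_nidlist(const char *list, long *out, int max)
{
    const char *p = list;
    int n = 0;

    if (list == NULL || *list == '\0')
        return -1;
    while (*p != '\0') {
        char *end;
        long lo, hi, id;

        errno = 0;
        lo = strtol(p, &end, 10);
        if (end == p || errno != 0 || lo < 0)
            return -1;
        hi = lo;
        if (*end == '-') {              /* "lo-hi" range */
            p = end + 1;
            hi = strtol(p, &end, 10);
            if (end == p || errno != 0 || hi < lo)
                return -1;
        }
        for (id = lo; id <= hi; id++) {
            if (n >= max)
                return -1;
            out[n++] = id;
        }
        if (*end == ',') {
            if (end[1] == '\0')
                return -1;              /* trailing comma */
            p = end + 1;
        } else if (*end == '\0') {
            p = end;                    /* end of list */
        } else {
            return -1;                  /* unexpected character */
        }
    }
    return n;
}
```

Applied to the example above, the sketch produces the six IDs 12, 13, 14, 15, 22 and 23.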


How to determine whether a range list of nodes (in the System Query XML Response) is KNL or not.

All KNL nodes will have non-empty "numa_cfg", "hbm_size_mb" and "hbm_cache_pct" attributes. Non-KNL nodes will have empty ("") values corresponding to these attributes.

We are only creating vnodes for KNL nodes that have the "role" attribute set to "batch" & the "state" attribute set to "up".

We ignore all Node groups (in the System Query XML Response) that do not meet the above criteria, when processing the System (BASIL 1.7) Query XML Response.
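
The criteria above can be summarized as a single predicate. The structure and function names below are hypothetical and hold the raw attribute strings only for illustration; the actual check operates on the parsed basil.h structure.

```c
#include <string.h>

/* Hypothetical view of the attributes of one Node group (raw XML strings). */
typedef struct {
    const char *role;
    const char *state;
    const char *numa_cfg;
    const char *hbm_size_mb;
    const char *hbm_cache_pct;
} node_group_attrs;

/* 1 if s is a non-empty string, else 0. */
static int non_empty(const char *s)
{
    return s != NULL && s[0] != '\0';
}

/* Return 1 if the group describes KNL nodes for which vnodes are created:
 * all three KNL attributes non-empty, role "batch" and state "up". */
int is_usable_knl_group(const node_group_attrs *g)
{
    if (!non_empty(g->numa_cfg) ||
        !non_empty(g->hbm_size_mb) ||
        !non_empty(g->hbm_cache_pct))
        return 0;                       /* not a KNL Node group */
    if (g->role == NULL || g->state == NULL)
        return 0;
    return strcmp(g->role, "batch") == 0 && strcmp(g->state, "up") == 0;
}
```

Node groups for which this predicate returns 0 would simply be skipped while processing the System (BASIL 1.7) Query XML Response.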

 

The following new functions will be added:

The following existing functions will be modified:

The following new functions will be added:

The following functions will be modified:

Callback functions used during XML Parsing.

system_start() and node_group_start() are the new callback functions registered to handle the 'system' & 'nodes' Elements in the System Query (BASIL 1.7) XML Response.
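
As a rough illustration of how element names might be routed to their start callbacks, consider the sketch below. Everything in it is an assumption: the callback signature, the parse_state record, the exact element name spellings ("System", "Nodes") and the demo_ prefixed stand-ins are hypothetical, since the real signatures of system_start() and node_group_start() are not given here.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical user-data record shared by the callbacks. */
typedef struct {
    int saw_system;   /* set once the 'system' element has been seen */
    int node_groups;  /* number of 'nodes' elements encountered */
} parse_state;

typedef void (*start_fn)(parse_state *st, const char **atts);

/* Stand-ins for system_start() and node_group_start(). */
void demo_system_start(parse_state *st, const char **atts)
{
    (void)atts;
    st->saw_system = 1;
}

void demo_node_group_start(parse_state *st, const char **atts)
{
    (void)atts;
    st->node_groups++;
}

/* Table associating element names with their start callbacks. */
static const struct {
    const char *element;
    start_fn    start;
} handlers[] = {
    { "System", demo_system_start },
    { "Nodes",  demo_node_group_start },
};

/* Route an element to its callback; returns 0 if handled, -1 if unknown. */
int dispatch_start(parse_state *st, const char *element, const char **atts)
{
    size_t i;

    for (i = 0; i < sizeof(handlers) / sizeof(handlers[0]); i++) {
        if (strcmp(handlers[i].element, element) == 0) {
            handlers[i].start(st, atts);
            return 0;
        }
    }
    return -1;
}
```

A table-driven lookup like this keeps the registration of new element handlers in one place, so supporting a further element would only require adding one entry.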

 

Data structures to be modified.