Target release | Future release |
---|---|
JIRA link | |
Document status | DRAFT |
Document owner | |
Designer | |
Developers | |
QA | |
Forum Discussion | PP-1018: Design document review of Placement set sorting feature |
Change Control: stable
Standing of the interface: new interface
Interface type: Configure variable (in sched_config)
Synopsis: A new sched_config option named "node_group_sort_key" is added, which enables the admin to configure the order in which placement sets are considered for job placement.
Details:
Currently, the order in which placement sets are used for job placement is hard-coded as described below and cannot be customized by the admin.
The scheduler examines the placement sets in the pool and orders them, from smallest to largest, according to the following rules:
The new interface introduced in this EDD, node_group_sort_key, gives the admin full control over the order in which the placement sets in the placement pool are considered during job placement. It also provides multiple sorting domains in a placement pool with multiple placement series.
The syntax of node_group_sort_key is similar to that of the node_sort_key sched_config option, i.e. a multi-word, multi-line key, as described below.
node_group_sort_key: "<resource> HIGH | LOW total | assigned | unused" <prime option>
where
resource: a custom vnode resource (including string and string array resources) or a built-in vnode resource such as ncpus or mem
total: use the resources_available value
assigned: use the resources_assigned value
unused: use the value given by resources_available - resources_assigned
HIGH: sort in descending order, i.e. high first and low last
LOW: sort in ascending order, i.e. low first and high last
Unlike node_sort_key, the resource here is not restricted to numerical values. The idea is to allow string values, such as those of the string array resources that define placement series, to be used to order placement sets alphabetically by the string values that define each placement set.
node_group_sort_key can be defined more than once, on multiple lines with different resources, in sched_config. The scheduler orders the placement sets by the node_group_sort_key entries in the order they appear in sched_config. This behavior is the same as for node_sort_key.
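The syntax described above can be illustrated with a minimal parsing sketch. This is not the scheduler's parser: the `SortKey` record, the `parse_sort_keys` helper, and the regular expression are hypothetical names introduced here for illustration. The sketch assumes the third word (total, assigned, unused) may be omitted, as in the "switch HIGH" and "router HIGH" examples later in this document.

```python
import re
from typing import List, NamedTuple, Optional


class SortKey(NamedTuple):
    """Hypothetical record for one node_group_sort_key line."""
    resource: str
    order: str             # "HIGH" or "LOW"
    value: Optional[str]   # "total", "assigned", "unused", or None if omitted
    prime: str             # prime option, e.g. "all"


# Assumed line shape: node_group_sort_key: "<words>" <prime option>
LINE_RE = re.compile(r'^node_group_sort_key:\s*"([^"]*)"\s*(\S+)\s*$')


def parse_sort_keys(config_text: str) -> List[SortKey]:
    """Collect node_group_sort_key entries in the order they appear."""
    keys = []
    for raw in config_text.splitlines():
        match = LINE_RE.match(raw.strip())
        if match is None:
            continue  # some other sched_config line
        words = match.group(1).split()
        if len(words) not in (2, 3):
            raise ValueError(f"malformed key: {raw!r}")
        resource, order = words[0], words[1]
        value = words[2] if len(words) == 3 else None
        if order not in ("HIGH", "LOW"):
            raise ValueError(f"order must be HIGH or LOW: {raw!r}")
        if value not in (None, "total", "assigned", "unused"):
            raise ValueError(f"unknown value selector: {raw!r}")
        keys.append(SortKey(resource, order, value, match.group(2)))
    return keys


example = '''\
node_group_sort_key: "ncpus LOW total" all
node_group_sort_key: "mem LOW total" all
node_group_sort_key: "switch HIGH" all
'''
keys = parse_sort_keys(example)
for key in keys:
    print(key)
```

Keeping the entries in a list preserves their order in the file, which is what determines sort precedence.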
Default Value:
The default value of node_group_sort_key shall be defined in sched_config as below. This value is equivalent to the current hard-coded rules mentioned above.
node_group_sort_key: "ncpus LOW total" all
node_group_sort_key: "mem LOW total" all
node_group_sort_key: "ncpus LOW unused" all
node_group_sort_key: "mem LOW unused" all
Current behavior: Placement sets are created and partitioned based on the different string values defined on the vnodes for the custom string array resources named in the node_group_key list, indexed in a single dimension, i.e. after flattening the resource names. For example, if the server's node_group_key attribute contains "router,switch", and router can take the values "R1" and "R2" and switch can take the values "S1", "S2", and "S3", then there are five placement sets "R1, R2, S1, S2, S3", in two placement series, in the server's placement pool.
New behavior introduced with this EDD: Placement sets are created and partitioned based on the combinations of the different string values defined on the vnodes for the custom string array resources named in the node_group_key list, indexed multi-dimensionally, with the resources named in node_group_key used as indices in the order they appear. This is in addition to the placement sets created under the current behavior, i.e. the flattened index. For example, if the server's node_group_key attribute contains "router,switch", and router can take the values "R1" and "R2" and switch can take the values "S1", "S2", and "S3", then there are eleven placement sets "R1-S1, R1-S2, R1-S3, R2-S1, R2-S2, R2-S3, R1, R2, S1, S2, S3", in two placement series, in the server's placement pool.
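The difference between the flattened and the multi-dimensional index can be sketched in a few lines. This is only an illustration of the naming in the example above, not the scheduler's implementation: `placement_set_names` is a hypothetical helper, and it works on the possible resource values rather than on real vnodes.

```python
from itertools import product


def placement_set_names(node_group_key, values):
    """Return placement set names for the given node_group_key.

    node_group_key: resource names in the order they appear in the attribute.
    values: mapping of resource name to the list of values it can take.
    """
    # Flattened index (current behavior): one set per individual value.
    flat = [v for name in node_group_key for v in values[name]]
    # Multi-dimensional index (new behavior): one set per combination,
    # indexed by the resources in node_group_key order. With a single
    # resource there is nothing to combine.
    combined = []
    if len(node_group_key) > 1:
        combined = ["-".join(combo)
                    for combo in product(*(values[n] for n in node_group_key))]
    return combined + flat


names = placement_set_names(
    ["router", "switch"],
    {"router": ["R1", "R2"], "switch": ["S1", "S2", "S3"]},
)
print(len(names), names)
```

For this input the sketch yields the eleven names listed above: six combined sets followed by the five flattened ones.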
Example Scenario of placement set sorting:
1. Simple Placement Pool with a single placement series
Assume a simple placement pool, i.e. a single placement series, configured as below:
set server node_group_key = switch
Assume the vnode list is as below.
Since there are 8 vnodes in a single placement series, the partitioning and identification of psets is the same under the current and new behavior. Here, 6 placement sets are created and identified as below.
The order in which placement sets are considered for job placement under the current and new behavior is described below.
** Only static resources are considered here, for simplicity.
Ordering of Placement sets:
Current Behaviour | New Behaviour |
---|---|
 | node_group_sort_key: "switch HIGH" all |
2. Simple Placement Pool with two placement series
Assume a complex with multiple partitions, where each partition consists of multiple node groups. This can be defined by a simple placement pool with two placement series, configured as below. Now assume the admin requires jobs to be placed in a partition by considering all the partitions in a particular order, while keeping the default order for placement sets within partitions.
** Here the words "router" and "switch" are used as labels and do not necessarily relate to network router or network switch hardware, or describe any network topology.
Vnode | resources_available.ncpus | resources_available.mem | resources_available.router | resources_available.switch |
---|---|---|---|---|
vn10 | 4 | 8GB | "rt1" | "sw3" |
vn11 | 2 | 8GB | "rt1" | "sw2" |
vn12 | 8 | 8GB | "rt1" | "sw4,sw1" |
vn13 | 2 | 4GB | "rt1" | "sw1" |
vn14 | 8 | 16GB | "rt1" | "sw2" |
vn15 | 4 | 16GB | "rt1" | "sw3,sw1" |
vn16 | 8 | 8GB | "rt1" | "sw4" |
vn17 | 4 | 4GB | "rt1" | "sw3,sw4" |
vn20 | 2 | 4GB | "rt2" | "sw4,sw3,sw2" |
vn21 | 2 | 8GB | "rt2" | "sw3,sw1" |
vn22 | 4 | 8GB | "rt2" | "sw2,sw4,sw1,sw3" |
vn23 | 4 | 4GB | "rt2" | "sw4,sw3,sw1" |
vn24 | 8 | 8GB | "rt2" | "sw4,sw2" |
vn25 | 4 | 4GB | "rt2" | "sw1,sw2,sw3,sw4" |
vn26 | 2 | 2GB | "rt2" | "sw4" |
vn27 | 2 | 4GB | "rt2" | "sw2,sw3" |
vn30 | 2 | 2GB | "rt3" | "sw1,sw2,sw4" |
vn31 | 8 | 8GB | "rt3" | "sw3" |
vn32 | 4 | 4GB | "rt3" | "sw4,sw2,sw3" |
vn33 | 4 | 4GB | "rt3" | "sw4,sw3" |
vn34 | 8 | 16GB | "rt3" | "sw4" |
vn35 | 8 | 8GB | "rt3" | "sw2,sw1,sw4" |
vn36 | 4 | 8GB | "rt3" | "sw3" |
vn37 | 8 | 16GB | "rt3" | "sw1" |
vn40 | 4 | 8GB | "rt4" | "sw2" |
vn41 | 2 | 2GB | "rt4" | "sw2" |
vn42 | 8 | 16GB | "rt4" | "sw4,sw3,sw2" |
vn43 | 4 | 4GB | "rt4" | "sw3,sw4" |
vn44 | 8 | 8GB | "rt4" | "sw1,sw2,sw3,sw4" |
vn45 | 4 | 4GB | "rt4" | "sw4,sw2" |
vn46 | 8 | 8GB | "rt4" | "sw1" |
vn47 | 8 | 8GB | "rt4" | "sw4" |
vn50 | 4 | 8GB | "rt3,rt1" | "sw4,sw1" |
vn51 | 2 | 2GB | "rt3,rt1" | "sw2" |
vn52 | 8 | 16GB | "rt3,rt1" | "sw2" |
vn53 | 4 | 4GB | "rt3,rt1" | "sw1,sw4" |
Here there will be 8 placement sets identified as below
psets | Total ncpus | Total mem | Vnodes in pset |
---|---|---|---|
sw1 | 70 | 106GB | vn12, vn13, vn15, vn21, vn22, vn23, vn25, vn30, vn35, vn37, vn44, vn46, vn50, vn53 |
sw2 | 80 | 122GB | vn11, vn14, vn20, vn22, vn24, vn25, vn27, vn30, vn32, vn35, vn40, vn41, vn42, vn44, vn45, vn51, vn52 |
sw3 | 70 | 112GB | vn10, vn15, vn17, vn20, vn21, vn22, vn23, vn25, vn27, vn31, vn32, vn33, vn36, vn42, vn43, vn44 |
sw4 | 110 | 136GB | vn12, vn16, vn17, vn20, vn22, vn23, vn24, vn25, vn26, vn30, vn32, vn33, vn34, vn35, vn42, vn43, vn44, vn45, vn47, vn50, vn53 |
rt1 | 58 | 102GB | vn10, vn11, vn12, vn13, vn14, vn15, vn16, vn17, vn50, vn51, vn52, vn53 |
rt2 | 28 | 42GB | vn20, vn21, vn22, vn23, vn24, vn25, vn26, vn27 |
rt3 | 64 | 96GB | vn30, vn31, vn32, vn33, vn34, vn35, vn36, vn37, vn50, vn51, vn52, vn53 |
rt4 | 46 | 58GB | vn40, vn41, vn42, vn43, vn44, vn45, vn46, vn47 |
node_group_sort_key: "router HIGH" all
Here there will be 24 placement sets identified as below
psets | Total ncpus | Total mem | Vnodes in pset |
---|---|---|---|
sw1 | 70 | 106GB | vn12, vn13, vn15, vn21, vn22, vn23, vn25, vn30, vn35, vn37, vn44, vn46, vn50, vn53 |
sw2 | 80 | 122GB | vn11, vn14, vn20, vn22, vn24, vn25, vn27, vn30, vn32, vn35, vn40, vn41, vn42, vn44, vn45, vn51, vn52 |
sw3 | 70 | 112GB | vn10, vn15, vn17, vn20, vn21, vn22, vn23, vn25, vn27, vn31, vn32, vn33, vn36, vn42, vn43, vn44 |
sw4 | 110 | 136GB | vn12, vn16, vn17, vn20, vn22, vn23, vn24, vn25, vn26, vn30, vn32, vn33, vn34, vn35, vn42, vn43, vn44, vn45, vn47, vn50, vn53 |
rt1 | 58 | 102GB | vn10, vn11, vn12, vn13, vn14, vn15, vn16, vn17, vn50, vn51, vn52, vn53 |
rt2 | 28 | 42GB | vn20, vn21, vn22, vn23, vn24, vn25, vn26, vn27 |
rt3 | 64 | 96GB | vn30, vn31, vn32, vn33, vn34, vn35, vn36, vn37, vn50, vn51, vn52, vn53 |
rt4 | 46 | 58GB | vn40, vn41, vn42, vn43, vn44, vn45, vn46, vn47 |
rt1-sw1 | 22 | 40GB | vn12, vn13, vn15, vn50, vn53 |
rt1-sw2 | 20 | 42GB | vn11, vn14, vn51, vn52 |
rt1-sw3 | 12 | 28GB | vn10, vn15, vn17 |
rt1-sw4 | 28 | 32GB | vn12, vn16, vn17, vn50, vn53 |
rt2-sw1 | 14 | 24GB | vn21, vn22, vn23, vn25 |
rt2-sw2 | 20 | 28GB | vn20, vn22, vn24, vn25, vn27 |
rt2-sw3 | 18 | 32GB | vn20, vn21, vn22, vn23, vn25, vn27 |
rt2-sw4 | 24 | 30GB | vn20, vn22, vn23, vn24, vn25, vn26 |
rt3-sw1 | 26 | 38GB | vn30, vn35, vn37, vn50, vn53 |
rt3-sw2 | 24 | 32GB | vn30, vn32, vn35, vn51, vn52 |
rt3-sw3 | 20 | 24GB | vn31, vn32, vn33, vn36 |
rt3-sw4 | 34 | 46GB | vn30, vn32, vn33, vn34, vn35, vn50, vn53 |
rt4-sw1 | 16 | 16GB | vn44, vn46 |
rt4-sw2 | 26 | 38GB | vn40, vn41, vn42, vn44, vn45 |
rt4-sw3 | 20 | 28GB | vn42, vn43, vn44 |
rt4-sw4 | 32 | 40GB | vn42, vn43, vn44, vn45, vn47 |
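The combined router-switch rows above can be cross-checked with a short sketch that rebuilds them from the vnode table. To keep it small, only the eight rt4 vnodes are included, so only the rt4-* rows are reproduced. `build_combined_psets` and `totals` are hypothetical helpers for illustration, not scheduler functions.

```python
from collections import defaultdict

# (ncpus, mem in GB, router values, switch values), rt4 vnodes only
VNODES = {
    "vn40": (4, 8, ["rt4"], ["sw2"]),
    "vn41": (2, 2, ["rt4"], ["sw2"]),
    "vn42": (8, 16, ["rt4"], ["sw4", "sw3", "sw2"]),
    "vn43": (4, 4, ["rt4"], ["sw3", "sw4"]),
    "vn44": (8, 8, ["rt4"], ["sw1", "sw2", "sw3", "sw4"]),
    "vn45": (4, 4, ["rt4"], ["sw4", "sw2"]),
    "vn46": (8, 8, ["rt4"], ["sw1"]),
    "vn47": (8, 8, ["rt4"], ["sw4"]),
}


def build_combined_psets(vnodes):
    """Group vnodes by every (router, switch) combination they belong to."""
    members = defaultdict(list)
    for name, (_ncpus, _mem, routers, switches) in vnodes.items():
        for router in routers:
            for switch in switches:
                members[f"{router}-{switch}"].append(name)
    return dict(members)


def totals(vnodes, names):
    """Sum ncpus and mem (GB) over the named vnodes."""
    return (sum(vnodes[n][0] for n in names),
            sum(vnodes[n][1] for n in names))


psets = build_combined_psets(VNODES)
summary = {pset: totals(VNODES, names) for pset, names in sorted(psets.items())}
for pset, (ncpus, mem) in summary.items():
    print(f"{pset} | {ncpus} | {mem}GB | {', '.join(psets[pset])}")
```

A vnode with several values in a string array resource lands in every matching set, which is why vn44 appears in all four rt4-* rows.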
psets | Total ncpus | Total mem | Vnodes in pset |
---|---|---|---|
rt2 | 28 | 42GB | vn20, vn21, vn22, vn23, vn24, vn25, vn26, vn27 |
rt4 | 46 | 58GB | vn40, vn41, vn42, vn43, vn44, vn45, vn46, vn47 |
rt1 | 58 | 102GB | vn10, vn11, vn12, vn13, vn14, vn15, vn16, vn17, vn50, vn51, vn52, vn53 |
rt3 | 64 | 96GB | vn30, vn31, vn32, vn33, vn34, vn35, vn36, vn37, vn50, vn51, vn52, vn53 |
sw1 | 70 | 106GB | vn12, vn13, vn15, vn21, vn22, vn23, vn25, vn30, vn35, vn37, vn44, vn46, vn50, vn53 |
sw3 | 70 | 112GB | vn10, vn15, vn17, vn20, vn21, vn22, vn23, vn25, vn27, vn31, vn32, vn33, vn36, vn42, vn43, vn44 |
sw2 | 80 | 122GB | vn11, vn14, vn20, vn22, vn24, vn25, vn27, vn30, vn32, vn35, vn40, vn41, vn42, vn44, vn45, vn51, vn52 |
sw4 | 110 | 136GB | vn12, vn16, vn17, vn20, vn22, vn23, vn24, vn25, vn26, vn30, vn32, vn33, vn34, vn35, vn42, vn43, vn44, vn45, vn47, vn50, vn53 |
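The current ordering above is what the default keys ("ncpus LOW total", then "mem LOW total") produce for the eight placement sets. A minimal sketch, assuming that only static resources are considered (so the "unused" keys equal the "total" keys and add nothing) and using the hypothetical helper name `default_order`:

```python
# Total (ncpus, mem in GB) per placement set, taken from the table above.
PSETS = {
    "sw1": (70, 106), "sw2": (80, 122), "sw3": (70, 112), "sw4": (110, 136),
    "rt1": (58, 102), "rt2": (28, 42), "rt3": (64, 96), "rt4": (46, 58),
}


def default_order(psets):
    """Sort ascending by total ncpus, breaking ties by total mem."""
    # Tuples compare element by element, so (ncpus, mem) applies the
    # second key only when the first key is equal.
    return sorted(psets, key=lambda name: psets[name])


order = default_order(PSETS)
print(order)
```

The mem key only matters for sw1 and sw3, which tie at 70 ncpus; sw1 comes first because 106GB is less than 112GB.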
node_group_sort_key: "router HIGH" all
psets | Total ncpus | Total mem | Vnodes in pset |
---|---|---|---|
rt4-sw1 | 16 | 16GB | vn44, vn46 |
rt4-sw3 | 20 | 28GB | vn42, vn43, vn44 |
rt4-sw2 | 26 | 38GB | vn40, vn41, vn42, vn44, vn45 |
rt4-sw4 | 32 | 40GB | vn42, vn43, vn44, vn45, vn47 |
rt3-sw3 | 20 | 24GB | vn31, vn32, vn33, vn36 |
rt3-sw2 | 24 | 32GB | vn30, vn32, vn35, vn51, vn52 |
rt3-sw1 | 26 | 38GB | vn30, vn35, vn37, vn50, vn53 |
rt3-sw4 | 34 | 46GB | vn30, vn32, vn33, vn34, vn35, vn50, vn53 |
rt2-sw1 | 14 | 24GB | vn21, vn22, vn23, vn25 |
rt2-sw3 | 18 | 32GB | vn20, vn21, vn22, vn23, vn25, vn27 |
rt2-sw2 | 20 | 28GB | vn20, vn22, vn24, vn25, vn27 |
rt2-sw4 | 24 | 30GB | vn20, vn22, vn23, vn24, vn25, vn26 |
rt1-sw3 | 12 | 28GB | vn10, vn15, vn17 |
rt1-sw2 | 20 | 42GB | vn11, vn14, vn51, vn52 |
rt1-sw1 | 22 | 40GB | vn12, vn13, vn15, vn50, vn53 |
rt1-sw4 | 28 | 32GB | vn12, vn16, vn17, vn50, vn53 |
rt4 | 46 | 58GB | vn40, vn41, vn42, vn43, vn44, vn45, vn46, vn47 |
rt3 | 64 | 96GB | vn30, vn31, vn32, vn33, vn34, vn35, vn36, vn37, vn50, vn51, vn52, vn53 |
rt2 | 28 | 42GB | vn20, vn21, vn22, vn23, vn24, vn25, vn26, vn27 |
rt1 | 58 | 102GB | vn10, vn11, vn12, vn13, vn14, vn15, vn16, vn17, vn50, vn51, vn52, vn53 |
sw1 | 70 | 106GB | vn12, vn13, vn15, vn21, vn22, vn23, vn25, vn30, vn35, vn37, vn44, vn46, vn50, vn53 |
sw3 | 70 | 112GB | vn10, vn15, vn17, vn20, vn21, vn22, vn23, vn25, vn27, vn31, vn32, vn33, vn36, vn42, vn43, vn44 |
sw2 | 80 | 122GB | vn11, vn14, vn20, vn22, vn24, vn25, vn27, vn30, vn32, vn35, vn40, vn41, vn42, vn44, vn45, vn51, vn52 |
sw4 | 110 | 136GB | vn12, vn16, vn17, vn20, vn22, vn23, vn24, vn25, vn26, vn30, vn32, vn33, vn34, vn35, vn42, vn43, vn44, vn45, vn47, vn50, vn53 |
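The order of the sixteen combined router-switch sets in the table above can be reproduced with a stable multi-pass sort. This is a sketch under assumptions, not the scheduler's algorithm: it assumes the "router HIGH" key is applied first, with total ncpus and mem (LOW) as tie-breakers, and that the combined sets are sorted as a domain of their own. `order_by_router_high` is a hypothetical helper.

```python
# (router value, total ncpus, total mem in GB) per combined placement set.
COMBINED = {
    "rt1-sw1": ("rt1", 22, 40), "rt1-sw2": ("rt1", 20, 42),
    "rt1-sw3": ("rt1", 12, 28), "rt1-sw4": ("rt1", 28, 32),
    "rt2-sw1": ("rt2", 14, 24), "rt2-sw2": ("rt2", 20, 28),
    "rt2-sw3": ("rt2", 18, 32), "rt2-sw4": ("rt2", 24, 30),
    "rt3-sw1": ("rt3", 26, 38), "rt3-sw2": ("rt3", 24, 32),
    "rt3-sw3": ("rt3", 20, 24), "rt3-sw4": ("rt3", 34, 46),
    "rt4-sw1": ("rt4", 16, 16), "rt4-sw2": ("rt4", 26, 38),
    "rt4-sw3": ("rt4", 20, 28), "rt4-sw4": ("rt4", 32, 40),
}


def order_by_router_high(psets):
    """Sort by router descending, then ncpus and mem ascending."""
    # Pass 1: the lower-priority keys, ascending.
    names = sorted(psets, key=lambda n: (psets[n][1], psets[n][2]))
    # Pass 2: the highest-priority key, descending. Python's sort is
    # stable even with reverse=True, so ties keep the pass-1 order.
    names.sort(key=lambda n: psets[n][0], reverse=True)
    return names


order = order_by_router_high(COMBINED)
print(order)
```

Sorting in two passes avoids having to negate a string key, which a single tuple key would need in order to mix descending and ascending directions.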
3. Complex Placement Pool with more than two placement series
Ignore this. We may use it later for page characterization.