Developer's forum post: http://community.pbspro.org/t/pp-718-add-fairshare-usage-values-to-the-job-sort-formula/471/1
Fairshare is a tree. There are fairshare groups and entities (e.g., users). A job belongs to one entity. Each fairshare group has children, each of which can be either another fairshare group or an entity. Entities don't all have to be at the same level. For example, root can have 3 children: two groups and one entity. The two groups will have entities within them. Each group or entity has a number of shares assigned to it. Shares are relative: what matters is the percentage a group or entity holds among its siblings. Each entity has a fairshare target (fairshare_perc). The relative percentage between siblings times the parent's target is the entity/group's target. In this way the shares are turned into a fairshare target, which is a fraction between 0 and 1. If you add up the targets of all of the entities, you will reach 100%.
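The target calculation described above can be sketched in a few lines of Python. This is only an illustration: the Node class, the assign_targets and entities helpers, and the group/user names and share values are all hypothetical and are not part of PBS. The tree mirrors the "two groups and one entity under root" shape mentioned above.

```python
from typing import List, Optional


class Node:
    """Hypothetical fairshare tree node (a group or an entity); not a PBS type."""

    def __init__(self, name: str, shares: int,
                 children: Optional[List["Node"]] = None) -> None:
        self.name = name
        self.shares = shares
        self.children = children or []
        self.fairshare_perc = 0.0


def assign_targets(node: Node, target: float = 1.0) -> None:
    """Set fairshare_perc on node and, recursively, on everything below it."""
    node.fairshare_perc = target
    total = sum(child.shares for child in node.children)
    for child in node.children:
        # Relative percentage between siblings times the parent's target.
        relative = child.shares / total if total else 0.0
        assign_targets(child, target * relative)


def entities(node: Node) -> List[Node]:
    """Return the leaves of the tree (nodes without children)."""
    if not node.children:
        return [node]
    return [e for child in node.children for e in entities(child)]


# Made-up shares: root owns two groups and one entity directly.
root = Node("root", 1, [
    Node("groupA", 30, [Node("u1", 10), Node("u2", 30)]),
    Node("groupB", 50, [Node("u3", 1), Node("u4", 1)]),
    Node("u5", 20),
])
assign_targets(root)
for e in entities(root):
    print(f"{e.name}: {e.fairshare_perc:.3f}")
```

Summing fairshare_perc over the entities returned by entities(root) gives 1 (up to floating-point rounding), which matches the statement that the entity targets add up to 100%.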
Usage is accumulated by the entities. Each time an entity accumulates usage, its parent group accumulates the same amount (and its parent, and so on up the tree). This means the parent's usage is the sum of all of its children's usage.
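A minimal Python sketch of this accumulation, assuming each node keeps a pointer to its parent. The UsageNode class and add_usage function are hypothetical names used for illustration only; the usage amounts are made up.

```python
class UsageNode:
    """Hypothetical node that remembers its parent and its accumulated usage."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.usage = 0.0


def add_usage(entity, amount):
    """Charge amount to the entity and to every ancestor up to the root."""
    node = entity
    while node is not None:
        node.usage += amount
        node = node.parent


root = UsageNode("root")
group = UsageNode("group", parent=root)
user_a = UsageNode("user_a", parent=group)
user_b = UsageNode("user_b", parent=group)

add_usage(user_a, 100)
add_usage(user_b, 25)
print(group.usage, root.usage)
```

After the two calls, the group and the root each hold the sum of the usage charged to the two entities.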
The most deserving entity is determined by walking the tree. At each level in the tree, the most deserving group is chosen, and we descend to its children and continue on until we reach an entity. The most deserving group/entity is a function of the target and the usage. What does this mean? It means that low usage entities in high usage groups will be negatively affected by their sibling entities/groups.
The easiest way to map fairshare to the formula would be to take the actual usage and fairshare_perc of each entity and compare them. This doesn't work because low usage entities of high usage groups would no longer be negatively affected by their siblings. We would effectively flatten the tree.
This is fixed by creating an effective usage number which is based on the actual usage and some of the parent's effective usage. Even if an entity has zero usage, it gets some of its parent's usage. It will still be negatively affected by its siblings.
The effective usage formula:
entity's actual usage + (parent's effective usage - entity's actual usage) * entity's relative percentage between siblings
Here are the factors:
entity's actual usage: the entity's fraction of the complex's total usage: actual usage number / all usage.
parent's effective usage: the above formula applied to the parent
entity's relative percentage between siblings: entity's shares / sum of shares of all the children of the parent (i.e., the entity and its siblings)
Since the effective usage of the parent is used, this is recursively applied up the tree. The entity is negatively affected by its siblings. The parent is negatively affected by its siblings and so forth up the tree. The effective usage calculations start at the level below root's children. Root's children use their actual usage.
Something to note: summing the effective usages of all of the entities will give more than 100%. This means the effective usages alone don't allow for a direct comparison between the entities.
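The recursion, and the observation that the entities' effective usages sum to more than 100%, can be sketched in Python as follows. The Node class, the helper functions, and the tree with its shares and usage values are hypothetical and chosen only to keep the arithmetic easy to follow; this is not the scheduler's implementation.

```python
class Node:
    """Hypothetical fairshare tree node; not a PBS data structure."""

    def __init__(self, name, shares, usage=0.0, children=()):
        self.name = name
        self.shares = shares
        self.children = list(children)
        # A group's usage is the sum of its children's usage.
        self.usage = usage + sum(c.usage for c in self.children)
        self.tree_usage = 0.0  # effective usage, filled in later


def compute_tree_usage(root):
    """Fill in tree_usage (effective usage) for every node below root."""
    total = root.usage
    if total == 0:
        return  # no usage recorded yet; every tree_usage stays 0.0
    for child in root.children:
        # Root's children use their actual usage.
        child.tree_usage = child.usage / total
        _descend(child, total)


def _descend(parent, total):
    sibling_shares = sum(c.shares for c in parent.children)
    for child in parent.children:
        actual = child.usage / total
        relative = child.shares / sibling_shares if sibling_shares else 0.0
        # actual + (parent's effective usage - actual) * relative percentage
        child.tree_usage = actual + (parent.tree_usage - actual) * relative
        _descend(child, total)


def leaves(node):
    if not node.children:
        return [node]
    return [leaf for c in node.children for leaf in leaves(c)]


# Made-up data: total usage is 1000.
a1 = Node("a1", 50, usage=300)
a2 = Node("a2", 50, usage=100)
b1 = Node("b1", 100, usage=600)
root = Node("root", 1, children=[
    Node("gA", 40, children=[a1, a2]),
    Node("gB", 60, children=[b1]),
])
compute_tree_usage(root)
for leaf in leaves(root):
    print(f"{leaf.name}: {leaf.tree_usage:.3f}")
print(f"sum: {sum(leaf.tree_usage for leaf in leaves(root)):.3f}")
```

With these numbers a1 ends up at about 0.35 and a2 at about 0.25, both pulled toward their group's 0.4, while b1 stays at 0.6. The three values sum to about 1.2, which is more than 100%.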
The effective usage keyword in the formula is 'fairshare_tree_usage'
Here is a formula to provide a direct comparison between entities. It is not the only one, but it will work well. It results in a number between 0 and 1. A result of .5 means the entity is on target.
2^-(fairshare_tree_usage / entity's fairshare_perc)
This finally allows for a direct comparison between entities, and therefore the jobs that belong to those entities.
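As a quick check of the formula's behavior, here is a small Python sketch. The function name and the choice to raise ValueError for a non-positive fairshare_perc are assumptions made for this illustration, not PBS behavior.

```python
def fairshare_factor(tree_usage: float, fairshare_perc: float) -> float:
    """Return 2^-(tree_usage / fairshare_perc), a value in (0, 1]."""
    if fairshare_perc <= 0:
        # Assumption for this sketch: refuse to divide by zero.
        raise ValueError("fairshare_perc must be positive")
    return 2.0 ** -(tree_usage / fairshare_perc)


print(fairshare_factor(0.0, 0.2))  # no usage at all
print(fairshare_factor(0.2, 0.2))  # usage equal to target
print(fairshare_factor(0.4, 0.2))  # usage at twice the target
```

An entity with no effective usage gets 1, an entity exactly on target gets .5, and the value keeps shrinking toward 0 as usage grows past the target.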
This is represented in the job_sort_formula with the shorthand keyword 'fairshare_factor' or by using formula math: pow(2, -(fairshare_tree_usage/fairshare_perc))
There is extra quoting which is required to use formula math. Here is how it is done:
qmgr -c 'set server job_sort_formula="pow(2, -(fairshare_tree_usage/fairshare_perc))"'
Please note that this formula divides by fairshare_perc. If an entity's shares are set to 0, its fairshare_perc is 0 and this will cause a division by zero error. Please take care when using this formula.
Example:
Share numbers do not need to add up to 100; it just makes the example easier to understand. Entities don't all need to be at the same level of the tree. For example, root could own an entity.
Tree:
Bob:
actual usage: 100/1200: .083
parent's usage: .1667
relative percentage in group: Bob's shares 50 / total of group1's shares 100: .5
effective usage: Bob's usage .083 + (parent's usage .1667 - Bob's usage .083) * .5: .125
Fairshare formula: 2^-(.125/.2): .648
Suzy:
actual usage: 0/1200: 0
parent's usage: 1000/1200: .833
relative percentage in group: Suzy's shares 60 / total of group2's shares 100: .6
effective usage: Suzy's usage 0 + (parent's usage .833 - Suzy's usage 0) * .6: .5
Fairshare formula: 2^-(.5/.36): .382
Even though Suzy has a higher fairshare_perc than Bob and less usage than Bob, her fairshare formula value is quite a bit lower than his. This is due to the huge amount of usage accumulated by her group mate.
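The numbers for Bob and Suzy can be reproduced with a few lines of Python. The inputs are taken from the example above (the parent's usage for Bob is the .1667 stated there); the function names are hypothetical helpers for this check.

```python
def effective_usage(actual, parent_effective, relative):
    """actual + (parent's effective usage - actual) * relative percentage."""
    return actual + (parent_effective - actual) * relative


def fairshare_formula(tree_usage, fairshare_perc):
    """2^-(fairshare_tree_usage / fairshare_perc)."""
    return 2.0 ** -(tree_usage / fairshare_perc)


# Bob: usage 100 of 1200, parent's usage .1667, 50 of 100 shares, target .2
bob_eff = effective_usage(100 / 1200, 0.1667, 50 / 100)
bob_factor = fairshare_formula(bob_eff, 0.2)

# Suzy: usage 0 of 1200, parent's usage 1000/1200, 60 of 100 shares, target .36
suzy_eff = effective_usage(0 / 1200, 1000 / 1200, 60 / 100)
suzy_factor = fairshare_formula(suzy_eff, 0.36)

print(f"Bob:  effective usage {bob_eff:.3f}, formula {bob_factor:.3f}")
print(f"Suzy: effective usage {suzy_eff:.3f}, formula {suzy_factor:.3f}")
```

Rounded to three places this gives .125 and .648 for Bob and .500 and .382 for Suzy, matching the hand calculation.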
pbsfs example:
# ./pbsfs -g scott
fairshare entity: scott
Resgroup                : 11
cresgroup               : 15
Shares                  : 40
Percentage              : 24.000000%
fairshare_tree_usage    : 0.832973
usage                   : 1000 (cput)
usage/perc              : 4167
Path from root:
TREEROOT  :     0       1201 / 1.000 = 1201
group2    :    11       1001 / 0.600 = 1668
scott     :    15       1000 / 0.240 = 4167
Credit: The math for the effective usage calculation and the fairshare formula example came from the SLURM fairshare documentation.