...

  • Change Control: Public/Stable
  • Permissions: The job_sort_formula must be set by the admin (e.g., root).
  • Summary: A number between 0 and 1 representing the usage of the job's entity, modified to take its location in the fairshare tree into account.
  • Details:
    • A number between 0 and 1.  Higher numbers are less deserving.
    • This number is somewhat arbitrary.  It's based on the actual usage of the entity and its location in the fairshare tree, but isn't directly comparable with any other entity's fairshare_tree_usage.
    • The below formula refers to this keyword as effective usage.
    • NOTE: the formula is calculated once at the start of the cycle.  This factor will not be updated during the cycle (e.g., after jobs run).

Interface 2: job_sort_formula keyword "fairshare_factor"

  • Change Control: Public/Stable
  • Permissions: The job_sort_formula must be set by the admin (e.g., root).
  • Summary: A number that allows two entities to be directly compared.
  • Details:
    • A number between 0 and 1.  Lower numbers are less deserving.
    • This number allows two entities anywhere in the tree hierarchy to be compared.
    • Low usage entities from high usage groups are negatively affected by their siblings.
    • If any job's entity has 0 shares, this keyword will resolve to 0.
    • See below for calculations.

Interface 3: renaming job_sort_formula keyword: fair_share_perc → fairshare_perc

  • Change Control: Public/Stable
  • Summary: The job_sort_formula keyword fair_share_perc is deprecated and replaced with fairshare_perc
  • Details:
    • The rename aligns the keyword with the new fairshare keyword and the other fairshare keywords in the sched_config file.

Interface 4: Changes to pbsfs output

  • Change Control: Public/Stable
  • Summary: The effective usage is printed in pbsfs output.
  • Details:
    • When using pbsfs -g, the effective usage is printed as fairshare_tree_usage.

How fairshare works:

Fairshare is a tree made up of fairshare groups and entities (e.g., users).  A job belongs to one entity.  Each fairshare group has children, each of which is either another fairshare group or an entity.  Entities do not all have to be at the same level.  For example, root can have three children: two groups and one entity, with further entities inside the two groups.  Each group or entity is assigned a number of shares.  Shares are relative: a node's shares divided by the sum of the shares of all the children of its parent is its relative percentage between siblings.  That relative percentage times the parent's target is the node's fairshare target (fairshare_perc), a number between 0 and 1.  If you add up the targets of all of the entities, you will reach 100%.
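The target calculation described above can be sketched in a few lines of Python.  This is only an illustration, not the scheduler's code: the function name, the nested-dict layout, and the entity "alice" are hypothetical, and the share numbers are assumptions chosen to agree with the targets quoted in the example further down (Bob .2, Suzy .36).

```python
def fairshare_targets(children, parent_target=1.0):
    """Return {name: target} for every node below a parent.

    children maps a name to (shares, sub_children).  sub_children is an
    empty dict for an entity and a non-empty dict for a group.
    """
    targets = {}
    total_shares = sum(shares for shares, _ in children.values())
    for name, (shares, sub_children) in children.items():
        # Relative percentage between siblings.
        relative = shares / total_shares if total_shares else 0.0
        # Target = relative percentage * parent's target.
        targets[name] = relative * parent_target
        targets.update(fairshare_targets(sub_children, targets[name]))
    return targets


# Hypothetical tree: root has two groups, each holding two entities.
tree = {
    "group1": (40, {"bob": (50, {}), "alice": (50, {})}),
    "group2": (60, {"suzy": (60, {}), "scott": (40, {})}),
}
targets = fairshare_targets(tree)
entities = ["bob", "alice", "suzy", "scott"]
for name in entities:
    print(f"{name}: {targets[name]:.2f}")
print("sum of entity targets:", round(sum(targets[n] for n in entities), 6))
```

In this sketch the entity targets add up to 1, matching the statement that the targets of all entities reach 100%.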

...

This is fixed by creating an arbitrary effective usage number which is based on the actual usage and some of the parent's arbitrary effective usage.   Even if an entity has zero usage, it gets some of its parent's usage.  It will still be negatively affected by its siblings.

The arbitrary effective usage formula:

entity's actual usage + (parent's arbitrary effective usage - entity's actual usage) * entity's relative percentage between siblings

...

entity's actual usage: the entity's fraction of the complex's usage: entity's usage number / root's usage number.

parent's arbitrary effective usage: the above formula applied to the parent

entity's relative percentage between siblings: entity's shares / sum of shares of all the children of the parent (i.e., its siblings)


Since the arbitrary effective usage of the parent is used, this is recursively applied up the tree.  The entity is negatively affected by its siblings.  The parent is negatively affected by its siblings and so forth up the tree.  The arbitrary effective usage calculations start at the level below root's children.  Root's children use their actual usage.
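A minimal Python sketch of this recursion follows.  The function names and the path representation are hypothetical; the only thing taken from the text is the formula itself and the rule that root's children use their actual usage.

```python
def effective_usage(actual, parent_effective, relative):
    """actual + (parent's effective usage - actual) * relative percentage."""
    return actual + (parent_effective - actual) * relative


def tree_usage(path):
    """Walk from a child of root down to the entity.

    path is a list of (actual_usage_fraction, relative_percentage) pairs,
    starting with the child of root and ending with the entity.
    """
    # Root's children use their actual usage unchanged.
    effective = path[0][0]
    for actual, relative in path[1:]:
        effective = effective_usage(actual, effective, relative)
    return effective


# Hypothetical numbers: a group that used half of the complex, holding an
# entity with zero usage and a quarter of the group's shares.
idle_entity = tree_usage([(0.5, 1.0), (0.0, 0.25)])
print(idle_entity)
```

The idle entity ends up with a non-zero effective usage, which is the "gets some of its parent's usage" behavior described above.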

Something to note: summing the arbitrary effective usages of all of the entities will give more than 100%.  On their own, these numbers do not allow for direct comparison between the entities.


The effective usage keyword in the formula is 'fairshare_tree_usage'


Here is a formula to provide a direct comparison between entities.  It is not the only one, but it will work well.  It results in a number between 0 and 1.  A result of .5 means the entity is exactly on its fairshare_perc target.

2^-(fairshare_tree_usage / entity's fairshare_perc)

This finally allows for a direct comparison between entities, and therefore the jobs that belong to those entities.
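To make the shape of the formula concrete, here is a small Python sketch.  The function name is hypothetical and the inputs are made-up values, not scheduler output.

```python
def fairshare_formula(tree_usage, perc):
    """2^-(fairshare_tree_usage / fairshare_perc)."""
    return 2 ** -(tree_usage / perc)


perc = 0.2
for usage in (0.0, 0.1, 0.2, 0.4):
    print(f"usage={usage:.1f} perc={perc:.1f} -> {fairshare_formula(usage, perc):.3f}")
```

With no usage the result is 1, at the target it is .5, and it keeps falling toward 0 as usage grows past the target, so higher values are more deserving.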


This is represented in the job_sort_formula with the shorthand keyword 'fairshare_factor' or by using formula math: pow(2, -(fairshare_tree_usage/fairshare_perc))

Extra quoting is required to use formula math.  Here is how it is done:

qmgr -c 'set server job_sort_formula="pow(2, -(fairshare_tree_usage/fairshare_perc))"'


Please note that this formula divides by fairshare_perc.  If an entity's shares are set to 0, its fairshare_perc is 0 and the formula will cause a division by zero error.  Please take care when using this formula.
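The hazard is easy to reproduce outside the scheduler.  The Python sketch below is an illustration only: the function name is hypothetical, and returning 0 for a zero target is an assumption modelled on the documented behavior of the fairshare_factor keyword, not a description of how the formula evaluator handles the error.

```python
def guarded_fairshare_formula(tree_usage, perc):
    # Assumption: treat a zero target like the fairshare_factor keyword
    # does, i.e. resolve to 0 instead of dividing by zero.
    if perc == 0:
        return 0.0
    return 2 ** -(tree_usage / perc)


print(guarded_fairshare_formula(0.5, 0.0))   # entity with 0 shares
print(guarded_fairshare_formula(0.36, 0.36)) # entity exactly on target
```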


Example:

Share numbers do not need to add up to 100; doing so just makes the example easier to understand.  Entities don't need to all be at the same level of the tree.  For example, root could own an entity.

...

relative percentage in group: Bob's shares 50 / total of group1's shares 100: .5

arbitrary effective usage: Bob's usage .083 + (parent's usage .1667 - Bob's usage .083) * .5: .125

Fairshare formula: 2^-(.125/.2): .648


Suzy:

actual usage: 0/1200: 0

...

relative percentage in group: Suzy's shares 60 / total of group2's shares 100: .6

arbitrary effective usage: Suzy's usage 0 + (parent's usage: .833 - Suzy's usage: 0) * .6: .5

Fairshare formula: 2^-(.5/.36): .382


Even though Suzy has a higher fairshare_perc than Bob and less usage than Bob, her fairshare formula value is quite a bit lower than his.  This is due to the huge amount of usage by her group mate.
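The two results can be recomputed with a short Python sketch.  The helper names are hypothetical, and it assumes that the rounded fractions in the example stand for 100/1200 (Bob), 200/1200 (Bob's group) and 1000/1200 (Suzy's group); only Suzy's 0/1200 is stated explicitly.

```python
def effective_usage(actual, parent_effective, relative):
    return actual + (parent_effective - actual) * relative


def fairshare_formula(tree_usage, perc):
    return 2 ** -(tree_usage / perc)


TOTAL = 1200  # root's usage, from "0/1200" in the example

# Bob: half of group1's shares, fairshare_perc .2
bob_usage = effective_usage(100 / TOTAL, 200 / TOTAL, 0.5)
bob_value = fairshare_formula(bob_usage, 0.2)

# Suzy: 60% of group2's shares, fairshare_perc .36, no usage of her own
suzy_usage = effective_usage(0 / TOTAL, 1000 / TOTAL, 0.6)
suzy_value = fairshare_formula(suzy_usage, 0.36)

print(f"Bob:  effective usage {bob_usage:.3f}, formula {bob_value:.3f}")
print(f"Suzy: effective usage {suzy_usage:.3f}, formula {suzy_value:.3f}")
```

Under these assumptions Bob comes out ahead of Suzy even though she has no usage of her own, because her effective usage inherits 60% of her group's usage.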


pbsfs -g example:

# ./pbsfs -g scott
fairshare entity: scott
Resgroup				: 11
cresgroup				: 15
Shares					: 40
Percentage				: 24.000000%
fairshare_tree_usage	: 0.832973
usage					: 1000 (cput)
usage/perc				: 4167
Path from root: 
TREEROOT  :     0       1201 / 1.000 = 1201
group2    :    11       1001 / 0.600 = 1668
scott     :    15       1000 / 0.240 = 4167
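As a cross-check, the fairshare_tree_usage value in this output can be reproduced from the other numbers it prints.  The Python sketch below is not part of pbsfs; the variable names are hypothetical, and it assumes scott's relative percentage in group2 is his Percentage divided by group2's (0.240 / 0.600).

```python
def effective_usage(actual, parent_effective, relative):
    return actual + (parent_effective - actual) * relative


root_usage = 1201
group2_usage = 1001   # child of root, so it uses its actual usage
scott_usage = 1000

relative = 0.240 / 0.600  # scott's share of group2
tree_usage = effective_usage(
    scott_usage / root_usage,
    group2_usage / root_usage,
    relative,
)
print(f"fairshare_tree_usage: {tree_usage:.6f}")

# The last column of "Path from root" is usage / percentage.
print(round(group2_usage / 0.600), round(scott_usage / 0.240))
```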



Credit: The math for the arbitrary effective usage calculation and the fairshare formula example came from the SLURM fairshare documentation.