
Developer's forum post: http://community.pbspro.org/t/pp-718-add-fairshare-usage-values-to-the-job-sort-formula/471/1

Interface 1: job_sort_formula keyword "fairshare_tree_usage"

  • Change Control: Public/Stable
  • Permissions: The job_sort_formula must be set by the admin (e.g., root).
  • Summary: A number between 0 and 1 representing the usage of the job's entity, modified to take its location in the fairshare tree into account.
  • Details:
    • A number between 0 and 1.  Higher numbers are less deserving.
    • This number is somewhat arbitrary.  It's based on the actual usage of the entity and its location in the fairshare tree.
    • NOTE: the formula is calculated once at the start of the cycle.  This factor will not be updated during the cycle (e.g., after jobs run).

Interface 2: renaming job_sort_formula keyword: fair_share_perc → fairshare_perc

  • Change Control: Public/Stable
  • Summary: The job_sort_formula keyword fair_share_perc is deprecated and replaced with fairshare_perc.
  • Details:
    • The renaming is being done to better align with the new fairshare_tree_usage keyword and the other fairshare keywords in the sched_config file.


How fairshare works:

Fairshare is a tree made up of fairshare groups and entities (e.g., users).  A job belongs to exactly one entity.  Each fairshare group has children, each of which is either another fairshare group or an entity.  Entities do not all have to be at the same level.  For example, root can have three children: two groups and one entity, with the two groups containing entities of their own.  Each group or entity is assigned a number of shares.  Shares are only meaningful relative to the siblings: the relative percentage of a group or entity is its shares divided by the sum of the shares of all the children of its parent.  Each group or entity also has a fairshare target (fairshare_perc), a number between 0 and 1, which is its relative percentage between siblings times its parent's target.  Root's target is 1.  If you add up the targets of all of the entities, you will reach 100%.
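
As a rough illustration of how shares turn into targets, here is a minimal Python sketch.  The nested-dict layout and the function names assign_targets and entity_targets are made up for this illustration and are not part of PBS; the share numbers are the ones used in the example later on this page.

```python
# Hypothetical layout: each node is a dict with "shares" and optional "children".
def assign_targets(node, target=1.0):
    """Store fairshare_perc on node, then recurse into its children."""
    node["fairshare_perc"] = target
    children = node.get("children", {})
    total_shares = sum(child["shares"] for child in children.values())
    for child in children.values():
        # relative percentage between siblings times the parent's target
        assign_targets(child, target * child["shares"] / total_shares)


def entity_targets(node, name="root"):
    """Yield (name, fairshare_perc) for every leaf of the tree (the entities)."""
    children = node.get("children", {})
    if not children:
        yield name, node["fairshare_perc"]
    for child_name, child in children.items():
        yield from entity_targets(child, child_name)


tree = {
    "shares": 1,  # root's own shares are never used
    "children": {
        "group1": {"shares": 40, "children": {
            "Bob": {"shares": 50}, "Cathy": {"shares": 50}}},
        "group2": {"shares": 60, "children": {
            "Suzy": {"shares": 60}, "Scott": {"shares": 40}}},
    },
}

assign_targets(tree)
targets = dict(entity_targets(tree))
for name, perc in targets.items():
    print(f"{name}: {perc:.2f}")
print(f"sum of entity targets: {sum(targets.values()):.2f}")
```

The entity targets come out as .2, .2, .36 and .24, which add up to 1.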

Usage is accumulated by the entities.  Each time an entity accumulates usage, its parent group accumulates the same amount (and its parent, and so on up the tree).  This means the parent's usage is a sum of all of its children's usage.
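
The accumulation described above can be sketched in a few lines of Python.  The Node class and add_usage function are hypothetical names used only for illustration; the usage numbers are taken from the example later on this page.

```python
class Node:
    """A fairshare group or entity that knows its parent (illustrative only)."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.usage = 0


def add_usage(entity, amount):
    """Charge amount to the entity and to every ancestor up to the root."""
    node = entity
    while node is not None:
        node.usage += amount
        node = node.parent


root = Node("root")
group1 = Node("group1", root)
group2 = Node("group2", root)
bob = Node("Bob", group1)
cathy = Node("Cathy", group1)
suzy = Node("Suzy", group2)
scott = Node("Scott", group2)

add_usage(bob, 100)
add_usage(cathy, 100)
add_usage(scott, 1000)

for node in (root, group1, group2):
    print(node.name, node.usage)
```

Each group ends up with the sum of its children's usage, and root ends up with the usage of the whole complex.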

The most deserving entity is determined by walking the tree.  At each level in the tree, the most deserving group is chosen, and we descend to its children and continue on until we reach an entity.  The most deserving group/entity is a function of the target and the usage.  What does this mean?  It means that low usage entities in high usage groups are negatively affected by their sibling entities/groups.
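
This page does not state the exact function of target and usage used to rank siblings.  Purely as an assumption for illustration, the Python sketch below treats the sibling with the lowest usage / fairshare_perc ratio as the most deserving; the real scheduler may rank differently.  The function name most_deserving and the dict layout are hypothetical, and the numbers come from the example later on this page.

```python
def most_deserving(node):
    """Walk down from node, choosing one child per level until a leaf is reached.

    ASSUMPTION: the child with the lowest usage / fairshare_perc ratio wins.
    Ties go to the first child listed.
    """
    while node.get("children"):
        node = min(node["children"],
                   key=lambda child: child["usage"] / child["fairshare_perc"])
    return node["name"]


group2 = {"name": "group2", "fairshare_perc": 0.6, "usage": 1000, "children": [
    {"name": "Suzy", "fairshare_perc": 0.36, "usage": 0},
    {"name": "Scott", "fairshare_perc": 0.24, "usage": 1000},
]}
tree = {"name": "root", "fairshare_perc": 1.0, "usage": 1200, "children": [
    {"name": "group1", "fairshare_perc": 0.4, "usage": 200, "children": [
        {"name": "Bob", "fairshare_perc": 0.2, "usage": 100},
        {"name": "Cathy", "fairshare_perc": 0.2, "usage": 100},
    ]},
    group2,
]}

winner = most_deserving(tree)
winner_in_group2 = most_deserving(group2)
print(winner, winner_in_group2)
```

Under this assumption Suzy wins inside group2 because she has no usage, but the walk from root never reaches her: group2 loses to group1 at the first level because of Scott's usage.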


How do we map fairshare into the formula:

The easiest way to map fairshare to the formula would be to take the actual usage and fairshare_perc of each entity and compare them.  This doesn't work because low usage entities of high usage groups would no longer be negatively affected by their siblings.  We would effectively flatten the tree.

This is fixed by creating an arbitrary usage number (the value exposed as fairshare_tree_usage) which is based on the entity's actual usage and some of the parent's arbitrary usage.  Even if an entity has zero usage, it gets some of its parent's usage, so it is still negatively affected by its siblings.

The arbitrary usage formula:

entity's actual usage + (parent's arbitrary usage - entity's actual usage) * entity's relative percentage between siblings

Here are the factors:

entity's actual usage: the entity's fraction of the complex's total usage: entity's actual usage number / root's actual usage number.

parent's arbitrary usage: the above formula applied to the parent

entity's relative percentage between siblings: entity's shares / sum of shares of all the children of the parent (i.e., its siblings)


Since the arbitrary usage of the parent is used, this is recursively applied up the tree.  The entity is negatively affected by its siblings.  The parent is negatively affected by its siblings and so forth up the tree.  The arbitrary usage calculations start at the level below root's children.  Root's children use their actual usage.

Something to note: summing all of the arbitrary usages of all of the entities will be more than 100%.  This doesn't allow for direct comparison between the entities.
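
The recursion can be sketched in Python as follows.  This is only an illustration of the formula as written above, not the scheduler's code: the Node class and the tree_usage function are hypothetical names, and the sketch assumes root has nonzero usage.  The shares and usage numbers are the ones from the example later on this page.

```python
class Node:
    """A fairshare group or entity (hypothetical structure for illustration)."""

    def __init__(self, name, shares, usage, parent=None):
        self.name = name
        self.shares = shares
        self.usage = usage  # raw accumulated usage number
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)


def tree_usage(node, root):
    """Arbitrary usage of node, as a fraction of root's usage (assumed nonzero)."""
    actual = node.usage / root.usage
    # Root and root's children use their actual usage unchanged.
    if node is root or node.parent is root:
        return actual
    # Entity's relative percentage between siblings.
    relative = node.shares / sum(c.shares for c in node.parent.children)
    return actual + (tree_usage(node.parent, root) - actual) * relative


root = Node("root", 1, 1200)  # root's shares are never used
group1 = Node("group1", 40, 200, root)
group2 = Node("group2", 60, 1000, root)
bob = Node("Bob", 50, 100, group1)
cathy = Node("Cathy", 50, 100, group1)
suzy = Node("Suzy", 60, 0, group2)
scott = Node("Scott", 40, 1000, group2)

usages = {n.name: tree_usage(n, root) for n in (bob, cathy, suzy, scott)}
for name, value in usages.items():
    print(f"{name}: {value:.3f}")
print(f"sum over entities: {sum(usages.values()):.3f}")
```

Suzy has no usage of her own but still picks up part of group2's usage, and the sum over the four entities is greater than 1.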


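Here is a formula to provide a direct comparison between entities.  It results in a number between 0 and 1.  A result of .5 means the entity's fairshare_tree_usage is equal to its fairshare_perc (i.e., it is exactly on target).  A result above .5 means the usage is below the target, and a result below .5 means the usage is above the target.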

2^-(fairshare_tree_usage / entity's fairshare_perc)

This is represented in the job_sort_formula as: pow(2, -(fairshare_tree_usage/fairshare_perc))


This finally allows for a direct comparison between entities, and therefore the jobs that belong to those entities.
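
To get a feel for how the expression behaves, here is a small Python equivalent.  The function name fairshare_factor is made up for this sketch; only the expression itself comes from this page.

```python
def fairshare_factor(fairshare_tree_usage, fairshare_perc):
    """Python equivalent of pow(2, -(fairshare_tree_usage/fairshare_perc))."""
    return 2.0 ** -(fairshare_tree_usage / fairshare_perc)


no_usage = fairshare_factor(0.0, 0.25)       # no usage at all -> 1.0
on_target = fairshare_factor(0.25, 0.25)     # usage equals target -> 0.5
double_target = fairshare_factor(0.5, 0.25)  # usage is twice the target -> 0.25
print(no_usage, on_target, double_target)
```

The value halves every time the usage grows by one more multiple of the target, so it approaches 0 but never reaches it.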


Example:

Share numbers do not need to add up to 100; using 100 just makes the example easier to understand.  Entities don't need to all be at the same level of the tree.  For example, root could own an entity.

Tree:

  • Root fairshare_perc: 1.0 actual usage: 1200
    • group1 shares: 40 fairshare_perc: .4 actual usage: 200
      • Bob shares: 50 fairshare_perc: .2 actual usage: 100
      • Cathy shares: 50 fairshare_perc: .2 actual usage: 100
    • group2 shares: 60 fairshare_perc: .6 actual usage: 1000
      • Suzy shares: 60 fairshare_perc: .36 actual usage: 0
      • Scott shares: 40 fairshare_perc: .24 actual usage: 1000


Bob:

actual usage: 100/1200: .083

parent's usage: 200/1200: .1667

relative percentage in group: Bob's shares 50 / total of group1's shares 100: .5

arbitrary usage: Bob's usage .083 + (parent's usage .1667 - Bob's usage .083) * .5: .125

Fairshare formula: 2^-(.125/.2): .648


Suzy:

actual usage: 0/1200: 0

parent's usage: 1000/1200: .833

relative percentage in group: Suzy's shares 60 / total of group2's shares 100: .6

arbitrary usage: Suzy's usage 0 + (parent's usage .833 - Suzy's usage 0) * .6: .5

Fairshare formula: 2^-(.5/.36): .382


Even though Suzy has a higher fairshare_perc than Bob and less usage than Bob, her fairshare formula value is quite a bit lower than his.  This is due to the large amount of usage accumulated by her group mate, Scott.


Credit: The math for the arbitrary usage calculation and the fairshare formula example came from the SLURM fairshare documentation.
