...

Interface 1: equiv_class_enable scheduler attribute

...

  • Usage: boolean

...

Interface 2: equiv_class_exclude scheduler attribute

  • Visibility: Public
  • Change Control: Stable
  • Permissions: Write: Manager Read: Everyone
  • Details: Using this attribute, an administrator can exclude certain resources from consideration when building equivalence classes.  These resources are excluded from both the Resource_List and select resources.
    • Usage: comma separated list of resources
    • Example: equiv_class_exclude: walltime, software
  • Default: Unset
  • Log/Error messages:

After all N (backfill_depth=N) jobs have been added to the calendar, the equivalence classes will start being used.

This means that if one job within an equivalence class cannot run, the rest of the jobs in that class will not be considered in the cycle.

There is a new qmgr scheduler object attribute named 'equiv_class_enable' which will switch between the old and new behavior of this feature.

Usage: qmgr> s sched equiv_class_enable: True

Once equivalence classes are enabled, the scheduler will create a set of jobs that are equivalent.  An equivalence class is made up of jobs that have the same euser, egroup, project, select, place, and Resource_List resources.  Any undesired resources can be excluded by listing them in the 'equiv_class_exclude' sched attribute.  Any resource listed is excluded from both the Resource_List resources and the select resources.
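
As a rough illustration of the grouping rule above, the following Python sketch builds a class key from the listed attributes and drops excluded resources from both Resource_List and every select chunk. The Job type, the strip_select and class_key helpers, and the simplified select parsing are hypothetical and are not the scheduler's actual implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    # Hypothetical, simplified job record used only for illustration.
    euser: str
    egroup: str
    project: str
    select: str                  # e.g. "2:ncpus=4:mem=8gb+1:ncpus=1"
    place: str
    resource_list: dict = field(default_factory=dict)


def strip_select(select, exclude):
    """Drop excluded resources from every chunk of a select string.

    Assumes a simplified select syntax with no quoted values.
    """
    chunks = []
    for chunk in select.split("+"):
        parts = chunk.split(":")
        kept = [p for p in parts if p.split("=")[0] not in exclude]
        chunks.append(":".join(kept))
    return "+".join(chunks)


def class_key(job, exclude=frozenset()):
    """Return a hashable key; jobs with equal keys share a class."""
    resources = tuple(sorted(
        (name, value) for name, value in job.resource_list.items()
        if name not in exclude
    ))
    return (job.euser, job.egroup, job.project,
            strip_select(job.select, exclude), job.place, resources)


a = Job("alice", "users", "proj", "1:ncpus=4:software=abc", "free",
        {"walltime": "01:00:00"})
b = Job("alice", "users", "proj", "1:ncpus=4:software=xyz", "free",
        {"walltime": "02:00:00"})
exclude = {"walltime", "software"}

print(class_key(a) == class_key(b))                    # False
print(class_key(a, exclude) == class_key(b, exclude))  # True
```

With nothing excluded, the two jobs differ in walltime and software and so land in separate classes; excluding both resources makes their keys equal.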

The external behavior of this feature is seen in the following way in the scheduler logs:

Old:

...;Job Id;Considering job to run
...;Job Id;<Reason job can not run>
...;Job Id;Considering job to run
...

i.e. each job gets its own "Considering job to run" line.

Example:

04/15/2015 16:01:18;1234.mars;Considering job to run
04/15/2015 16:01:18;1234.mars;Insufficient amount of resource: ncpus
04/15/2015 16:01:18;1235.mars;Considering job to run

New:

...;Job Id;Considering job to run
...;Job Id;<Reason job can not run>
<same line for rest of equivalence class>

Example:

...

Equivalence classes are a way to group similar jobs together.  Once one job in a class cannot run, the scheduler knows that the rest of the jobs in that class cannot run either.  This makes the scheduler more efficient because it does not have to consider every job in the system individually.


Similarity is defined by the values of the following attributes and resources.  If two jobs have equal values of all the attributes and resources in use, then they are in the same equivalence class.  An attribute or resource is used based on the situation described.

euser: If there are any user limits (soft or hard)

egroup: If there are any group limits (soft or hard)

project: If there are any project limits (soft or hard)

queue: If the job is in a queue

  • with limits (hard or soft)
  • with nodes associated to it
  • which is a prime time queue
  • which is a nonprime time queue
  • which is a dedicated time queue

All resources on the sched_config resources line that appear in the select statement

  • Jobs in the suspended state use a special scheduler-generated select statement.  This specially generated select statement is based on the existing select statement and the vnodes the job is running on.  This will likely result in a suspended job running in its own equivalence class.

All resources on the sched_config resources line that are requested in Resource_List (qsub -l)

Time based resources: walltime, cput, max_walltime, and min_walltime from Resource_List

If preempt_targets_enable is true, Resource_List.preempt_targets

The place statement


How equivalence classes work:

  1. The scheduler starts considering jobs in sorted order
  2. When a job can't run, we mark the equivalence class as can't run.  We stash the reason the job can't run.
  3. In the future when we consider a job from this class, we already know it can't run.  We use the stashed reason.
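
The three steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the scheduler's code: run_cycle, class_of, and try_run are hypothetical names, and the class keys "A" and "B" stand in for whatever the scheduler actually computes.

```python
def run_cycle(jobs, class_of, try_run):
    """Consider jobs in sorted order, reusing the stashed reason per class.

    jobs:     job ids, already in sorted order.
    class_of: hypothetical function mapping a job id to its class key.
    try_run:  hypothetical function returning None if the job ran,
              or a reason string if it could not run.
    """
    cant_run = {}    # class key -> stashed reason
    outcome = []     # (job id, None or reason) in consideration order
    for job in jobs:
        key = class_of(job)
        if key in cant_run:
            # Step 3: the class is already marked, so reuse the reason.
            outcome.append((job, cant_run[key]))
            continue
        reason = try_run(job)
        if reason is not None:
            # Step 2: mark the whole class and stash the reason.
            cant_run[key] = reason
        outcome.append((job, reason))
    return outcome


classes = {"1234.mars": "A", "1235.mars": "A", "1236.mars": "B"}
evaluated = []


def try_run(job):
    evaluated.append(job)
    if classes[job] == "A":
        return "Insufficient amount of resource: ncpus"
    return None


result = run_cycle(["1234.mars", "1235.mars", "1236.mars"],
                   classes.get, try_run)
print(evaluated)   # ['1234.mars', '1236.mars']
```

Job 1235.mars is never passed to try_run because its class was marked when 1234.mars failed; it simply inherits the stashed reason.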


There are no public external interface changes for this feature.  The only outward sign of this feature working is a faster scheduling cycle.  There is one PBS private log message added for testing purposes only.


Private Interface #1

Change Control: PBS Private/Contractual(QA)

Visibility: Scheduler log message at DEBUG3

Description: "Number of job equivalence classes: N" where N is the number of job equivalence classes
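
For testing, the count could be pulled out of a scheduler log line with a small helper like the Python sketch below. The helper name equiv_class_count and the sample lines are illustrative assumptions; only the message text itself comes from the description above.

```python
import re

# Matches the private DEBUG3 message described above.
CLASS_COUNT = re.compile(r"Number of job equivalence classes: (\d+)")


def equiv_class_count(log_line):
    """Return N from the message, or None if the line does not contain it."""
    match = CLASS_COUNT.search(log_line)
    return int(match.group(1)) if match else None


# Illustrative lines only; real scheduler log lines carry more fields.
print(equiv_class_count("...;Number of job equivalence classes: 3"))  # 3
print(equiv_class_count("...;1234.mars;Considering job to run"))      # None
```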