This is a design proposal for PBS to support job submissions that carry multiple resource requests (combined with conditional operators), of which only one is run.

Gist of proposed changes:

There are two requirements that we are trying to fulfill here:

One requirement is for the user/admin to be able to specify their node filter criteria. This requirement can be met if we support specifying some kind of node filter with the job. For now, this filter could be a Python expression that applies conditional operators to the non-consumable resources present on the nodes. The filter concept can also be extended to serve as queue/server limits, or to help find a preemption candidate among running jobs, etc.

...

 

For conditional requests (e.g., allocate resources A or resources B for a job): 

Motivation for #1 & #2:  start the job sooner, trading lower performance/efficiency/cost/utilization… for a faster start time

    • Often the goal is to craft a request that makes the job start “now”
    • Visible progress provides a more positive user experience, and starting to run is evidence of progress
    • The underlying motivation is most likely “earliest finish time”, but multiple confounding factors lead users to desire “earliest start time”.  For example, once a job is started, it is unlikely to be delayed by higher priority work entering the system, so there is more assurance that the end time is fixed.

Use Cases:

1.      User requests a job allocating 64 cores; if the job would start sooner with a request for 32 cores, then run it on 32 cores instead; ditto for 16 cores

a.      Multiple distinct resource request options are provided, and only one is chosen and allocated for the job

      1. The use case has only a single, node-level resource

b.      The resource request options are prioritized first by site-policy, then by the order provided by the user.  If any resource request option can be started (based on site-policy and available resources), the highest priority option is started.

2.      User requests or admin forces job to allocate “like nodes”; “like nodes” all have the same value for some property, resource, or attribute, such as (a) all nodes have the same CPU type (e.g., Intel Sandy Bridge) or (b) all nodes are attached to the same network fabric (e.g., QDR InfiniBand).  (Note: so far, this is exactly the behavior of “place=group=X” in PBS Pro.)   Further, if the job would start sooner on like nodes with a different “like value”, then run it on nodes with that “like value” (e.g., use Intel Ivy Bridge versus Intel Sandy Bridge, or use FDR InfiniBand versus QDR InfiniBand).  Ditto for a third choice of “like value”.

a.      Multiple distinct resource request options are provided, and only one is chosen and allocated for the job

      1. The use case has only a single, non-consumable, node-level resource

b.      The resource request options are prioritized first by site-policy, then by the order provided by the user.  If any resource request option can be started (based on site-policy and available resources), the highest priority option is started.

c.       The use case has only two resource request options, but it makes sense to assume there may be more than two; the usual number is fewer than 10.
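
The selection rule in 1.b and 2.b (site policy first, then the user's order, run the first option that can start) can be sketched as follows. This is only an illustration, not proposed scheduler code: pick_request, can_start_now, site_rank and the sample data are hypothetical names invented for the sketch, and "can start now" is reduced to a simple count of free cores or free nodes.

```python
from typing import Callable, Optional, Sequence


def pick_request(
    options: Sequence[dict],
    can_start_now: Callable[[dict], bool],
    site_rank: Callable[[dict], int] = lambda opt: 0,
) -> Optional[dict]:
    """Return the highest-priority option that can start now, or None.

    Hypothetical helper: lower site_rank means higher site priority.
    sorted() is stable, so options with equal site rank keep the order
    in which the user listed them.
    """
    for opt in sorted(options, key=site_rank):
        if can_start_now(opt):
            return opt
    return None


# Use case 1: 64 cores preferred, then 32, then 16 (assumed 40 cores free).
free_cores = 40
core_options = [{"ncpus": 64}, {"ncpus": 32}, {"ncpus": 16}]
chosen_cores = pick_request(core_options, lambda o: o["ncpus"] <= free_cores)

# Use case 2: 4 like nodes, Sandy Bridge preferred over Ivy Bridge
# (assumed free node counts per CPU type).
free_nodes_by_cputype = {"sandybridge": 2, "ivybridge": 8}
nodes_needed = 4
like_options = [{"cputype": "sandybridge"}, {"cputype": "ivybridge"}]
chosen_like = pick_request(
    like_options,
    lambda o: free_nodes_by_cputype.get(o["cputype"], 0) >= nodes_needed,
)

print(chosen_cores)  # {'ncpus': 32}
print(chosen_like)   # {'cputype': 'ivybridge'}
```

Exactly one option is returned, which matches the requirement in 1.a and 2.a that only one request is chosen and allocated for the job.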

 

Motivation for #3:  Some combination of better node utilization and better application performance

3.      User requests or admin forces jobs requesting N cores to be allocated exclusively onto the smallest quantity of “like nodes” (with respect to resources_available.ncpus), where each node is fully allocated (with respect to ncpus).  E.g., job requests select=64:ncpus=1 and system has both 16-core nodes and 32-core nodes – either allocate 4 16-core nodes (each with 16 chunks) or allocate 2 32-core nodes (each with 32 chunks)

a.      Resource request is for a total number of cores (ncpus); in PBS Pro a request for N cores corresponds to select=N:ncpus=1

b.      Unknown whether there is a prioritization/preference among different “like values”
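
The arithmetic behind use case 3 can be sketched as follows. The function name full_node_packings and the list-of-free-nodes input are assumptions made for the illustration; since 3.b leaves the preference among “like values” open, sorting by smallest node count is also just one possible choice.

```python
from collections import Counter
from typing import List, Tuple


def full_node_packings(total_cores: int, free_nodes: List[int]) -> List[Tuple[int, int]]:
    """List the ways to place total_cores on fully allocated like nodes.

    free_nodes holds resources_available.ncpus for each free node.
    Returns (cores_per_node, node_count) pairs, fewest nodes first.
    Hypothetical helper for illustration only.
    """
    packings = []
    for size, available in Counter(free_nodes).items():
        # Every node must be fully allocated, so the node size must divide N.
        if total_cores % size != 0:
            continue
        needed = total_cores // size
        if needed <= available:
            packings.append((size, needed))
    # Assumed preference: the smallest quantity of nodes comes first.
    return sorted(packings, key=lambda p: p[1])


# select=64:ncpus=1 on a system with four free 16-core nodes and two free 32-core nodes.
free = [16, 16, 16, 16, 32, 32]
print(full_node_packings(64, free))  # [(32, 2), (16, 4)]
```

A request whose core count is not a multiple of any available node size yields an empty list here, because no allocation would leave every node fully allocated.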

 

For filtering nodes (e.g., using ==, !=, <, >):

Motivation:  Resilience – ensure jobs run “correctly” and are unlikely to experience faults due to use of nodes with incompatible properties (with respect to the applications)

Use Cases: 

1.      User requests that all allocated nodes have CPU speed > 2 GHz

2.      User requests that none of the allocated nodes be node X, node Y, node Z, …

3.      User requests that none of the allocated nodes be of ARM or POWER architecture

4.      User requests that all of the allocated nodes be running Linux version 6.5 or higher, but that none be running 6.5.2
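
The four filtering use cases above can be written as Python expressions over node attributes, roughly as sketched here. The attribute names (cpu_ghz, host, arch, os_version), the ver() helper and node_matches() are all hypothetical and are not part of any PBS interface. The sketch uses eval() with builtins removed only to keep it short; that is not a security sandbox, and a real implementation would need a proper expression evaluator.

```python
def ver(text):
    """Turn a dotted version string such as '6.5.2' into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


def node_matches(expr, node):
    """Evaluate a filter expression against one node's attributes.

    Hypothetical helper: only the node's attributes and ver() are visible
    to the expression. eval() is used for brevity and is NOT a sandbox.
    """
    env = dict(node)
    env["ver"] = ver
    return bool(eval(expr, {"__builtins__": {}}, env))


# One expression per use case, in the same order as the list above.
filters = [
    "cpu_ghz > 2.0",
    'host not in ("nodeX", "nodeY", "nodeZ")',
    'arch not in ("arm", "power")',
    'ver(os_version) >= ver("6.5") and ver(os_version) != ver("6.5.2")',
]

# Made-up node data for the illustration.
nodes = [
    {"host": "node1", "cpu_ghz": 2.6, "arch": "x86_64", "os_version": "6.5.1"},
    {"host": "nodeX", "cpu_ghz": 3.0, "arch": "x86_64", "os_version": "6.6"},
    {"host": "node3", "cpu_ghz": 2.4, "arch": "arm", "os_version": "6.5"},
    {"host": "node4", "cpu_ghz": 2.8, "arch": "x86_64", "os_version": "6.5.2"},
    {"host": "node5", "cpu_ghz": 1.8, "arch": "x86_64", "os_version": "6.7"},
]

eligible = [n["host"] for n in nodes if all(node_matches(f, n) for f in filters)]
print(eligible)  # ['node1']
```

Comparing versions as integer tuples rather than strings keeps the ordering numeric, so that, for example, 6.10 sorts after 6.5.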

 

link to forum discussion

Interface 1: New Job attribute called “job_set” - qsub option “-W job_set”

...