
This is a design proposal for PBS to support job submissions with conditional operators that select the nodes where a job can run.

Here are the use cases and the motivation behind them.

Motivation:  Resilience – ensure jobs run “correctly” and are unlikely to experience faults due to use of nodes with incompatible properties (with respect to the applications)

Use Cases: 

1. User requests that all allocated nodes have a CPU speed > 2 GHz

2. User requests that none of the allocated nodes is node X, node Y, node Z, …

3. User requests that none of the allocated nodes is of ARM or POWER architecture

4. User requests that all allocated nodes run Linux version 6.5 or higher, but none runs 6.5.2
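As a sketch, the use cases above might be written as node-filter expressions of the kind introduced in Interface 1 below. Note that host and arch are standard PBS node resources, while cpu_speed and os_version are hypothetical site-defined resources used purely for illustration (real PBS resource values are strings, so a deployment would need a typed custom resource or a conversion step):

```python
# Candidate filter expressions for the four use cases, evaluated against
# the "node" dictionary described in Interface 1. "cpu_speed" and
# "os_version" are assumed site-defined resources, shown for illustration.

sample = {"resources_available": {"host": "nodeA", "arch": "x86_64",
                                  "cpu_speed": 2.4, "os_version": (6, 5, 3)},
          "resources_assigned": {}}

filters = [
    # 1. All allocated nodes have CPU speed > 2 GHz
    "node['resources_available']['cpu_speed'] > 2",
    # 2. None of the allocated nodes is node X, Y, or Z
    "node['resources_available']['host'] not in ('nodeX', 'nodeY', 'nodeZ')",
    # 3. No ARM or POWER nodes (example architecture strings)
    "node['resources_available']['arch'] not in ('aarch64', 'ppc64le')",
    # 4. Linux >= 6.5 but never 6.5.2 (version exposed as a tuple here)
    "node['resources_available']['os_version'] >= (6, 5) and "
    "node['resources_available']['os_version'] != (6, 5, 2)",
]

results = [eval(f, {}, {"node": sample}) for f in filters]
print(results)  # the sample node satisfies all four filters
```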

Gist of design: We can expose a way for each job to pass a Python expression that filters nodes based on the resources_available and resources_assigned values present on each node. This filter will be applied every time a job submitted with a node filter is considered to run, or whenever the scheduler tries to add the job to its calendar.

Link to forum discussion

Interface 1: Extend PBS to allow users to submit jobs with a node-filter (nfilter) resource expression.

  • Visibility: Public
  • Change Control: Stable
  • Details:
    • Users can specify a node filter with each of their jobs; this filter helps the scheduler select the nodes on which the job is allowed to run.
    • There is a new built-in resource "nfilter" of type string. Users, operators, and managers have privileges to read and write this resource. It is not a host-level resource.
    • nfilter is evaluated as a Python expression by the PBS scheduler to filter the nodes that can be used to run the job at hand. The expression is evaluated every time the scheduler considers the job to run and/or while trying to add the job to the calendar.
    • Users can specify a node filter on node resources using conditional operators such as "<", ">", "<=", ">=", and "!=".
      • Example: qsub -lselect=3:ncpus=2:mem=18gb,nfilter="node['resources_available']['ncpus']>=4 and node['resources_available']['color'] != 'green'" job.scr
    • While being evaluated, the nfilter expression is given a node dictionary, which itself consists of two dictionaries - resources_available and resources_assigned. It will look something like this -

      node={'resources_available':{'ncpus':'8','mem':'16777215kb',...},'resources_assigned':{'ncpus':'2', 'mem':'4194304kb',...}}
    • To access a specific resource within the resources_available or resources_assigned dictionaries, users must enclose each key name in square brackets "[ ]", like this: "node['resources_available']['ncpus']"
    • If a job with "nfilter" fails to find a node that it can run on (including when an exception occurs while evaluating the expression), it will be marked as "can not run" in the current scheduling cycle.
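The filtering step described above might look roughly like the following. This is a minimal sketch, not PBS source; the helper name and sample data are assumptions, and numeric values are used directly (real PBS resource values are strings and would need conversion):

```python
# Sketch of scheduler-side nfilter evaluation: keep only the nodes for
# which the user's expression evaluates to True.

def filter_nodes(nfilter, nodes):
    """Return the subset of nodes satisfying the nfilter expression.

    nodes maps node name -> {'resources_available': {...},
                             'resources_assigned': {...}}.
    """
    eligible = {}
    for name, node in nodes.items():
        try:
            # The expression sees a single variable, "node", as described above.
            if eval(nfilter, {"__builtins__": {}}, {"node": node}):
                eligible[name] = node
        except Exception:
            # Any evaluation error excludes the node; if no node passes,
            # the job is marked "can not run" for this cycle.
            continue
    return eligible


nodes = {
    "nodeA": {"resources_available": {"ncpus": 8, "color": "blue"},
              "resources_assigned": {"ncpus": 2}},
    "nodeB": {"resources_available": {"ncpus": 2, "color": "green"},
              "resources_assigned": {"ncpus": 0}},
}
expr = ("node['resources_available']['ncpus'] >= 4 and "
        "node['resources_available']['color'] != 'green'")
print(sorted(filter_nodes(expr, nodes)))  # ['nodeA']
```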

Interface 2: Errors logged in scheduler log file while evaluating "nfilter" resource expression

  • Visibility: Public

  • Change Control: Stable

  • Details:

    • If the scheduler fails to evaluate the nfilter expression present in the job's resource list, it will log the following messages at DEBUG2 log level:

      • When a key name is used that does not exist in the dictionary, the following error will be logged:
        "KeyError encountered: <wrong key>"
      • When a wrong dictionary name is used (e.g. mom instead of node), the following error will be logged:
        "NameError: global name <wrong name> is not defined"
