  • Interface 1: Extend PBS to support a list of scheduler objects

    • Visibility: Public
    • Change Control: Stable
    • Details:
      • PBS supports creating multiple scheduler objects using qmgr, similar to how nodes are created on the server.
      • The qmgr command can be used to create a scheduler object. It must be invoked by a PBS admin/manager.
      • To create a scheduler object and make it run, the user must set the following mandatory attributes:
        • A scheduler name must be given when creating a scheduler object.
          • qmgr -c "c sched multi_sched_1"
            • This will create/set the following attributes for the sched object
              • port - If not defined by the user, the search starts from 15050 and the scheduler runs on the next available port number.
              • host (read-only for now; has the same value as the PBS server host)
              • queues = None (default)
              • sched_priv = $PBS_HOME/multi_sched_1_priv (default)
              • sched_log = $PBS_HOME/multi_sched_1_log (default)
              • scheduling = False (default)
              • scheduler_iteration = 600 (default)
        • Set the priv directory for the scheduler.
          • The directory must be owned by root and should have permissions set to "750". By default a sched object has
            its priv directory set to $PBS_HOME/<sched-name>_priv
          • qmgr -c "s sched multi_sched_1 sched_priv=/var/spool/pbs/sched_priv_1"
        • Set the log directory for the scheduler. 
          • The directory must be owned by root and should have permissions set to "755". By default a sched object has
            its logs directory set to $PBS_HOME/<sched_name>_logs
          • qmgr -c "s sched multi_sched_1 sched_log=/var/spool/pbs/sched_logs"
        • To enable scheduling on a newly created scheduler object, specify the scheduler name.
          • By default a multi-sched object has scheduling set to False.
            If no name is specified, the PBS server will enable/disable scheduling on the default scheduler.
          • qmgr -c "s sched <scheduler name> scheduling=1"
      • By default the PBS server will configure a default scheduler which will run out of the box.
        • The name of this default scheduler will be "pbs_sched".
        • The sched_priv directory of this default scheduler will be set to $PBS_HOME/sched_priv.
        • The default scheduler will log to the $PBS_HOME/sched_logs directory.
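
The port rule described for a new sched object (start at 15050 and take the next available port) can be sketched as follows. This is only an illustration of the rule, not the server's code: the helper name `next_available_port`, the loopback host, and the search limit are assumptions made for the example.

```python
import socket

DEFAULT_START_PORT = 15050  # starting port named in the text above


def next_available_port(start=DEFAULT_START_PORT, host="127.0.0.1", limit=100):
    """Return the first port >= start that can be bound on host.

    Hypothetical helper: it probes each port by binding a TCP socket
    and releasing it again, so the result is only a hint.
    """
    for port in range(start, start + limit):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
            try:
                probe.bind((host, port))
            except OSError:
                continue  # port is in use, try the next one
            return port
    raise RuntimeError(f"no free port in {start}..{start + limit - 1}")
```

Because the probe socket is closed before the function returns, another process could take the port in the meantime; a real implementation would need to handle that race.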

  • Interface 2: Changes to PBS scheduler
    • Visibility: Public
    • Change Control: Stable
    • Details:
      • The scheduler now has additional attributes which can be set in order to run it.
        • sched_priv - points to the directory where the scheduler keeps the fairshare usage, resource_group, holidays, and sched_config files.
        • sched_log - points to the directory where the scheduler writes its logs.
        • policy - collection of various attributes (as mentioned below) which can be used to configure the scheduler.
        • queues - list of all the queues for which this scheduler is going to schedule jobs.
        • host - hostname on which the scheduler is running. For the default scheduler it is set to the PBS server hostname.
        • port - port number on which the scheduler is listening.
        • job_accumulation_time - amount of time the server will wait after the submission of a job before starting a new cycle.
        • state - shows the status of the scheduler. It is set only by the PBS server.
      • A queue or a list of queues can be set on a scheduler object. Once set, the scheduler object will only schedule jobs from the queues specified.
        • qmgr -c "s sched multi_sched_1 queues=hp_queue1,hp_queue2"
      • If no queues are specified for a given scheduler object, that scheduler will not schedule any jobs.
      • By default, all new queues will be attached to the default scheduler, unless specified otherwise.
      • A queue that is attached to one scheduler cannot be attached to another scheduler. Attempting to do so produces the following error:
        • qmgr -c "s sched multi_sched_1 queues=workq"
          Queue workq is already associated with scheduler <sched_name>.
      • The scheduler can now accept a set of policies to work with:
        • A policy can be specified using the command qmgr -c "s sched <sched_name> policy=<policy object>".
      • The scheduler object's "state" attribute will show one of these 3 values: DOWN, IDLE, SCHEDULING
        • If a scheduler object is created but the scheduler is not running for some reason, the state will be shown as "DOWN".
        • If a scheduler is up and running but waiting for a cycle to be triggered, the state will be shown as "IDLE".
        • If a scheduler is up and running and is also running a scheduling cycle, the state will be shown as "SCHEDULING".
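
The rule that a queue belongs to at most one scheduler can be sketched as a small check. The `owners` mapping, the function name, and the exception class are hypothetical bookkeeping invented for this example; only the error text follows the message quoted above.

```python
class QueueAssociationError(Exception):
    """Raised when a queue is already owned by another scheduler."""


def set_sched_queues(owners, sched_name, queues):
    """Attach queues to sched_name, updating the owners mapping in place.

    owners maps queue name -> scheduler name (hypothetical bookkeeping).
    All queues are validated first so a failure leaves owners unchanged.
    """
    for queue in queues:
        current = owners.get(queue)
        if current is not None and current != sched_name:
            raise QueueAssociationError(
                f"Queue {queue} is already associated with scheduler {current}."
            )
    for queue in queues:
        owners[queue] = sched_name
```

With `owners = {"workq": "pbs_sched"}`, attaching `hp_queue1` and `hp_queue2` to `multi_sched_1` succeeds, while attaching `workq` raises the error.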

  • Interface 3: New policy object
    • Visibility: Public
    • Change Control: Stable
    • Details:
      • Admins will now be allowed to create policy objects and give names to these policy objects.
      • Admins can then assign these policy objects to specific schedulers. One policy object can be assigned to more than one scheduler.
      • A policy object can be deleted only when it is not assigned to any scheduler.
      • Example: 
        qmgr -c "c policy p1"
        qmgr -c "s policy p1 by_queue=False, strict_ordering=True"
        qmgr -c "s sched scheduler1 policy=p1"

      • The following policies reside in the policy attribute of the scheduler.

        Policy name | Type | Default value | Example
        ------------|------|---------------|--------
        round_robin | Boolean | round_robin=False | qmgr -c "s policy p1 round_robin=True"
        by_queue | Boolean | by_queue=True | qmgr -c "s policy p1 by_queue=True"
        strict_ordering | Boolean | strict_ordering=False | qmgr -c "s policy p1 strict_ordering=True"
        help_starving_jobs | Boolean | help_starving_jobs=True | qmgr -c "s policy p1 help_starving_jobs=True"
        max_starve | string | max_starve="24:00:00" | qmgr -c "s policy p1 max_starve=24:00:00"
        node_sort_key | array_string | node_sort_key="sort_priority HIGH" | qmgr -c 's policy p1 node_sort_key="sort_priority HIGH, ncpus HIGH"'
        provision_policy | string | provision_policy="aggressive_provision" | qmgr -c 's policy p1 provision_policy="aggressive_provision"'
        exclude_resources | array_string | Not set by default | qmgr -c 's policy p1 exclude_resources="vmem, color"'
        load_balancing | Boolean | load_balancing=False | qmgr -c "s policy p1 load_balancing=True"
        fairshare | Boolean | fairshare=False | qmgr -c "s policy p1 fairshare=True"
        fairshare_usage_res | string | fairshare_usage_res=cput | qmgr -c "s policy p1 fairshare_usage_res=cput"
        fairshare_entity | string | fairshare_entity=euser | qmgr -c "s policy p1 fairshare_entity=euser"
        fairshare_decay_time | string | fairshare_decay_time="24:00:00" | qmgr -c "s policy p1 fairshare_decay_time=24:00:00"
        fairshare_enforce_no_shares | Boolean | fairshare_enforce_no_shares=True | qmgr -c "s policy p1 fairshare_enforce_no_shares=True"
        preemption | Boolean | preemption=True | qmgr -c "s policy p1 preemption=True"
        preempt_queue_prio | integer | preempt_queue_prio=150 | qmgr -c "s policy p1 preempt_queue_prio=190"
        preempt_prio | string | preempt_prio="express_queue, normal_jobs" | qmgr -c 's policy p1 preempt_prio="starving_jobs, normal_jobs, starving_jobs+fairshare"'
        preempt_order | string | preempt_order="SCR" | qmgr -c 's policy p1 preempt_order="SCR 70 SC 30"'
        preempt_sort | string | preempt_sort="min_time_since_start" | qmgr -c 's policy p1 preempt_sort="min_time_since_start"'
        peer_queue | array_string | Not set by default | qmgr -c 's policy p1 peer_queue="workq workq@svr1"'
        server_dyn_res | array_string | Not set by default | qmgr -c 's policy p1 server_dyn_res="mem !/bin/get_mem"'
        dedicated_queues | array_string | Not set by default | qmgr -c 's policy p1 dedicated_queues="queue1,queue2"'
        log_event | integer | log_event=3328 | qmgr -c "s policy p1 log_event=255"
        job_sort_formula | string | Not set by default | qmgr -c 's policy p1 job_sort_formula="ncpus*walltime"'
        backfill_depth | integer | backfill_depth=1 | qmgr -c 's policy p1 backfill_depth=1'
        job_sort_key | array_string | Not set by default | qmgr -c 's policy p1 job_sort_key="ncpus HIGH, mem LOW"'
        prime_spill | string | Not set by default | qmgr -c 's policy p1 prime_spill="01:00:00"'
        prime_exempt_anytime_queues | Boolean | prime_exempt_anytime_queues=false | qmgr -c 's policy p1 prime_exempt_anytime_queues=false'
        backfill_prime | Boolean | backfill_prime=false | qmgr -c 's policy p1 backfill_prime=false'


      • The following configuration options have been moved, renamed, or removed:
        • mom_resources - removed (mom periodic hooks can update custom resources).
        • unknown_shares - moved to the resource_group file.
        • smp_cluster_dist - already deprecated, now removed.
        • sort_queues - already deprecated, now removed.
        • nonprimetime_prefix - the new policy object does not differentiate between prime/non-prime time.
        • primetime_prefix - the new policy object does not differentiate between prime/non-prime time.
        • resources - the new policy object will instead list the resources that need to be excluded from scheduling. By default all resources will be used for scheduling.
        • dedicated_prefix - the new policy object will expose "dedicated_queues", which is a list of queues associated with dedicated time.
        • preemptive_sched - renamed to "preemption".
        • log_filter - renamed to "log_event" to be in sync with the option the server object exposes.
      • Admins will now be allowed to assign different policy objects for prime and non-prime time.
        • If a value of the "policy" scheduler attribute is prefixed with "p:", it will be considered the prime-time policy.
        • If a value of the "policy" scheduler attribute is prefixed with "np:", it will be considered the non-prime-time policy.
        • A policy name specified without any prefix will be used as the all-time policy.
        • More than one policy object can be specified at the same time in the policy scheduler attribute.
          • example: qmgr -c "s sched sched1 policy=p:p1,np:p2"
          • A prime-time, non-prime-time, or all-time policy cannot be specified more than once while setting the scheduler's policy attribute.
      • During dedicated time, if prime and non-prime time policies are defined, the scheduler will use the prime-time policy to schedule jobs from dedicated queues; otherwise it will apply the all-time policy.
      • To use the policies from the old sched_config file, keep a copy of that file in the directory named by the "sched_priv" attribute.
      • If both a policy object and a sched_config file are present, the sched_config file will be ignored.
      • All policies can be unset at once using qmgr -c "unset sched <sched_name> policy"; this makes the scheduler read the sched_config file in the next iteration.
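
The "p:" / "np:" prefix rules for the scheduler's policy attribute can be sketched as a small parser. The function names and slot labels are invented for illustration, and the fallback from a missing prime or non-prime policy to the all-time policy is an assumption, since the text does not spell that case out.

```python
def parse_policy_attr(value):
    """Split a scheduler 'policy' value into prime / non_prime / all slots."""
    slots = {}
    for item in value.split(","):
        item = item.strip()
        if item.startswith("np:"):
            slot, name = "non_prime", item[3:]
        elif item.startswith("p:"):
            slot, name = "prime", item[2:]
        else:
            slot, name = "all", item
        if not name:
            raise ValueError(f"empty policy name in {item!r}")
        if slot in slots:
            # Each kind of policy may be given only once.
            raise ValueError(f"{slot} policy specified more than once")
        slots[slot] = name
    return slots


def policy_for(slots, period):
    """Pick the policy for period: 'prime', 'non_prime', or 'dedicated'."""
    if period == "dedicated":
        # Dedicated time uses the prime policy when prime and non-prime
        # policies are both defined, otherwise the all-time policy.
        if "prime" in slots and "non_prime" in slots:
            return slots["prime"]
        return slots.get("all")
    # Assumption: fall back to the all-time policy when no
    # period-specific policy is set.
    return slots.get(period, slots.get("all"))
```

For the example value `p:p1,np:p2`, the parser yields `p1` for prime time and `p2` for non-prime time, and `policy_for` selects `p1` during dedicated time.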

  • Interface 4: Changes to PBS server.
    • Visibility: Public
    • Change Control: Stable
    • Details:
      • PBS no longer allows attributes such as scheduling and scheduler_iteration to be set on the PBS server object.
      • scheduling and scheduler_iteration now belong to the sched object.
      • backfill_depth will also be an attribute of the scheduler's policy object.
        • If the scheduler is configured to use sched_config instead of a policy object, it will take the value of backfill_depth from the scheduler object. If it is not set on the scheduler object, it will take what is set on the server object (We should deprecate backfill_depth on the server object).
        • If the scheduler is configured to use a policy object instead of the sched_config file, it will take the value of backfill_depth from the scheduler's policy object.
        • If backfill_depth is set at the per-queue level, that value will take precedence over the value set on the sched object or server object.
      • These attributes now belong to a scheduler object and need to be set on the scheduler object using a scheduler name:
        • qmgr -c "s policy p1 backfill_depth=3"
        • qmgr -c "s sched multi_sched_1 policy = p1"
      • Setting these attributes on the server results in the following warning:
        • qmgr -c "s s backfill_depth=3"
        • qmgr: Warning: backfill_depth in server is deprecated. Set backfill_depth in a scheduler policy object.
      • If no scheduler name is specified, the following error is reported:
        • qmgr -c "s sched policy.backfill_depth=3"
          No scheduler specified, nothing done
      • The job_sort_formula attribute has been moved from the server to the scheduler policy attribute.
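
The lookup order for backfill_depth can be sketched as a single function. The function and parameter names are hypothetical, `None` stands for "not set", and letting the per-queue value also win over the policy object is an assumption: the text only states that it wins over the sched and server objects.

```python
POLICY_DEFAULT_BACKFILL_DEPTH = 1  # default listed for the policy object


def effective_backfill_depth(queue=None, policy=None, sched=None,
                             server=None, uses_policy=True):
    """Resolve backfill_depth; None means the value is not set."""
    if queue is not None:
        # A per-queue value takes precedence over the other levels.
        return queue
    if uses_policy:
        # Scheduler configured with a policy object.
        if policy is not None:
            return policy
        return POLICY_DEFAULT_BACKFILL_DEPTH
    # Scheduler configured with sched_config: scheduler object first,
    # then the (deprecated) server object.
    if sched is not None:
        return sched
    return server
```

For example, with a policy value of 3 and no queue override the result is 3; with `uses_policy=False`, no scheduler value, and a server value of 5, the result is 5.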

  • Interface 5: Changes to PBS Nodes objects.
    • Visibility: Public
    • Change Control: Stable
    • Details:
      • Each node object in PBS will have an additional attribute called "sched" which can be used to associate a node with a particular scheduler.
      • By default this attribute will be set to the default scheduler started by the server (which is pbs_sched).
      • A PBS admin/manager can set a node's sched attribute to the name of an existing scheduler, which will then schedule jobs on this node.
      • When a scheduler object is deleted, all the queues/nodes that were associated with the deleted scheduler move back to the default scheduler.
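
The fallback to the default scheduler on deletion can be sketched with two plain mappings. The mappings and the function name are hypothetical; the default scheduler name `pbs_sched` comes from the text.

```python
DEFAULT_SCHED = "pbs_sched"  # name of the default scheduler


def release_scheduler(name, queue_sched, node_sched, default=DEFAULT_SCHED):
    """Move every queue and node owned by scheduler name back to default.

    queue_sched and node_sched map an object name to its scheduler name
    and are updated in place.
    """
    for mapping in (queue_sched, node_sched):
        for key, owner in mapping.items():
            if owner == name:
                mapping[key] = default
```

Only values are reassigned while iterating, so the mappings keep their keys and objects owned by other schedulers are left untouched.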

  • Interface 6: Changes to Queues.
    • Visibility: Public
    • Change Control: Stable
    • Details:
      • The queue_type attribute on a queue will accept 3 additional values: "execution_prime", "execution_non_prime", and "dedicated".
      • If queue_type is set to "execution_prime", jobs from this queue will be considered only during primetime by the scheduler.
      • If queue_type is set to "execution_non_prime", jobs from this queue will be considered only during non-primetime by the scheduler.
      • If queue_type is set to "dedicated", jobs from this queue will be considered only during dedicated time.
      • If queue_type is set to "execution", jobs from this queue will be considered to run irrespective of prime/non-prime time.
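
The queue_type rules above amount to a lookup from queue type to the time periods in which its jobs are considered. The period labels and names below are invented for this sketch. The text does not say whether "execution" queues are considered during dedicated time; the sketch assumes they are not.

```python
# Periods in which jobs from each queue_type are considered.
ELIGIBLE_PERIODS = {
    "execution_prime": {"prime"},
    "execution_non_prime": {"non_prime"},
    "dedicated": {"dedicated"},
    # Assumption: "execution" covers prime and non-prime time only.
    "execution": {"prime", "non_prime"},
}


def queue_is_considered(queue_type, period):
    """Return True if jobs from a queue of queue_type run in period."""
    return period in ELIGIBLE_PERIODS.get(queue_type, set())
```

Queue types not listed in the table, such as routing queues, are reported as not considered in any period.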


  • Interface 7: How PBS server runs scheduler.
    • Visibility: Public
    • Change Control: Stable
    • Details:
      • Upon startup the PBS server will start all schedulers which have their scheduling attribute set to "True".
        • If "PBS_START_SCHED" is set to 0 in pbs.conf, the server will not start any scheduler.
      • The PBS server will connect to these schedulers on their respective hostnames and port numbers.
      • If the server is unable to connect to a scheduler, it will check whether the scheduler is running, try to connect 5 times, and finally restart the scheduler.
      • Scheduling cycles for all configured schedulers are started by the PBS server when a job is queued, when a job finishes, when the scheduling attribute is set to True, or when scheduler_iteration has elapsed.
        • When a job gets queued or finishes, the server will check its queue and try to connect to the corresponding scheduler to run a scheduling cycle.
        • If a scheduler is already running a scheduling cycle, the server will wait for that cycle to finish before trying to start another one.
        • If job_accumulation_time is set, the server will wait until that time has passed after the submission of a job before starting a new cycle.
      • Each scheduler specifies its scheduler name when querying the server and gets only the part of the universe that is relevant to it.
        • It gets all the running, queued, and exiting jobs from the queues it is associated with.
        • It gets the list of all nodes which are associated with this scheduler and with the queues managed by the scheduler.
        • It gets the list of all the global policies, such as run soft/hard limits, set on the server object.
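
The recovery step (check that the scheduler is running, try to connect 5 times, then restart) can be sketched with injected callables. The callables `connect`, `is_running`, and `restart` are hypothetical stand-ins, and restarting immediately when the scheduler is not running is an assumption not stated in the text.

```python
MAX_CONNECT_ATTEMPTS = 5  # number of attempts named in the text


def connect_or_restart(connect, is_running, restart,
                       attempts=MAX_CONNECT_ATTEMPTS):
    """Try to reach a scheduler; restart it if that fails.

    connect() returns True on success. Returns "connected" or "restarted".
    """
    if is_running():
        for _ in range(attempts):
            if connect():
                return "connected"
    # Either the scheduler was not running (assumed case) or every
    # connection attempt failed.
    restart()
    return "restarted"
```

Passing the operations in as callables keeps the sketch independent of how the server actually opens connections or launches scheduler processes.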


  • Interface 8: What does not work when multiple scheduler objects are present.
    • Visibility: Public
    • Change Control: Experimental
    • Details:
      • When multiple scheduler objects are configured, the following things might be broken:
        • Run limits set on the server may seem to be broken because a scheduler object may not have a view of the whole universe.
        • Fairshare is now limited to what a specific scheduler can see; it cannot be done complex-wide with multiple schedulers.
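
A toy calculation shows why a server-wide run limit can appear broken. It assumes each scheduler checks the limit only against the jobs in its own view; the function name and the numbers are made up for illustration.

```python
def jobs_started(view_queued, view_running, max_run):
    """Jobs one scheduler starts when enforcing max_run on its own view."""
    room = max(0, max_run - view_running)
    return min(view_queued, room)


server_max_run = 4
# Two schedulers, each seeing only its own queues: 4 queued, 0 running.
started = [jobs_started(4, 0, server_max_run) for _ in range(2)]
total_running = sum(started)  # 8, twice the server-wide limit of 4
```

Each scheduler stays within the limit from its own point of view, yet the complex as a whole ends up running twice as many jobs as the limit allows.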
