...
- Visibility: Public
- Change Control: Stable
- Details:
- Scheduler now has additional attributes which can be set in order to run it.
- sched_priv - points to the directory where the scheduler keeps the fairshare usage, resource_group, holidays, and sched_config files.
- sched_log - points to the directory where the scheduler writes its logs.
- partition - lists the names of all the partition/s for which this scheduler schedules jobs.
- sched_host - hostname on which the scheduler is running. For the default scheduler it is set to the PBS server hostname.
- sched_port - port number on which the scheduler is listening.
- state - shows the status of the scheduler. It is set only by the PBS server.
- One can set a partition or a comma-separated list of partitions on a scheduler object. Once set, the scheduler object will only schedule jobs from the queues attached to the specified partition/s.
- qmgr -c "s sched multi_sched_1 partition='part1,part2'"
- If no partition is specified for a given scheduler object, other than the default scheduler where no partition value can be set, then that scheduler will not schedule any jobs.
- By default, all new queues are scheduled by the default scheduler until they have been assigned to a specific partition.
- A partition once attached to a scheduler cannot be attached to a second scheduler without first removing it from the first scheduler. An attempt to do so throws the following error:
- qmgr -c "s sched multi_sched_1 partition+='part2'"
Partition part2 is already associated with scheduler <scheduler name>.
- Scheduler object "state" attribute will show one of these 3 values - down, idle, scheduling
- If a scheduler object is created but the scheduler is not running for some reason, the state will be shown as "down".
- If a scheduler is up and running but waiting for a cycle to be triggered, the state will be shown as "idle".
- If a scheduler is up and running and is also running a scheduling cycle, the state will be shown as "scheduling".
- The default scheduler's state is "idle" by default, since the server's "scheduling" attribute is set to true in a default installation.
- The default sched object is the only sched object that cannot be deleted.
- Setting sched_port, sched_priv, or sched_host on the default scheduler is not allowed. The following error message is thrown in server_logs when we try to change the sched_priv directory.
- qmgr -c "s sched default sched_priv = /tmp"
Operation is not permitted on default scheduler
- Trying to start a new scheduler other than the default scheduler without assigning a partition will log the following error message in the scheduler logs.
Scheduler does not contain a partition
- If the scheduler fails to accept a new value for its sched_log directory, then the comment of the corresponding scheduler object at the server is updated with the following message. Also, the scheduling attribute is set to false.
Unable to change the sched_log directory
- If the scheduler fails to accept a new value for its sched_priv directory, then the comment of the corresponding scheduler object at the server is updated with the following message. Also, the scheduling attribute is set to false.
Unable to change the sched_priv directory
- If PBS validation checks for the new value of the sched_priv directory do not pass, then the comment of the corresponding scheduler object at the server is updated with the following message. Also, the scheduling attribute is set to false.
PBS failed validation checks for sched_priv directory
- If the scheduler is successful in accepting the new log_dir configured at qmgr, then the following message is logged in the scheduler logs.
Scheduler log directory is changed to <value of path of the log directory>
- If the scheduler is successful in accepting the new sched_priv configured at qmgr, then the following message is logged in the scheduler logs.
Scheduler priv directory is changed to <value of path of the sched_priv directory>
- If an admin unsets partitions from a scheduler until it does not contain any partitions, then the scheduler will shut itself down and the following error message is logged in the scheduler logs.
Scheduler does not contain a partition.
- If the scheduler fails to get its attributes from the server, then the following error message is shown in the scheduler logs.
Unable to retrieve the scheduler attributes from server
- A new option -I is introduced to provide a name to a scheduler. If we run pbs_sched without this option, then it is considered the default scheduler, whose name is "default".
Example: pbs_sched -I sc1 -S 15051
Here the scheduler is started on port number 15051 and its id/name is "sc1".
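The partition ownership rule and the three "state" values described in this interface can be illustrated with a small Python model. This is only a sketch for reasoning about the rules, not PBS server code: the class SchedulerRegistry, the exception PartitionError, and the function scheduler_state are hypothetical names introduced here. The error text is copied from the message shown earlier in this interface.

```python
class PartitionError(Exception):
    """Raised when a partition assignment breaks the ownership rule."""


class SchedulerRegistry:
    """Hypothetical toy model: each partition has at most one owning scheduler."""

    def __init__(self):
        self._owner = {}  # partition name -> scheduler name

    def add_partitions(self, sched, partitions):
        # Validate every partition first so a rejected request changes nothing.
        for part in partitions:
            owner = self._owner.get(part)
            if owner is not None and owner != sched:
                raise PartitionError(
                    f"Partition {part} is already associated with scheduler {owner}."
                )
        for part in partitions:
            self._owner[part] = sched

    def remove_partition(self, sched, part):
        # Only the current owner can release a partition.
        if self._owner.get(part) == sched:
            del self._owner[part]

    def partitions_of(self, sched):
        return sorted(p for p, s in self._owner.items() if s == sched)


def scheduler_state(running, in_cycle):
    """Map the two conditions from the text onto down / idle / scheduling."""
    if not running:
        return "down"
    return "scheduling" if in_cycle else "idle"


if __name__ == "__main__":
    reg = SchedulerRegistry()
    reg.add_partitions("multi_sched_1", ["part1", "part2"])
    try:
        reg.add_partitions("multi_sched_2", ["part2"])
    except PartitionError as err:
        print(err)
    print(scheduler_state(running=True, in_cycle=False))
```

In this model, part2 has to be removed from multi_sched_1 before multi_sched_2 can take it, which mirrors the rule stated above.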
...
- Visibility: Public
- Change Control: Stable
- Details:
- Upon startup PBS server will start all schedulers which have their scheduling attribute set to "True"
- The "PBS_START_SCHED" pbs.conf variable is now deprecated and its value will be overridden by the scheduler's "scheduling" attribute.
- PBS server will connect to these schedulers on their respective host names and port number.
- Scheduling cycles for all configured schedulers are started by the PBS server when a job is queued or finished, when the scheduling attribute is set to True, or when scheduler_iteration has elapsed.
- When a job is queued or finished, the server checks its corresponding queue and tries to connect to the corresponding scheduler to run a scheduling cycle.
- If a scheduler is already running a scheduling cycle, the server waits for the previous cycle to finish before trying to start another one.
- If job_accumulation_time is set, then the server waits until that time has passed after the submission of a job before starting a new cycle.
- Each scheduler queries the whole universe of schedulers, server, queues, nodes information etc. from the server (this is to avoid IFL changes). Thereafter it does the following.
- It filters all the running, queued, and exiting jobs from the queues associated with its partition/s.
- It filters the list of nodes which are associated with the partition/s managed by the scheduler.
- It filters the list of all the global policies, like run soft/hard limits, set on the server object.
- The PBS init script now reports the status of the PBS server only. Schedulers are managed by the server and their status can be fetched using a qmgr command.
- When the pbs_server daemon is stopped using "qterm -s", it also stops all the running scheduler processes.
- The PBS init script, while shutting down pbs_server, uses the "-s" option to qterm so that all schedulers come down along with the server.
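The filtering steps described in this interface can be sketched in Python. This is a simplified model under stated assumptions, not the scheduler's real data structures: the Queue, Job, and Node classes and the filter_universe function are hypothetical, the job states use the single-letter codes R, Q, and E for running, queued, and exiting, and the default scheduler is modelled as taking every queue and node that has no partition set (the text states this for queues; treating nodes the same way is an assumption).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Queue:
    name: str
    partition: Optional[str] = None


@dataclass
class Job:
    job_id: str
    queue: str
    state: str  # "R" running, "Q" queued, "E" exiting, anything else is ignored


@dataclass
class Node:
    name: str
    partition: Optional[str] = None


def filter_universe(sched_partitions, queues, jobs, nodes):
    """Keep only the queues, jobs, and nodes relevant to one scheduler.

    sched_partitions is a set of partition names; an empty set models the
    default scheduler (assumption: it handles objects with no partition).
    """
    def mine(partition):
        if sched_partitions:
            return partition in sched_partitions
        return partition is None

    my_queues = sorted(q.name for q in queues if mine(q.partition))
    my_jobs = [j.job_id for j in jobs
               if j.queue in my_queues and j.state in {"R", "Q", "E"}]
    my_nodes = [n.name for n in nodes if mine(n.partition)]
    return my_queues, my_jobs, my_nodes


if __name__ == "__main__":
    queues = [Queue("workq"), Queue("q1", "P1"), Queue("q2", "P2")]
    jobs = [Job("1", "workq", "Q"), Job("2", "q1", "R"), Job("4", "q2", "Q")]
    nodes = [Node("n0"), Node("n1", "P1"), Node("n2", "P2")]
    print(filter_universe({"P1"}, queues, jobs, nodes))
    print(filter_universe(set(), queues, jobs, nodes))
```

Every scheduler receives the same full input and narrows it locally, which is the reason given above for not changing the IFL calls.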
Interface 8: Reservations
- Visibility: Public
- Change Control: Stable
- Details:
- Reservations can now be submitted to a specific partition using a new "-p" option with the pbs_rsub command.
- The "-p" option of pbs_rsub takes a partition name as input and makes pbs_server trigger a scheduling cycle of the scheduler that is servicing the partition. If a scheduler servicing the requested partition is not up and running, then the PBS server stores the reservation and marks it as "UNCONFIRMED" until it is able to trigger a scheduling cycle of that scheduler.
- In a Multi-sched environment, reservations can be confirmed by any scheduler servicing their respective partitions.
- After the reservations are confirmed they are assigned the partition their node solution came from.
- Once the reservation is confirmed, it has a partition attribute set on it to identify where it was confirmed. Similarly, the reservation queue also gets a partition attribute set on it (matching the reservation).
- Example:
% pbs_rsub -lselect=1:ncpus=2 -R 1030 -D1200 -I 5
R865.centos CONFIRMED
% pbs_rstat -f R865
Resv ID: R865.centos
Reserve_Name = NULL
Reserve_Owner = root@centos
reserve_type = 2
reserve_state = RESV_CONFIRMED
reserve_substate = 2
reserve_start = Mon Feb 03 10:30:00 2020
reserve_end = Mon Feb 03 10:50:00 2020
reserve_duration = 1200
queue = R865
Resource_List.ncpus = 2
Resource_List.nodect = 1
Resource_List.select = 1:ncpus=2
Resource_List.place = free
Resource_List.walltime = 00:20:00
schedselect = 1:ncpus=2
resv_nodes = (vnode2:ncpus=2)
Authorized_Users = root@centos
server = centos
ctime = Mon Feb 03 10:08:55 2020
mtime = Mon Feb 03 10:09:03 2020
interactive = 5
Variable_List = PBS_O_LOGNAME=root,PBS_O_HOST=centos,PBS_O_MAIL=/var/spool/mail/arung,PBS_TZID=America/Los_Angeles
euser = root
egroup = root
partition = P1
% qmgr -c "l q R865"
Queue R865
queue_type = Execution
total_jobs = 0
state_count = Transit:0 Queued:0 Held:0 Waiting:0 Running:0 Exiting:0 Begun:0
acl_user_enable = True
acl_users = root@centos
resources_max.ncpus = 2
resources_available.ncpus = 2
enabled = True
started = False
partition = P1
- Once a reservation is confirmed and a partition is assigned to it, it cannot be re-confirmed or altered in any other partition.
- PBS assigns the default partition name "pbs-default" to all reservations (and their queues) confirmed by the default scheduler. If an admin tries to assign the partition name "pbs-default" to a scheduler, queue, or node, the qmgr command throws the error "Default partition name is not allowed".
Interface 9: Deleted
Interface 10: Fairshare
- Visibility: Public
- Change Control: Stable
- Fairshare as a scheduling policy for the whole PBS complex is no longer supported unless there is only a single scheduler.
- This policy is limited to each individual scheduler.
- The pbsfs command will now act on a single scheduler's fairshare usage database.
- The new '-I' option allows the admin to specify which scheduler's fairshare usage database pbsfs acts on.
- If no '-I' option is given, pbsfs acts on the default scheduler.
- pbsfs will now contact the server to query the location of the sched_priv for the scheduler.
- Since contacting the server is now required, the server needs to be running to use pbsfs. This was not true before.
- If the scheduler's sched_priv is not accessible, the existing error message will be printed to stderr
Unable to access fairshare data
- If no such scheduler exists, the following message will be printed to stderr
- Scheduler <sched> does not exist
- If a scheduler does not have its sched_priv set, the following message will be printed to stderr
- Scheduler <sched> does not have its sched_priv set
- Example:
- pbsfs -s user1 10
- sets user1's usage to 10 for the default scheduler
- pbsfs -I sched2 -s user2 10
- sets user2's usage to 10 for sched2.
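The order in which pbsfs could report the three error cases listed in this interface can be sketched in Python. This is a hypothetical model, not the pbsfs source: the function resolve_sched_priv, the exception PbsfsError, and the dictionary standing in for the server's reply are all assumptions. The message strings are taken from the text; the order of the checks is an assumption.

```python
import os


class PbsfsError(Exception):
    """Stands in for a message printed to stderr by pbsfs."""


def resolve_sched_priv(sched_attrs, sched_name="default"):
    """Return the sched_priv directory for sched_name or raise PbsfsError.

    sched_attrs models the server's reply: scheduler name -> attribute dict.
    With no name given, the default scheduler is used, as with no '-I' option.
    """
    if sched_name not in sched_attrs:
        raise PbsfsError(f"Scheduler {sched_name} does not exist")
    priv = sched_attrs[sched_name].get("sched_priv")
    if not priv:
        raise PbsfsError(f"Scheduler {sched_name} does not have its sched_priv set")
    # The directory must exist and be readable and searchable.
    if not os.access(priv, os.R_OK | os.X_OK):
        raise PbsfsError("Unable to access fairshare data")
    return priv


if __name__ == "__main__":
    # Hypothetical server reply; sched2 has no sched_priv set.
    attrs = {"default": {"sched_priv": os.getcwd()}, "sched2": {}}
    print(resolve_sched_priv(attrs))
    try:
        resolve_sched_priv(attrs, "sched2")
    except PbsfsError as err:
        print(err)
```

Because the lookup goes through the server, the model also makes it clear why pbsfs cannot work when the server is down: there is no reply to read sched_priv from.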
Notes:
...
2. The server's backfill_depth will be the default value for all the schedulers in the complex.
Ex: If the server's backfill_depth is 1, 1 job per scheduler will be backfilled.
If the server's backfill_depth is set to 5, 5 jobs from each scheduler will get backfilled.
...
3. The pbs_statsched() IFL returns the status of all PBS schedulers. The return type is a pointer to a list of batch_status structures (one for each scheduler).
4. PTL framework changes:
Information for multiple schedulers can be accessed through self.server.schedulers.
All scheduler functions can be called on a specific scheduler as self.server.schedulers['sched_name'].<method name>.
A shorthand for server.schedulers named scheds is also available, so the same call can be written as self.scheds['sched_name'].<method name>.