PP-482: Soft Walltime

Overview:

In order to employ backfilling, the scheduler requires jobs to be submitted with a walltime resource. Walltime estimates are almost always longer than the time the job actually needs to run: users deliberately overestimate to add padding, so that PBS will not kill the job if it runs longer than expected. Other users refuse to submit jobs with a walltime resource at all, because a job that exceeds its walltime is killed. Because walltimes are overestimated, the scheduler's calendar is always a poor reflection of reality.

This feature introduces a new resource called soft_walltime.  The scheduler will use the soft_walltime resource in place of the walltime resource when calculating the duration of a job.

     
  • Interface 1: soft_walltime resource

    • Change Control: Stable
    • Permissions: Write: Manager; Read: Everyone
      • Writable only by managers, to prevent users from exploiting it.
    • Python Type: pbs.duration
    • Details:
      • Using this resource, the admin can set an estimate of how long the job will run. The scheduler uses this estimate as the job's duration. The job will not be killed if it exceeds the estimate.
      • Since Resource_List.soft_walltime can only be set by a manager, it will likely be set via a queuejob hook or resources_default.
      • If a user or operator attempts to alter soft_walltime, the request will be rejected with the error message "Cannot set attribute, read only or insufficient permission", or PBSE_ATTRRO (15003) from the API.
      • The soft_walltime resource cannot be requested at submit time (except by a queuejob hook), because all job submissions use user permissions. Such a submission will be rejected with the same message as above.
      • If the requested soft_walltime is greater than the hard walltime, the request will be rejected with the error message "Illegal attribute or resource value", or PBSE_BADATVAL (15014) from the API.
      • The soft_walltime resource is not sent to the MoM when the job is started, so it cannot be set in MoM hooks.
  • Interface 2: New PBS error message 
    • Change Control: Stable
    • Details:
      • An attempt to combine Shrink-to-Fit (STF) jobs with soft_walltime will be rejected with the error message "soft_walltime is not supported with Shrink to Fit jobs", or PBSE_SOFTWT_STF (15180) from the API.
  • Interface 3: estimated.soft_walltime

    • Change Control: Stable
    • Permissions: Write: None (read-only); Read: Everyone
    • Python type: pbs.duration
    • Details:
      • The current soft_walltime estimate is available in estimated.soft_walltime.
      • This attribute is not available for subjobs.
      • It is provided for automated testing purposes.
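
The acceptance rules in the interfaces above can be modeled as a small decision function. This is a minimal sketch, not the server's implementation: the function name, the `requester` strings, and the order in which the checks run are assumptions; only the error codes and their meanings come from the interfaces listed above.

```python
"""Hypothetical model of the soft_walltime validation rules.

Durations are integer seconds. Names are illustrative, not a PBS API.
"""

PBSE_ATTRRO = 15003      # "Cannot set attribute, read only or insufficient permission"
PBSE_BADATVAL = 15014    # "Illegal attribute or resource value"
PBSE_SOFTWT_STF = 15180  # "soft_walltime is not supported with Shrink to Fit jobs"


def check_soft_walltime(requester, soft_walltime, hard_walltime=None,
                        is_stf=False, from_queuejob_hook=False):
    """Return 0 if the request would be accepted, else the PBS error code."""
    # Only managers (or a queuejob hook at submit time) may set soft_walltime.
    if requester != "manager" and not from_queuejob_hook:
        return PBSE_ATTRRO
    # Shrink-to-Fit jobs cannot be combined with soft_walltime.
    if is_stf:
        return PBSE_SOFTWT_STF
    # soft_walltime may not exceed the hard walltime, when one is set.
    if hard_walltime is not None and soft_walltime > hard_walltime:
        return PBSE_BADATVAL
    return 0


if __name__ == "__main__":
    print(check_soft_walltime("operator", 3600))        # 15003
    print(check_soft_walltime("manager", 3600, 1800))   # 15014
    print(check_soft_walltime("manager", 3600, 7200))   # 0
```

A soft_walltime equal to the hard walltime is accepted here, since only a strictly greater value is rejected.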



An enumeration of places where soft_walltime and hard walltime are used:

When a job is queued:

  • If the job is a top job, the soft_walltime will be used to determine when the job will fit onto the calendar.
  • If the job is a filler job, the soft_walltime will be used to see if the job will conflict with top jobs.
  • If the job is a filler job, the hard walltime will be used to see if the job will conflict with confirmed reservations.
  • If dedicated time is used, soft_walltime will be used to see if the job will finish before dedicated time starts.
  • If backfill_prime is set, soft_walltime is used to see if the job will finish before the next prime boundary + prime_spill.
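
A minimal sketch of the checks above, assuming every boundary is known as an absolute time in seconds and that a job fits if it ends at or before a boundary. `can_start_now` and its parameters are hypothetical names; the point is only which duration each comparison uses.

```python
"""Hypothetical sketch: which walltime a queued job is checked against.

All times are integer seconds on a shared clock; math.inf means the
boundary does not exist. Names are illustrative, not PBS APIs.
"""
import math


def can_start_now(now, soft_walltime, hard_walltime,
                  top_job_start=math.inf, resv_start=math.inf,
                  dedicated_start=math.inf, prime_limit=math.inf):
    """Return True if a job started at `now` respects every boundary.

    prime_limit stands for (next prime boundary + prime_spill) when
    backfill_prime is set.
    """
    soft_end = now + soft_walltime  # used for top jobs, dedicated time, prime
    hard_end = now + hard_walltime  # used for confirmed reservations
    return (soft_end <= top_job_start
            and soft_end <= dedicated_start
            and soft_end <= prime_limit
            and hard_end <= resv_start)


if __name__ == "__main__":
    # Soft estimate ends before the top job starts: the filler job may run.
    print(can_start_now(0, 3600, 7200, top_job_start=5400))  # True
    # Soft estimate fits, but the hard walltime overlaps a reservation.
    print(can_start_now(0, 3600, 7200, resv_start=6000))     # False
```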

When the job is running:

  • If resources_used.walltime <= soft_walltime, the job behaves normally.
  • If resources_used.walltime > soft_walltime, the job has exceeded its soft_walltime. The job is not killed; instead, its soft_walltime estimate is extended:
    • Every time the job exceeds its soft_walltime, it will be extended by 100% of its original soft_walltime.
    • If both a soft_walltime and a hard walltime are set, the soft_walltime will never be extended past the job's hard walltime.
    • If a job exceeds its soft_walltime and crosses over into dedicated_time, it is up to the admin to take action (if any).
    • The value of Resource_List.soft_walltime does not change. The scheduler sets estimated.soft_walltime to the new soft_walltime estimate.
  • If a job is a preemption candidate and preempt_order is based on the percentage of the job that has completed (e.g., preempt_order "SCR 20 S"), the initial soft_walltime request is used to determine the percentage of completion.
    • If the job runs past its initial soft_walltime request, preempt_order treats the job as 100% complete. It remains at 100% complete for the rest of the job, regardless of how many times the soft_walltime is extended.
    • For example, if a job has soft_walltime=1:00:00, at 59 minutes the job is roughly 98% complete (59/60). Once the job passes 1:00:00, the soft_walltime is extended to 2:00:00. At 1:30:00, the job remains at 100% complete, since it has reached its original soft_walltime request.
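
The percentage rule can be sketched as follows, assuming the percentage is computed from integer seconds and rounded down; the rounding the scheduler actually applies is not specified in this design. `percent_complete` is a hypothetical helper, not a PBS function.

```python
"""Hypothetical sketch of the percent-complete value preempt_order sees
for a job with a soft_walltime. Times are integer seconds."""


def percent_complete(used_walltime, initial_soft_walltime):
    """Percent of the *initial* soft_walltime request used, capped at 100.

    Extensions of the soft_walltime estimate are deliberately ignored: once
    the job passes its original request it stays at 100% complete.
    """
    if used_walltime >= initial_soft_walltime:
        return 100
    return 100 * used_walltime // initial_soft_walltime  # rounded down (assumed)


if __name__ == "__main__":
    print(percent_complete(59 * 60, 3600))  # 98
    print(percent_complete(90 * 60, 3600))  # 100, even though the estimate is now 2:00:00
```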

When confirming reservations:

  • PBS's existing behavior has not changed.
  • Only a job's hard walltime is used to determine when it ends.
  • A job's soft_walltime is not used.


Examples:

Job J has a soft_walltime=1:00:00 

  • When J exceeds its soft_walltime, J's soft_walltime is extended by its original soft_walltime, to 2:00:00.
    • If J exceeds its soft_walltime again, it is extended again by its original soft_walltime, to 3:00:00.

Job K has a soft_walltime=1:00:00 and a hard walltime=1:30:00

  • When K exceeds its soft_walltime, its soft_walltime would normally be extended to 2:00:00. Since 2:00:00 is past K's hard walltime, K's soft_walltime is extended to 1:30:00 instead.
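
The extension rule used in these examples amounts to a ceiling division capped by the hard walltime. This sketch assumes the estimate is recomputed from the time used so far (the scheduler actually updates it during its scheduling cycles); `soft_walltime_estimate` is a hypothetical name.

```python
"""Hypothetical sketch of how estimated.soft_walltime grows as a running
job overruns it. Times are integer seconds."""


def soft_walltime_estimate(used_walltime, soft_walltime, hard_walltime=None):
    """Return the current soft_walltime estimate for a running job.

    The estimate starts at the requested soft_walltime and grows by 100% of
    that original value each time the job exceeds it, but never past the
    hard walltime when one is set.
    """
    # Smallest whole number of soft_walltime periods covering the time used
    # (at least one); "exceeding" means strictly greater than the estimate.
    periods = max(1, -(-used_walltime // soft_walltime))  # ceiling division
    estimate = periods * soft_walltime
    if hard_walltime is not None:
        estimate = min(estimate, hard_walltime)
    return estimate


if __name__ == "__main__":
    print(soft_walltime_estimate(3601, 3600))        # Job J: 7200 (2:00:00)
    print(soft_walltime_estimate(7201, 3600))        # Job J: 10800 (3:00:00)
    print(soft_walltime_estimate(3601, 3600, 5400))  # Job K: 5400 (1:30:00)
```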


An STF job requesting soft_walltime will be rejected, because the two features do not make sense together. A job's min_walltime is the minimum amount of time the job needs to get any real work done, and the job's hard walltime can be set as low as its min_walltime. A job's soft_walltime cannot exceed its hard walltime, so the soft_walltime could end up shorter than the minimum amount of time the job needs to get any real work done.


A job submitted with a cput resource is still limited by it.  No matter how many times a job exceeds its soft_walltime, it will be killed if it reaches its cput limit.
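
To make the distinction concrete, here is a sketch of the limits that can end a running job. It assumes a limit is enforced once usage reaches it; `exceeds_hard_limits` is a hypothetical helper. soft_walltime does not appear at all, because it never causes a job to be killed.

```python
"""Hypothetical sketch of which limits end a running job.

soft_walltime is intentionally absent: it only affects scheduling estimates.
Times are integer seconds; None means the limit was not requested.
"""


def exceeds_hard_limits(used_walltime, used_cput,
                        walltime_limit=None, cput_limit=None):
    """Return True if the job has reached a limit that PBS enforces."""
    if walltime_limit is not None and used_walltime >= walltime_limit:
        return True
    if cput_limit is not None and used_cput >= cput_limit:
        return True
    return False


if __name__ == "__main__":
    # Ten hours past a 1:00:00 soft_walltime, but little CPU used: keeps running.
    print(exceeds_hard_limits(36000, 100, cput_limit=3600))   # False
    # Same job after consuming its full cput allowance: killed.
    print(exceeds_hard_limits(36000, 3600, cput_limit=3600))  # True
```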


Examples of setting soft_walltime:

From a queuejob hook:
import pbs

e = pbs.event()
j = e.job

# Give every submitted job a 300-second (5-minute) soft_walltime estimate.
j.Resource_List["soft_walltime"] = pbs.duration("300")

pbs.logmsg(pbs.EVENT_DEBUG, "Setting soft_walltime: %s" % j.Resource_List["soft_walltime"])

e.accept()


From qalter (as a manager):

% qalter -l soft_walltime=1:00:00 <job id>


If a site wants to allow users to set soft_walltime directly, it can do so with a simple queuejob hook.

First, the admin creates a custom resource of type long with no flags; users specify its value in seconds. For example: qmgr -c 'create resource set_soft_walltime type=long'

Second, the admin creates a simple queuejob hook that copies the value of the new resource into soft_walltime.


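Example hook that allows users to set soft_walltime directly:
import pbs
e = pbs.event()
j = e.job

# Copy set_soft_walltime (seconds) into soft_walltime; skip jobs that did not request it.
requested = j.Resource_List["set_soft_walltime"]
if requested is not None:
    j.Resource_List["soft_walltime"] = pbs.duration(requested)

e.accept()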

Users then request the new resource, in seconds, when submitting jobs:

% qsub -l set_soft_walltime=3600 -l select=1:ncpus=1
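
For testing outside a PBS server, the copying logic of the hook can be pulled into a plain function. In this sketch (all names hypothetical), a dict stands in for `job.Resource_List` and integer seconds stand in for `pbs.duration` values. It also checks the request against walltime, mirroring the rule that soft_walltime may not exceed the hard walltime, so a hook could reject the job early (for example with `e.reject()`) and give the user a clearer message.

```python
"""Hypothetical, PBS-free version of the set_soft_walltime copy logic.

A plain dict stands in for job.Resource_List; values are integer seconds.
"""


def soft_walltime_from_request(resource_list):
    """Return the soft_walltime (seconds) to set, or None to leave it unset.

    Raises ValueError when the request exceeds the job's walltime.
    """
    requested = resource_list.get("set_soft_walltime")
    if requested is None:
        return None  # the user did not request a soft_walltime
    requested = int(requested)
    walltime = resource_list.get("walltime")
    if walltime is not None and requested > walltime:
        raise ValueError("set_soft_walltime cannot exceed walltime")
    return requested


if __name__ == "__main__":
    print(soft_walltime_from_request({"set_soft_walltime": 3600}))  # 3600
    print(soft_walltime_from_request({}))                           # None
```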