

     
  • Interface 1: soft_walltime resource

    • Change Control: Stable
    • Permissions: Write: Manager Read: Everyone
    • Details:
      • Using this resource the admin can set an estimate for how long the job will run.  This will be used by the scheduler for the job's duration.  The job will not be killed if it exceeds the estimate.
      • Since Resource_List.soft_walltime can only be set by a manager, it will likely be set via a queuejob hook or resources_default.
      • If a user who is not a manager requests soft_walltime, the request is rejected with the following error message from qsub: "qsub: Cannot set attribute, read only or insufficient permission  Resource_List.soft_walltime", or with PBSE_ATTRRO (15003) from the API.
  • Interface 2: New PBS error message 
    • Change Control: Stable
    • Details
      • An attempt to combine Shrink to Fit (STF) jobs with soft_walltime will be rejected with the error message "soft_walltime is not supported with Shrink to Fit jobs", or with PBSE_SOFTWT_STF (15178, likely to change during development) from the API.


To employ backfilling, the scheduler requires jobs to be submitted with a walltime resource.  Walltime estimates are almost always longer than the time the job actually needs to run.  Users overestimate to add padding so that PBS will not kill their job if it runs longer than expected.  Other users refuse to submit jobs with a walltime resource at all because of its destructive nature.  Because walltimes are overestimated, the scheduler's calendar is a poor reflection of reality.

This feature introduces a new resource called soft_walltime.  The scheduler will use the soft_walltime resource in place of the walltime resource when calculating the duration of a job.  The soft_walltime can only be set by a manager, which prevents users from exploiting it.

An enumeration of places where soft_walltime and hard walltime are used:

When a job is queued:

  • If the job is a top job, the soft_walltime will be used to determine when the job will fit onto the calendar.
  • If the job is a filler job, the soft_walltime will be used to see if the job will conflict with top jobs.
  • If the job is a filler job, the hard walltime will be used to see if the job will conflict with confirmed reservations.
  • If dedicated time is used, soft_walltime will be used to see if the job will finish before dedicated time starts.
  • If backfill_prime is set, soft_walltime is used to see if the job will finish before the next prime boundary + prime_spill.
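
The queue-time checks listed above can be summarized as a small lookup. This is only an illustrative sketch in plain Python, not scheduler code: the check labels and the function name `duration_for` are hypothetical, and it assumes the job has both a soft_walltime and a hard walltime, given in seconds.

```python
# Which limit serves as the job's duration for each queue-time check,
# following the list above. The keys are hypothetical labels.
DURATION_SOURCE = {
    "top_job_calendar": "soft_walltime",
    "filler_vs_top_jobs": "soft_walltime",
    "filler_vs_confirmed_reservations": "walltime",
    "dedicated_time": "soft_walltime",
    "backfill_prime": "soft_walltime",
}


def duration_for(check, soft_walltime, walltime):
    """Return the duration (seconds) assumed for the given check."""
    if DURATION_SOURCE[check] == "soft_walltime":
        return soft_walltime
    return walltime


# A filler job with soft_walltime=1:00:00 and hard walltime=2:00:00.
print(duration_for("filler_vs_top_jobs", 3600, 7200))
print(duration_for("filler_vs_confirmed_reservations", 3600, 7200))
```

In this sketch the same filler job is treated as a one-hour job when checked against top jobs and as a two-hour job when checked against confirmed reservations.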

When the job is running:

  • If resources_used.walltime <= soft_walltime, everything acts normally – nothing is wrong.
  • If resources_used.walltime > soft_walltime, the job has exceeded its soft_walltime.  The job is not killed.  Instead, the job's soft_walltime estimate is extended:
    • Every time the job exceeds its soft_walltime, it will be extended by 100% of its original soft_walltime.
    • If both a soft_walltime and a hard walltime are set, the soft_walltime will never be extended past the job's hard walltime.
    • If a job exceeds its soft_walltime and crosses over into dedicated_time, it is up to the admin to take action (if any).
    • The value of Resource_List.soft_walltime will not change.  The scheduler will set estimated.soft_walltime to the new soft_walltime estimate.

When confirming reservations:

  • PBS's existing behavior has not changed.
  • Only a job's hard walltime is used to determine when jobs end.
  • A job's soft_walltime is not used.


Examples:

Job J has a soft_walltime=1:00:00 

  • When J exceeds its soft_walltime, its estimate is extended by its original soft_walltime to 2:00:00.
    • If J exceeds the extended estimate, it is extended again by its original soft_walltime to 3:00:00.

Job K has a soft_walltime=1:00:00 and a hard walltime=1:30:00

  • When K exceeds its soft_walltime, it would have been extended to 2:00:00.  Since 2:00:00 is past its hard walltime, K is extended to 1:30:00 instead.
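
The extension rule in the two examples above can be modeled in a few lines of plain Python. This is an illustrative sketch only, not the scheduler's implementation: the names `extend_soft_walltime` and `hms` are hypothetical, and all times are assumed to be integer seconds.

```python
def extend_soft_walltime(original, current, hard=None):
    """Return the new soft_walltime estimate after a job exceeds `current`.

    All values are in seconds. `hard` is the job's hard walltime, or None
    if the job has no hard walltime.
    """
    # Each extension adds 100% of the ORIGINAL soft_walltime.
    extended = current + original
    # The estimate is never extended past the hard walltime.
    if hard is not None and extended > hard:
        extended = hard
    return extended


def hms(seconds):
    """Format a number of seconds as H:MM:SS."""
    return "%d:%02d:%02d" % (seconds // 3600, seconds % 3600 // 60, seconds % 60)


# Job J: soft_walltime=1:00:00, no hard walltime.
j_first = extend_soft_walltime(3600, 3600)
j_second = extend_soft_walltime(3600, j_first)
print(hms(j_first), hms(j_second))  # 2:00:00 3:00:00

# Job K: soft_walltime=1:00:00, hard walltime=1:30:00.
k_first = extend_soft_walltime(3600, 3600, hard=5400)
print(hms(k_first))  # 1:30:00
```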


An STF job requesting soft_walltime will be rejected.  It is rejected because a job's min_walltime is the minimum amount of time a job needs to get any real work done.  A job's hard walltime can be set to its min_walltime.  A job's soft_walltime has to be shorter than its hard walltime.  This means that the soft_walltime would have to be shorter than the job's minimum amount of time to get any real work done.  The two features do not make sense together.
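
The rejection described above can be pictured as a simple check on the requested resources. This is a hedged sketch in plain Python, not the server's code: the function name `check_soft_walltime_request` is hypothetical, the resource list is modeled as a plain dict, and it assumes an STF job is identified by the presence of min_walltime in its request.

```python
STF_ERROR = "soft_walltime is not supported with Shrink to Fit jobs"


def check_soft_walltime_request(resource_list):
    """Return the error message if STF and soft_walltime are combined, else None."""
    # Assumption: a job is an STF job when it requests min_walltime.
    is_stf = "min_walltime" in resource_list
    if is_stf and "soft_walltime" in resource_list:
        return STF_ERROR
    return None


# An STF job that also carries soft_walltime is rejected.
print(check_soft_walltime_request(
    {"min_walltime": "1:00:00", "soft_walltime": "0:30:00"}))

# A regular job with a hard walltime and soft_walltime passes the check.
print(check_soft_walltime_request(
    {"walltime": "2:00:00", "soft_walltime": "1:00:00"}))
```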


A job submitted with a cput resource is still limited by it.  No matter how many times a job exceeds its soft_walltime, it will be killed if it reaches its cput limit.


Examples of setting soft_walltime:

From a queuejob hook:

Queuejob Hook
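import pbs

# Queuejob hook: set a soft_walltime estimate on every submitted job.
e = pbs.event()
j = e.job

# 300 seconds (5 minutes). This is an estimate for the scheduler, not a kill limit.
j.Resource_List["soft_walltime"] = pbs.duration("300")

pbs.logmsg(pbs.EVENT_DEBUG, "Set soft_walltime: %s" % j.Resource_List["soft_walltime"])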


From qalter (as a manager):

qalter
% qalter -lsoft_walltime=1:00:00 <job id>


If a site wants to allow users to set soft_walltime directly, it can do so with a simple queuejob hook.

First, the admin creates a custom resource of type long with no flags.  For example: qmgr -c "create resource set_soft_walltime type=long"

Second, the admin creates a simple queuejob hook that will copy the value of the new resource into soft_walltime.


Example of hook to allow users to directly set soft_walltime
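import pbs

e = pbs.event()
j = e.job

# Copy the user-requested value into soft_walltime, but only if the
# user actually requested set_soft_walltime.
requested = j.Resource_List["set_soft_walltime"]
if requested is not None:
    j.Resource_List["soft_walltime"] = requested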

Users then request the new resource in their job requests:

% qsub -l set_soft_walltime=1:00:00 -l select=1:ncpus=1

