
Overview:

In order to employ backfilling, the scheduler requires jobs to be submitted with a walltime resource.  Walltime estimates are almost always longer than the amount of time required for the job to run.  Users overestimate in order to add padding so that PBS will not kill their job if it runs longer than expected.  Other users refuse to submit jobs with a walltime resource at all because of its destructive nature.  Because walltimes are overestimated, the scheduler's calendar is a poor reflection of reality.

This feature introduces a new resource called soft_walltime.  The scheduler will use the soft_walltime resource in place of the walltime resource when calculating the duration of a job.
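
As a rough illustration of that substitution, the sketch below picks the duration a scheduler would plan with for a given job.  It is not the scheduler's actual code: the function name planning_duration and the representation of a job's resources as a dictionary of seconds are assumptions made for this example.

```python
from typing import Mapping, Optional


def planning_duration(resources: Mapping[str, int]) -> Optional[int]:
    """Return the duration (in seconds) to use when calendaring a job.

    Hypothetical helper: prefers soft_walltime, falls back to walltime,
    and returns None when the job requested neither.
    """
    if "soft_walltime" in resources:
        return resources["soft_walltime"]
    return resources.get("walltime")


# A job with soft_walltime=1:00:00 and a hard walltime=1:30:00.
job = {"soft_walltime": 3600, "walltime": 5400}
print(planning_duration(job))
```

A job that requests only walltime is planned exactly as before, so the fallback keeps existing behavior for jobs that do not use the new resource.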



An enumeration of places where soft_walltime and hard walltime are used:

When a job is queued:

When the job is running:

When confirming reservations:


Examples:

Job J has a soft_walltime=1:00:00 

Job K has a soft_walltime=1:00:00 and a hard walltime=1:30:00


A shrink-to-fit (STF) job requesting soft_walltime will be rejected.  A job's min_walltime is the minimum amount of time the job needs to get any real work done, and a job's hard walltime can be set to its min_walltime.  A job's soft_walltime has to be shorter than its hard walltime, so the soft_walltime would have to be shorter than the minimum amount of time the job needs to get any real work done.  The two features do not make sense together.
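
The two rules in this paragraph can be sketched as a small validation function.  This is only an illustration under stated assumptions: the name check_soft_walltime, the dictionary of seconds, and the rejection messages are hypothetical, an STF job is recognized here by the presence of min_walltime, and "shorter" is read as strictly shorter.

```python
from typing import Mapping, Optional


def check_soft_walltime(resources: Mapping[str, int]) -> Optional[str]:
    """Return a rejection reason, or None if the request is acceptable.

    Hypothetical check; all values are in seconds.
    """
    if "soft_walltime" not in resources:
        return None
    # Assumption: a job that requests min_walltime is an STF job.
    if "min_walltime" in resources:
        return "soft_walltime cannot be requested by an STF job"
    hard = resources.get("walltime")
    # Assumption: soft_walltime must be strictly shorter than walltime.
    if hard is not None and resources["soft_walltime"] >= hard:
        return "soft_walltime must be shorter than walltime"
    return None


print(check_soft_walltime({"soft_walltime": 3600, "min_walltime": 1800}))
```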


A job submitted with a cput resource is still limited by it.  No matter how many times a job exceeds its soft_walltime, it will be killed if it reaches its cput limit.
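
The distinction between a soft estimate and a hard limit can be sketched as follows.  This is a simplified model, not PBS's enforcement code: the function name reached_hard_limit and the dictionaries of seconds are assumptions, and the sketch treats reaching a limit (rather than exceeding it) as the trigger.

```python
from typing import Mapping


def reached_hard_limit(used: Mapping[str, int], limits: Mapping[str, int]) -> bool:
    """Return True if usage has reached a limit that ends the job.

    Hypothetical helper; values are in seconds.  soft_walltime is
    deliberately never consulted, because it is only an estimate.
    """
    for name in ("walltime", "cput"):
        limit = limits.get(name)
        if limit is not None and used.get(name, 0) >= limit:
            return True
    return False


limits = {"soft_walltime": 3600, "cput": 600}
# Twice the soft_walltime has elapsed, but cput is still under its limit.
print(reached_hard_limit({"walltime": 7200, "cput": 100}, limits))
# The cput limit has been reached.
print(reached_hard_limit({"walltime": 7200, "cput": 600}, limits))
```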


Setting soft_walltime from a hook:

Examples of setting soft_walltime:

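import pbs

# Job associated with the event that triggered this hook
e = pbs.event()
j = e.job

# Set a soft_walltime of 300 seconds (5 minutes)
j.Resource_List["soft_walltime"] = pbs.duration("300")

pbs.logmsg(pbs.EVENT_DEBUG, "Setting soft_walltime: %s" % j.Resource_List["soft_walltime"])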


From qalter (as a manager):

% qalter -lsoft_walltime=1:00:00 <job id>
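
The hook example writes the duration as a plain number of seconds ("300"), while the qalter example uses 1:00:00.  The sketch below shows how both forms map to seconds, assuming the usual [[hours:]minutes:]seconds notation.  The helper duration_to_seconds is hypothetical and is not part of the pbs module; it ignores fractional seconds.

```python
def duration_to_seconds(text: str) -> int:
    """Convert '[[HH:]MM:]SS' to whole seconds (hypothetical helper)."""
    parts = text.split(":")
    if not 1 <= len(parts) <= 3:
        raise ValueError("expected [[HH:]MM:]SS, got %r" % text)
    seconds = 0
    # Each field to the left is worth 60 times the field to its right.
    for part in parts:
        seconds = seconds * 60 + int(part)
    return seconds


print(duration_to_seconds("300"))
print(duration_to_seconds("1:00:00"))
```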


If a site wants to allow users to be able to set soft_walltime directly, they can achieve it through a simple queuejob hook.

First, the admin creates a custom resource of type long with no flags.  For example: qmgr -c 'create resource set_soft_walltime type=long'

Second, the admin creates a simple queuejob hook that will copy the value of the new resource into soft_walltime.


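import pbs

e = pbs.event()
j = e.job

# Copy the user-facing resource into soft_walltime only if it was requested
requested = j.Resource_List["set_soft_walltime"]
if requested is not None:
    j.Resource_List["soft_walltime"] = pbs.duration(requested)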

Users then request the new resource in their job requests:

% qsub -l set_soft_walltime=1:00:00 -l select=1:ncpus=1