...

The pbs_deljob() IFL call normally returns to the caller as soon as the server has received the request and started the delete process.  pbs_preempt_jobs() behaves differently: the server does not reply to a pbs_preempt_jobs() call until every job has been fully preempted.  For a job that is preempted by deletion, this means the server waits until the job is truly deleted before returning.  The scheduler needs the preempted jobs out of the way before it starts the high-priority job.  If pbs_preempt_jobs() returned sooner, the scheduler would oversubscribe the nodes until the jobs had finished being deleted.

Preemption is done via the pbs_preempt_jobs() IFL call.  This call only tells the server which jobs to preempt.  The server then uses the preempt_order attribute to determine the correct preemption method to use.  Once a job is preempted, the scheduler gets a response telling it which method was used.  We'll add a new method, 'D' (delete), to the response sent back to the scheduler.
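As a rough illustration of how the scheduler side might interpret the method letter in the response, here is a minimal sketch.  The helper names preempt_method_name() and count_deleted_jobs() are hypothetical and not part of PBS.  The letters 'S', 'C' and 'R' are the ones used by preempt_order; 'D' is the new method proposed here.  The actual layout of the response is not shown and is not assumed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of PBS: maps a one-letter preemption
 * method code to a readable name.  'S', 'C' and 'R' are the letters used
 * by preempt_order; 'D' is the delete method proposed in this design. */
const char *
preempt_method_name(char code)
{
	switch (code) {
		case 'S':
			return "suspend";
		case 'C':
			return "checkpoint";
		case 'R':
			return "requeue";
		case 'D':
			return "delete";
		default:
			return "unknown";
	}
}

/* Hypothetical helper: given one method letter per preempted job,
 * counts how many jobs were preempted by deletion. */
int
count_deleted_jobs(const char *methods)
{
	int n = 0;

	assert(methods != NULL);
	for (; *methods != '\0'; methods++)
		if (*methods == 'D')
			n++;
	return n;
}
```

A scheduler could use something like count_deleted_jobs() to tell deleted jobs apart from jobs that still exist on the server after being suspended, checkpointed or requeued.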


On the server side, we will create an internal batch request to delete the job.  This is similar to what happens when a user runs qdel, except that the request originates inside the server.


We can't ack the original pbs_preempt_jobs() request as soon as the delete job requests are finished.  A job is not deleted until the obit has been returned from the mom and end-of-job processing is finished.  Since the obit comes from the mom, we have no access to the initial preq at that point.  This means we'll have to keep the preq around on the job, similar to how a pbs_rerun request works.  When end-of-job processing is finished and the job is purged, reply_preempt_jobs_request() will be called on the saved preq to add the job to the list returned to the scheduler.  Once all jobs have called reply_preempt_jobs_request(), the server will reply to the scheduler.
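The bookkeeping described above can be sketched as a counter on the saved request.  This is only an illustration under stated assumptions: struct sketch_preempt_req and the sketch_* functions are hypothetical stand-ins, not the server's real preq structure or the real reply_preempt_jobs_request().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_MAX_JOBS 16
#define SKETCH_JOBID_LEN 64

/* Hypothetical stand-in for the saved preq: records how many jobs the
 * request covers and which of them have finished preemption so far. */
struct sketch_preempt_req {
	int expected;	/* jobs named in the preempt request */
	int replied;	/* jobs that have reported back so far */
	char job_ids[SKETCH_MAX_JOBS][SKETCH_JOBID_LEN];
	char methods[SKETCH_MAX_JOBS];	/* method letter per job, e.g. 'D' */
};

void
sketch_preempt_req_init(struct sketch_preempt_req *req, int njobs)
{
	assert(req != NULL && njobs > 0 && njobs <= SKETCH_MAX_JOBS);
	memset(req, 0, sizeof(*req));
	req->expected = njobs;
}

/* Called once per job when its preemption is complete (for a deleted
 * job, after the job is purged).  Adds the job to the reply list and
 * returns 1 when every job has reported, meaning the reply can now be
 * sent to the scheduler; returns 0 while jobs are still outstanding. */
int
sketch_reply_preempt_job(struct sketch_preempt_req *req, const char *job_id, char method)
{
	assert(req != NULL && job_id != NULL);
	assert(req->replied < req->expected);

	strncpy(req->job_ids[req->replied], job_id, SKETCH_JOBID_LEN - 1);
	req->job_ids[req->replied][SKETCH_JOBID_LEN - 1] = '\0';
	req->methods[req->replied] = method;
	req->replied++;

	return req->replied == req->expected;
}
```

The point of the sketch is that the reply is driven by the last job to finish, not by the arrival of the original request, which is why the request has to stay reachable from each job until that job is purged.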

Advice:

It is unwise to use a runjob hook with preemption via deletion.  A runjob hook can reject the high-priority job's run request.  If that happens, we'll have deleted jobs for no reason.

...