As a user or admin, deleting large numbers of jobs takes too long with scheduling enabled

Description

Several factors contribute to the overall qdel performance problem:

1) It takes much longer to delete a running job than one that's in the queue, because a lot of MoM and server<->MoM activity happens when you terminate a running job (as well as .o/.e staging).

2) Deletion of a job causes the server to cycle the scheduler, and the scheduler hits the server pretty hard at the start of a cycle. When multiple jobids are given to a single qdel command the server initiates a scheduling cycle after each one is processed (because pbs_deljob() only operates on a single job and qdel simply loops through calls to it).

3) Since the scheduler is constantly cycling, it may tell the server to start the jobs you're trying to delete. This not only makes the server busier than it would otherwise be (which means deletejob requests have to wait until the server is done talking to the scheduler at the start of the sched cycle and starting the possibly-already-doomed jobs), it also means the deletions themselves take much longer, because those jobs are now running (see 1).

Turning scheduling off to delete large chunks of jobs is not acceptable at many customer sites, nor is it an option for a normal user who wishes to delete their own jobs.

When deleting more than one job with a single command, the server should wait until the entire job list has been processed before initiating a scheduling cycle for the deletion events.

Some possible ideas:

  • build a list of jobs to be deleted when users do something like "qselect -u jon | xargs qdel" or "for x in {6543..6789};do qdel $x;done" then delete them all at once

  • exclude jobs listed for deletion (i.e. qdel 1 2 3 4 5 6 ... ) from being passed to the scheduler by the server

  • possibly add a new flag to qdel that will queue the jobs for deletion instead of waiting for each job to be deleted
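
To illustrate why batching matters, here is a minimal toy model in Python. It is not PBS code: ToyServer, delete_job, delete_jobs and count_cycles are hypothetical names invented for this sketch, and the only behaviour modelled is the one described above (one scheduling cycle per deleted job today, versus one cycle after the whole list is processed).

```python
from dataclasses import dataclass, field


@dataclass
class ToyServer:
    """Toy stand-in for the server (hypothetical; not the real PBS server)."""
    jobs: set = field(default_factory=set)
    sched_cycles: int = 0

    def _trigger_sched_cycle(self):
        # In this model a scheduling cycle is just a counter increment.
        self.sched_cycles += 1

    def delete_job(self, jobid):
        """Behaviour described in the ticket: one cycle per deleted job."""
        self.jobs.discard(jobid)
        self._trigger_sched_cycle()

    def delete_jobs(self, jobids):
        """Proposed behaviour: process the whole list, then cycle once."""
        for jobid in jobids:
            self.jobs.discard(jobid)
        self._trigger_sched_cycle()


def count_cycles(batched, njobs=5):
    """Delete njobs jobs and return how many scheduling cycles were triggered."""
    server = ToyServer(jobs={f"{i}.server1" for i in range(njobs)})
    targets = sorted(server.jobs)
    if batched:
        server.delete_jobs(targets)
    else:
        for jobid in targets:
            server.delete_job(jobid)
    return server.sched_cycles
```

In this model, deleting N jobs one at a time triggers N scheduling cycles, while the batched path triggers one regardless of N. The real saving would depend on how expensive the start of each cycle is on the server.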

Acceptance Criteria

The ability to delete jobs from multiple servers in a single qdel command (e.g. qdel 1.server1 1.server2) is preserved

Scheduler does not schedule jobs that are slated to be deleted

Scheduler/Server is not blocked when jobs are being deleted
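
The first two criteria can be sketched in Python as follows. This is an illustration under stated assumptions, not an implementation: group_by_server and schedulable are hypothetical helpers, job IDs are assumed to have the form "sequence.server" as in the example above, and an ID without a server part is assumed to belong to a placeholder default server.

```python
from collections import defaultdict


def group_by_server(jobids, default_server="default"):
    """Group job IDs by the server named after the first '.'.

    Assumption: IDs look like "1.server1"; an ID with no '.' is filed
    under default_server (a placeholder name, not a real PBS setting).
    """
    groups = defaultdict(list)
    for jobid in jobids:
        _, sep, server = jobid.partition(".")
        groups[server if sep else default_server].append(jobid)
    return dict(groups)


def schedulable(queued, pending_delete):
    """Return the queued jobs that are not slated for deletion, in order."""
    pending = set(pending_delete)
    return [jobid for jobid in queued if jobid not in pending]
```

Grouping first would let each server receive its whole deletion list in one request, and the filter shows the kind of check that keeps doomed jobs away from the scheduler.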

Status

Assignee

Unassigned

Reporter

Scott Campbell

Severity

None

OS

None

Start Date

None

Pull Request URL

None

Story Points

400

Components

Priority

Low