  1. PP-439

As a user or admin, deleting large numbers of jobs takes too long with scheduling enabled

    Details

    • Type: User Story
    • Status: Open
    • Priority: Low
    • Resolution: Unresolved
    • Affects versions: None
    • Fix versions: None
    • Components: commands, Hooks, Server
    • Labels:
      None
    • Sprint:
    • Story Points:
      400
    • Acceptance Criteria:
      The ability to delete jobs from multiple servers in a single qdel command (e.g. qdel 1.server1 1.server2) is preserved

      Scheduler does not schedule jobs that are slated to be deleted

      Scheduler/Server is not blocked when jobs are being deleted

      Description

      Several aspects of the overall qdel performance problem are:

      1) It takes much longer to delete a running job than one that's in the queue, because a lot of MoM and server<->MoM activity happens when you terminate a running job (as well as .o/.e staging).

      2) Deletion of a job causes the server to cycle the scheduler, and the scheduler hits the server pretty hard at the start of a cycle. When multiple jobids are given to a single qdel command the server initiates a scheduling cycle after each one is processed (because pbs_deljob() only operates on a single job and qdel simply loops through calls to it).

      3) Since the scheduler is constantly cycling, it may tell the server to start the jobs you're trying to delete. This not only makes the server busier than it would otherwise be (which means deletejob requests have to wait to be processed until the server is done talking to the scheduler at the start of the sched cycle and starting the possibly-already-doomed jobs), it also means the deletions themselves will take much longer, since those jobs are now running (see 1).

      Turning scheduling off to delete large chunks of jobs is not acceptable at many customer sites, nor is it an option for a normal user who wishes to delete their own jobs.

      When deleting more than one job with a single command, the server should wait until the entire job list has been processed before initiating a scheduling cycle for the deletion events.
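      The difference between the behaviour described in 2) and the proposed behaviour can be illustrated with a toy model. This is only a sketch and not PBS Pro code: ToyServer, delete_one, delete_batch and compare are hypothetical names, and a "scheduling cycle" is reduced to a counter.

```python
from dataclasses import dataclass, field


@dataclass
class ToyServer:
    """Toy stand-in for the server (hypothetical, not the real PBS Pro server)."""
    jobs: set = field(default_factory=set)
    sched_cycles: int = 0

    def _trigger_cycle(self):
        # Hypothetical hook: just count each scheduling cycle that is started.
        self.sched_cycles += 1

    def delete_one(self, job_id):
        """Behaviour described in 2): every delete request triggers a cycle."""
        self.jobs.discard(job_id)
        self._trigger_cycle()

    def delete_batch(self, job_ids):
        """Proposed behaviour: process the whole list, then cycle once."""
        removed = False
        for job_id in job_ids:
            if job_id in self.jobs:
                self.jobs.discard(job_id)
                removed = True
        if removed:
            self._trigger_cycle()


def compare(n_jobs):
    """Return (cycles with per-job deletes, cycles with one batched delete)."""
    ids = [f"{i}.server1" for i in range(n_jobs)]

    per_job = ToyServer(jobs=set(ids))
    for job_id in ids:
        per_job.delete_one(job_id)

    batched = ToyServer(jobs=set(ids))
    batched.delete_batch(ids)

    return per_job.sched_cycles, batched.sched_cycles
```

      In this model, deleting N jobs one request at a time starts N cycles, while the batched request starts a single cycle once the list has been processed.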

      Some possible ideas:

      • Build a list of jobs to be deleted when users do something like "qselect -u jon | xargs qdel" or "for x in {6543..6789}; do qdel $x; done", then delete them all at once

      • Exclude jobs listed for deletion (e.g. qdel 1 2 3 4 5 6 ...) from being passed to the scheduler by the server
      • Possibly add a new flag to qdel that queues the jobs for deletion instead of waiting for each job to be deleted
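      A rough sketch of how the ideas above could fit together, under stated assumptions: job IDs are split on the first dot to group them per server (so a request such as "qdel 1.server1 1.server2" can still be served), and jobs marked for deletion are hidden from the list handed to the scheduler until they are actually removed. All names here (group_by_server, ToyServer, mark_for_deletion, jobs_for_scheduler, reap) are hypothetical and do not correspond to existing PBS Pro functions.

```python
from collections import defaultdict


def group_by_server(job_ids, default_server="default"):
    """Group IDs such as '1.server1' by the text after the first dot.

    IDs without a dot are filed under default_server (an assumption made
    for this sketch).
    """
    groups = defaultdict(list)
    for job_id in job_ids:
        _seq, dot, server = job_id.partition(".")
        groups[server if dot else default_server].append(job_id)
    return dict(groups)


class ToyServer:
    """Toy model of one server's queue; not the real PBS Pro server."""

    def __init__(self, jobs):
        self.queued = list(jobs)
        self.pending_delete = set()

    def mark_for_deletion(self, job_ids):
        # Record the request immediately; the expensive work happens in reap().
        known = set(self.queued)
        self.pending_delete.update(j for j in job_ids if j in known)

    def jobs_for_scheduler(self):
        # Jobs slated for deletion are never offered to the scheduler.
        return [j for j in self.queued if j not in self.pending_delete]

    def reap(self):
        # Remove everything that was marked and report how many jobs went away.
        self.queued = [j for j in self.queued if j not in self.pending_delete]
        removed = len(self.pending_delete)
        self.pending_delete.clear()
        return removed
```

      Marking is cheap, so the request can return quickly, and the scheduler never sees a job that is about to disappear. How the actual removal would be driven in the real server is left open here.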

              People

              • Assignee:
                Unassigned
                Reporter:
                scc Scott Campbell
              • Votes:
                1
                Watchers:
                4

                Dates

                • Created:
                  Updated: