PBS Pro / PP-479

As an admin, I would like running subjobs to be able to survive a pbs_server restart, so that the work up to that point is not lost

    Details

    • Type: User Story
    • Status: In Progress
    • Priority: Low
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None
    • Story Points:
      1
    • Acceptance Criteria:
      Running subjobs are not requeued because of a pbs_server process restart.

      Description

      Currently, when the server restarts, job arrays that have running subjobs are terminated because the subjobs are stored only in memory. I would like this behavior changed so that running subjobs continue to run after a server restart. It would also be great if we could store the information that is unique to each subjob, such as run_count, resources_used, comments, etc., so that a query of the subjobs does not return just the parent information once the job is finished.
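The request above amounts to writing each subjob's record to durable storage and reading it back at startup instead of rebuilding it from the parent. A minimal sketch of that idea follows. It is not how pbs_server stores jobs: the JSON file layout, the function names `save_subjobs` and `recover_subjobs`, and the job ID `123[]` are all assumptions made for illustration.

```python
import json
import os
import tempfile

RUNNING = "R"  # state letter used in this sketch only


def save_subjobs(path, subjobs):
    """Write per-subjob records to disk atomically (hypothetical layout)."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as fh:
        json.dump(subjobs, fh)
        fh.flush()
        os.fsync(fh.fileno())  # make sure the data reaches the disk
    os.replace(tmp, path)      # atomic rename: readers never see a partial file


def recover_subjobs(path):
    """Reload the records after a restart; running subjobs keep their state."""
    if not os.path.exists(path):
        return {}
    with open(path) as fh:
        return json.load(fh)


def demo():
    """Round-trip two subjob records through a simulated restart."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "array_123.json")
        before = {
            "123[1]": {"state": RUNNING, "run_count": 1, "session_id": 27561},
            "123[2]": {"state": "Q", "run_count": 0},
        }
        save_subjobs(path, before)
        after = recover_subjobs(path)  # what a restarted server would read
        return before == after
```

Writing to a temporary file and renaming it means a crash in the middle of a save leaves the previous copy intact, which matters when the point is to survive an unplanned restart.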

      Also please consider

      Some of the attributes that we should consider making unique to each subjob are:
      resources_used.cpupercent = 1716
      resources_used.cput = 26:01:06
      resources_used.mem = 34733656kb
      resources_used.ncpus = 15
      resources_used.vmem = 34733656kb
      resources_used.walltime = 01:54:51
      Error_Path =
      exec_host =
      exec_vnode =
      mtime = Wed Nov 8 20:31:47 2017
      Output_Path =
      stime = Wed Nov 8 18:36:56 2017
      session_id = 27561
      substate = 42
      comment = Job run at Wed Nov 08 at 18:36 on
      etime = Wed Nov 8 18:36:56 2017
      run_count = 1
      array_index =
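One way to picture per-subjob attributes is as an overlay on the parent array job: a status query returns the subjob's own value where one is stored and falls back to the parent's value otherwise. The sketch below illustrates that lookup only. The helper `subjob_view`, the `PER_SUBJOB` set, and the sample values are assumptions, not PBS Pro code.

```python
from collections import ChainMap

# Attribute names taken from the list above; resources_used.* is matched by prefix.
PER_SUBJOB = {
    "Error_Path", "exec_host", "exec_vnode", "mtime", "Output_Path",
    "stime", "session_id", "substate", "comment", "etime",
    "run_count", "array_index",
}


def is_per_subjob(name):
    """True if the attribute should be stored separately for each subjob."""
    return name in PER_SUBJOB or name.startswith("resources_used.")


def subjob_view(parent_attrs, subjob_attrs):
    """Merge attributes so that subjob values override the parent's."""
    own = {k: v for k, v in subjob_attrs.items() if is_per_subjob(k)}
    return dict(ChainMap(own, parent_attrs))


# Hypothetical parent and subjob records
parent = {"Job_Name": "sim", "run_count": 0, "comment": "parent comment"}
subjob = {"run_count": 1, "resources_used.walltime": "01:54:51"}
view = subjob_view(parent, subjob)
```

Here `view["run_count"]` comes from the subjob, while `Job_Name` and `comment` fall back to the parent because the subjob stores no value of its own for them.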

              People

              • Assignee:
                shrinivas.harapanahalli Shrinivas Harapanahalli
                Reporter:
                scc Scott Campbell
              • Votes:
                0
                Watchers:
                4
