Overview:

Currently, on server restart, job arrays that have running subjobs are terminated because the subjobs are stored only in memory. With this RFE, the behavior is changed so that running subjobs continue to run after a server restart. The RFE also stores the information that is unique to each subjob, such as run_count, resources_used, and comments, so that qstat of a subjob no longer returns just the parent's information once the job has finished.

Interface Design:

  • Interface 1:
    • Change control: Stable
    • Synopsis: Status of running subjobs of an array job persists across pbs_server restarts
    • Details:
      • Currently, when the server restarts (gracefully or abruptly), any running subjobs of an array job are killed and requeued, and they start from the beginning because the whole parent array job is requeued.
      • With this RFE, subjob and array job status is persistent across server restarts, so any running subjobs of an array job continue to run when the server is restarted.
      • This is achieved by putting a subjob on par with a regular job: each subjob's job object and its attributes are stored in the PBS database and recovered during the subsequent server start (pbsd_init()).
      • The "pbs.subjob_track" table is removed from the PBS database schema.
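
The persistence idea in Interface 1 can be illustrated with a toy model. The sketch below is not the PBS schema or server code: the table name `job`, its columns, the job IDs, and the helper names are all assumptions made for illustration. It only shows the principle that each subjob gets its own stored record, so a restart can recover subjob state instead of rebuilding it from the parent.

```python
import os
import sqlite3
import tempfile


def open_store(path):
    """Open (or create) the toy job store. The schema is hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS job ("
        "job_id TEXT PRIMARY KEY, state TEXT NOT NULL, run_count INTEGER NOT NULL)"
    )
    return conn


def save_job(conn, job_id, state, run_count):
    """Persist one job record; a subjob is stored exactly like any other job."""
    conn.execute(
        "INSERT OR REPLACE INTO job VALUES (?, ?, ?)", (job_id, state, run_count)
    )
    conn.commit()


def recover_jobs(conn):
    """Rebuild the in-memory view from the store, as a restart would."""
    rows = conn.execute("SELECT job_id, state, run_count FROM job ORDER BY job_id")
    return {job_id: {"state": state, "run_count": rc} for job_id, state, rc in rows}


def demo_restart():
    """Save a parent array job and two subjobs, 'restart', and recover them."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "toy_pbs.db")
        conn = open_store(path)
        save_job(conn, "100[].server", "B", 0)   # parent array job (illustrative ID)
        save_job(conn, "100[1].server", "R", 1)  # running subjob
        save_job(conn, "100[2].server", "Q", 0)  # queued subjob
        conn.close()                             # simulated server shutdown
        conn = open_store(path)                  # simulated server start
        try:
            return recover_jobs(conn)
        finally:
            conn.close()


recovered = demo_restart()
```

After the simulated restart, the running subjob is still recorded as running with its own run count, which is the property the RFE relies on.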


  • Interface 2:
    • Change control: Stable
    • Synopsis: Non-rerunnable array jobs
    • Details:
      • Array jobs can now be submitted with the "-r n" command-line argument to qsub.
      • That is, array jobs can now be non-rerunnable.
      • The restriction that array jobs must always be rerunnable is removed.
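
As a usage illustration for Interface 2, the sketch below builds the argument list for submitting a non-rerunnable array job. The helper name `build_array_qsub_argv`, the index range "1-10", and the script name "job.sh" are hypothetical; the sketch assumes the usual "-J" option of qsub for the array index range and combines it with the "-r n" argument described above.

```python
def build_array_qsub_argv(index_range, script, rerunnable=True):
    """Return the qsub argument list for an array job.

    index_range is passed to -J (for example "1-10"); rerunnable selects
    "-r y" or "-r n". Nothing is executed here.
    """
    return ["qsub", "-J", index_range, "-r", "y" if rerunnable else "n", script]


# Hypothetical submission of a non-rerunnable array job with ten subjobs.
argv = build_array_qsub_argv("1-10", "job.sh", rerunnable=False)
# To actually submit on a host with PBS installed:
#   import subprocess; subprocess.run(argv, check=True)
```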


  • Interface 3:
    • Change control: Stable
    • Synopsis: Impact on running subjobs when terminating the PBS server using "qterm -t quick"
    • Details:
      • When the PBS server is terminated with "qterm -t quick" (i.e., the requested shutdown type is "quick"), running subjobs are not requeued and are left running after the server shuts down.


  • Interface 4:
    • Change control: Stable
    • Synopsis: Content of "qstat -xtf" output with respect to subjobs
    • Details:
      • The values that "qstat -xtf" displays for each subjob attribute or resource are now obtained from the corresponding subjob's job object instead of being copied from the parent array job object.
      • Hence the output shows the information that is unique to each subjob, such as run_count, resources_used, and comments, and qstat of a subjob no longer returns just the parent's information once the job has finished.
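
To show what per-subjob values mean in practice for Interface 4, the sketch below parses text shaped like full-format qstat output into one attribute dictionary per job. The sample text is illustrative, not captured from a real server, and the job IDs and values are made up. The parser assumes the common layout of a "Job Id:" header followed by indented "name = value" lines, with wrapped values continued on tab-indented lines.

```python
SAMPLE = """\
Job Id: 100[1].server
    job_state = X
    run_count = 2
    resources_used.walltime = 00:00:10
Job Id: 100[2].server
    job_state = X
    run_count = 1
    resources_used.walltime = 00:00:42
"""


def parse_qstat_full(text):
    """Map each job ID to a dict of its attributes (all values kept as strings)."""
    jobs = {}
    current = None   # attribute dict of the job being read
    last_key = None  # most recent attribute, for continuation lines
    for line in text.splitlines():
        if line.startswith("Job Id:"):
            job_id = line.split(":", 1)[1].strip()
            current = jobs.setdefault(job_id, {})
            last_key = None
        elif current is None or not line.strip():
            continue  # skip blank lines and anything before the first job
        elif line.startswith("\t") and last_key is not None:
            current[last_key] += line.strip()  # wrapped value continues
        elif " = " in line:
            key, value = line.strip().split(" = ", 1)
            current[key] = value
            last_key = key
    return jobs


subjobs = parse_qstat_full(SAMPLE)
# Each subjob carries its own run_count and resources_used values.
run_counts = {job_id: attrs["run_count"] for job_id, attrs in subjobs.items()}
```

With per-subjob storage, two finished subjobs of the same array can report different run counts and resource usage, as in the sample.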


  • Interface 5:
    • Change control: Stable
    • Synopsis: Display of the state of a subjob in a finished array job under "qstat -xt"
    • Details:
      • The state of a subjob in a finished array job is shown as "X" instead of "F" (i.e., the subjob state remains "expired" instead of changing to "finished").
      • This behavior is a restriction discovered during the experimental internal change.


  • Interface 6:
    • Change control: Stable
    • Synopsis: "run_count" and "run_version" attributes of a subjob are now valid
    • Details:
      • Previously, the "run_count" and "run_version" attributes of a subjob were never incremented and remained "1" even after a rerun.
      • With this RFE, the "run_count" and "run_version" attributes of a subjob are incremented on each rerun of the subjob.


