Creating a reservation out of a job

forum discussion

Pull Request

Overview

This design focuses on creating a reservation out of a job after the job has started running.

If a user can tell that their job is not going as hoped (from preliminary solver results, for example), they may want to end the job and resubmit it after correcting the problem. Previously, the resources allocated to the job would be released back to the server, where other jobs could claim them, and the resubmitted job could wait a long time to run depending on factors such as the number of queued jobs, priority, and scheduling policy. This RFE lets the user convert the job into a reservation, which holds onto the resources the job had been allocated. The user can then end the original job and submit an updated job to the new reservation without waiting to run again.

Technical Details

  1. Interface 1: New '--job' option to the pbs_rsub command.
    1. Visibility: public
    2. Change Control: Stable
    3. Synopsis: Allow users to create a reservation out of a running job.
    4. Details
      1. This option creates a reservation using the exec_vnode of the specified job.
        1. Example - 
          1. [root@d_server /]# qstat -f 1.d_server | grep exec_vnode
            exec_vnode = (vnode[0]:ncpus=1)+(vnode[0]:ncpus=1)+(vnode[0]:ncpus=1)
            [root@d_server /]#

          2. [root@d_server /]# pbs_rsub --job 1
            R2.d_server CONFIRMED

            [root@d_server /]#

          3. [root@d_server /]# pbs_rstat -f | grep resv_nodes
            resv_nodes = (vnode[0]:ncpus=1)+(vnode[0]:ncpus=1)+(vnode[0]:ncpus=1)
            [root@d_server /]#

      2. This option can only be used for a job in state 'R' and substate 42.
        1. "request invalid for job state" will be displayed if the job is not in state R/42.
      3. The newly created reservation will be immediately confirmed as shown above.
      4. The walltime of the newly created reservation will be the same as that of the job.
      5. The start time of the newly created reservation will be copied from the job.
      6. The end time of the newly created reservation will be calculated from the start time and walltime.
      7. If the reservation is created after the job has been running for a while, the walltime of the reservation will be the remaining walltime of the running job.
      8. The job's soft walltime, if set, is not considered.
      9. Other attributes that will be copied from the job are:

        Job              Reservation
        Job_Owner        Reserve_Owner
        schedselect      schedselect
        exec_vnode       resv_nodes
        Resource_List.*  Resource_List.*
      10. The reservation ID will be prefixed with 'R', like the IDs of advance reservations.
      11. The reservation will be named R<next_available_id>.
      12. The job from which the reservation is created will be moved to the newly created reservation queue.
      13. An array job ID cannot be used with this new option.
      14. If the job is peer scheduled, the reservation will be created in the pulling complex.
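
The copy rules and time arithmetic in the Interface 1 details can be sketched in Python as follows. This is only an illustration under stated assumptions: the dictionary keys `start` and `walltime`, the function name `derive_reservation`, and the `remaining` field are hypothetical and not PBS names. The sketch also takes one reading of the rules, in which the end time is the job's start time plus its walltime and the remaining walltime is measured from the current time to that end.

```python
from datetime import datetime, timedelta

# Job attribute -> reservation attribute, as listed in the table above.
COPIED_ATTRS = {
    "Job_Owner": "Reserve_Owner",
    "schedselect": "schedselect",
    "exec_vnode": "resv_nodes",
}


def derive_reservation(job: dict, now: datetime) -> dict:
    """Build a reservation-like dict from a job-like dict (illustrative only)."""
    resv = {dst: job[src] for src, dst in COPIED_ATTRS.items()}

    # Resource_List.* entries are copied under the same names.
    for name, value in job.items():
        if name.startswith("Resource_List."):
            resv[name] = value

    # Hypothetical keys: "start" is a datetime, "walltime" is a timedelta.
    start = job["start"]
    end = start + job["walltime"]      # end = start time + walltime
    resv["start"] = start              # start time is copied from the job
    resv["end"] = end
    # Walltime left if the reservation is created while the job is running.
    resv["remaining"] = max(end - now, timedelta(0))
    return resv
```

Under this reading, the end of the reservation does not depend on when the conversion is requested; only the remaining walltime does.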
  2. Interface 2: A new job attribute "create_resv_from_job"
    1. Visibility: public
    2. Change Control: Stable
    3. Synopsis: Allow users to tell the server to create a reservation for a job at run time.
    4. Details: 
      1. Setting this attribute marks the job so that a reservation is created out of it when it runs.
      2. This can be set at:
        1. Submit time
          1. by qsub/queuejob hook
        2. After submit time but before the job gets sent to the mom at runtime
          1. qalter
          2. server hooks (modifyjob/runjob/etc)
        3. While this attribute can be set after the job has run, doing so has no effect.
      3.  Type: Boolean
        1. Example:
          1. [root@d_server /]# qsub -Wcreate_resv_from_job=1 -- /bin/sleep 1111
            3016.d_server
            [root@d_server /]# qstat -s

            d_server:
                                                                        Req'd  Req'd   Elap
            Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
            --------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
            3016.d_server   root     R3017    STDIN       10824   1   1     --    -- R 00:00
            Job run at Fri Jan 10 at 22:58 on (d_server:ncpus=1)
            [root@d_server /]# pbs_rstat
            Resv ID Queue User State Start / Duration / End
            ---------------------------------------------------------------------
            R3017.d_se R3017 root@d_s RN Today 22:58 / 157680000 / Wed Jan 08 2025 2
            [root@d_server /]#

          2. An example of creating a reservation out of a job from a hook is in the file hook demo.txt.
      4. Items 3 through 12 of the Interface 1 details apply here as well.
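
Setting the attribute from a queuejob hook, as described above, might look like the following minimal sketch. It assumes that the hook's job object exposes the attribute under the same name, `create_resv_from_job`, and that it accepts a Python boolean; neither is confirmed here. The helper name `request_reservation` is hypothetical. The `pbs` module is only importable inside a PBS hook, so the sketch guards the import to stay loadable elsewhere.

```python
def request_reservation(job):
    """Mark a job so that the server creates a reservation for it at run time.

    Assumption: the attribute name on the hook job object matches the
    job attribute name given in this design.
    """
    job.create_resv_from_job = True
    return job


try:
    import pbs  # available only when running inside a PBS hook
except ImportError:
    pbs = None

if pbs is not None:
    event = pbs.event()
    # In a queuejob hook, event.job is the job being submitted.
    request_reservation(event.job)
    event.accept()
```

Because setting the attribute after the job has run has no effect, a hook like this only makes sense for events that fire before the job is sent to the mom.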
  3. Interface 3: A new reservation attribute "reserve_job"
    1. Visibility: public
    2. Change Control: Stable
    3. Synopsis: Allow users to identify whether the reservation was created out of a job.
    4. Details:
      1. The value will be the ID of the job the reservation was made from.
      2. Example:
        1. [root@d_server /]# pbs_rstat -f | grep job
          reserve_job = 3016.d_server
          [root@d_server /]#
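
A script that needs the source job of a reservation could read `reserve_job` from the text printed by `pbs_rstat -f`. The sketch below parses `name = value` lines from already captured output. The function names are hypothetical, and the sketch deliberately ignores wrapped continuation lines, so it is a starting point rather than a full parser.

```python
from typing import Dict, Optional


def parse_rstat_attrs(text: str) -> Dict[str, str]:
    """Collect 'name = value' pairs from captured pbs_rstat -f output."""
    attrs = {}
    for line in text.splitlines():
        # Split on the first ' = ' only; values may contain '=' themselves.
        name, sep, value = line.partition(" = ")
        if sep:
            attrs[name.strip()] = value.strip()
    return attrs


def source_job(text: str) -> Optional[str]:
    """Return the job ID the reservation was made from, or None if absent."""
    return parse_rstat_attrs(text).get("reserve_job")
```

A reservation that was not created from a job would simply yield `None` here, assuming the attribute is unset in that case.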
  4. Interface 4: pbs_rsub error message when creating a reservation out of a job that is already in a reservation.
    1. Visibility: public
    2. Change Control: Stable
    3. Synopsis: A new error message indicating that creating a reservation out of a job that is already in a reservation is not allowed.
    4. Details:
      1. Example:
        1. [root@d_server /]# pbs_rsub --job 3016
          pbs_rsub: Reservation may not be created from a job already within a reservation.
          [root@d_server /]#
  5. Interface 5: pbs_rsub error message when creating a reservation out of an array job.
    1. Visibility: public
    2. Change Control: Stable
    3. Synopsis: A new error message indicating that creating a reservation out of an array job is not allowed.
    4. Details:
      1. Example:
        1. [root@d_server /]# pbs_rsub --job 3[]
          pbs_rsub: Reservation may not be created from an array job
          [root@d_server /]#
        2. [root@d_server /]# pbs_rsub --job 3[1]
          pbs_rsub: Reservation may not be created from an array job
          [root@d_server /]#
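
Both examples above are rejected because the ID contains square brackets, whether it names the whole array (`3[]`) or a single subjob (`3[1]`). A wrapper script could detect such IDs before calling pbs_rsub with a check along these lines. The function name and the exact pattern are assumptions for illustration, not the server's actual validation.

```python
import re

# Sequence number, bracketed index (possibly empty), optional ".server" suffix.
_ARRAY_ID = re.compile(r"^\d+\[[^\]]*\](\..+)?$")


def is_array_job_id(job_id: str) -> bool:
    """Return True for array job or subjob IDs such as '3[]' or '3[1]'."""
    return bool(_ARRAY_ID.match(job_id))
```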
  6. Interface 6: qsub error message when submitting job to a reservation and using -Wcreate_resv_from_job=True.
    1. Visibility: public
    2. Change Control: Stable
    3. Synopsis: A new error message indicating that creating a reservation out of a job submitted to a reservation is not allowed.
    4. Details:
      1. Example:
        1. [root@d_server pbspro_dev_oss]# pbs_rsub -R 2118 -E 2120
          R1.d_server UNCONFIRMED
          [root@d_server pbspro_dev_oss]# pbs_rstat
          Resv ID Queue User State Start / Duration / End
          ---------------------------------------------------------------------
          R1.d_serve R1 root@d_s CO Today 21:18 / 120 / Today 21:20
          [root@d_server pbspro_dev_oss]# qsub -q R1 -Wcreate_resv_from_job=1 -- /bin/sleep 1111
          qsub: Reservation may not be created from job in a reservation
          [root@d_server pbspro_dev_oss]#
  7. Interface 7: pbs_rsub error message when one user tries to create a reservation out of another user's job.
    1. Visibility: public
    2. Change Control: Stable
    3. Synopsis: A new error message indicating that creating a reservation out of another user's job is not allowed.
    4. Details:
      1. Example:
        1. [pbsuser1@d_server pbspro_dev_oss]$ qsub -- /bin/sleep 1111
          5.d_server
          [pbsuser1@d_server pbspro_dev_oss]$ qstat -s

          d_server:
                                                                      Req'd  Req'd   Elap
          Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
          --------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
          5.d_server      pbsuser1 workq    STDIN        2874   1   1     --    -- R 00:00
          Job run at Fri Jan 17 at 21:28 on (d_server:ncpus=1)
          [pbsuser1@d_server pbspro_dev_oss]$ exit
          [root@d_server pbspro_dev_oss]# su pbsuser
          [pbsuser@d_server pbspro_dev_oss]$ pbs_rsub --job 5
          pbs_rsub: Unauthorized Request
          [pbsuser@d_server pbspro_dev_oss]$
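
The job-level restrictions described in the interfaces above (owner, reservation membership, and job state) can be summarized as a single pre-check. The sketch below is illustrative only: the `JobInfo` type, its field names, and the order of the checks are assumptions, while the message strings are the ones shown in this document.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class JobInfo:
    owner: str                    # Job_Owner, e.g. "pbsuser1@d_server"
    state: str                    # job state letter, e.g. "R"
    substate: int                 # numeric substate, e.g. 42
    in_reservation: bool = False  # hypothetical flag: job is already in a reservation


def precheck(job: JobInfo, requester: str) -> Optional[str]:
    """Return the error message for a rejected request, or None if allowed."""
    # Only the job's owner may convert it (user part of user@host).
    if job.owner.split("@", 1)[0] != requester:
        return "Unauthorized Request"
    if job.in_reservation:
        return ("Reservation may not be created from a job "
                "already within a reservation.")
    # The job must be running: state 'R' and substate 42.
    if job.state != "R" or job.substate != 42:
        return "request invalid for job state"
    return None
```

For the Interface 7 example, a job owned by `pbsuser1@d_server` checked on behalf of `pbsuser` returns the "Unauthorized Request" message.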