https://pbspro.atlassian.net/browse/PP-864

...

  • Cray systems with a Gemini interconnect do NOT support suspend/resume
  • Cray systems with an Aries interconnect and newer Cray X* series systems DO support suspend/resume 
  • To enable suspend/resume, set suspendResume 1 in /etc/opt/cray/alps/alps.conf (using xtopview on CLE 5.2 and earlier CLE releases) and then restart ALPS
    • Refer to Cray's System Administration Guide for more details about using suspend/resume on Cray X* series systems
  • On a Cray X* series system, PBS issues a request to ALPS to switch an ALPS reservation IN (resume) or OUT (suspend)
  • On a Cray X* series system, the suspended low-priority job and the high-priority job must both fit into the Cray compute node’s memory
  • Cray X* series systems allow a maximum of 4 co-resident jobs on a compute node. Refer to Cray's documentation for more details.
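
As a rough illustration of the configuration step above, the following Python sketch checks whether suspend/resume is switched on in the text of an alps.conf file. It assumes that the setting appears on its own line as the name suspendResume followed by whitespace and a number, and that "#" starts a comment. Both are assumptions about the file layout, not something this document specifies. The function name suspend_resume_enabled is hypothetical.

```python
import re


def suspend_resume_enabled(conf_text: str) -> bool:
    """Return True if the last 'suspendResume <n>' line in conf_text sets it to 1.

    Assumed layout (not confirmed by this document): one setting per line,
    written as 'suspendResume <number>', with '#' starting a comment.
    """
    enabled = False
    for raw_line in conf_text.splitlines():
        # Drop anything after '#' and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        match = re.fullmatch(r"suspendResume\s+(\d+)", line)
        if match:
            # A later occurrence overrides an earlier one in this sketch.
            enabled = match.group(1) == "1"
    return enabled
```

A site script could read /etc/opt/cray/alps/alps.conf, pass its contents to this function, and warn the administrator before suspend/resume is relied on. ALPS still has to be restarted after the file is changed.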


Interface #1 - New error code when ALPS fails to switch reservation from suspend to resume or resume to suspend

...

  • Change Control: Stable
  • Details:
    • On a Cray X* series system, a job that requests exclusive access to a node (i.e. -lplace=excl) cannot be suspended. ALPS returns an error when asked to switch out an exclusive job's ALPS reservation
    • If suspension of such a job is attempted, mom returns error code 15219 and logs a DEBUG level error message, as described in the 5th bullet of Interface #5
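
The rule above can be sketched as a small pre-check that a site tool might run before asking for a job to be suspended. This is only an illustration and not how mom or ALPS implement the check. The function name, the constant name, and the return value 0 for "no objection" are hypothetical. Only the value 15219 and the excl keyword come from this document. The sketch assumes the place value is a colon-separated list of keywords and looks only for excl.

```python
# Error code named in this document; the constant name itself is hypothetical.
ALPS_SWITCH_ERROR = 15219


def expected_suspend_result(place: str) -> int:
    """Return the error code to expect when suspending a job with this place value.

    Returns ALPS_SWITCH_ERROR if the job requested exclusive access (excl),
    otherwise 0 (a hypothetical value meaning "no objection from this check").
    """
    # Assumption: place is a colon-separated list of keywords, e.g. "scatter:excl".
    keywords = [part.strip() for part in place.split(":")]
    if "excl" in keywords:
        return ALPS_SWITCH_ERROR
    return 0
```

For example, expected_suspend_result("scatter:excl") yields 15219, while expected_suspend_result("shared") yields 0. A result of 0 does not guarantee success, since ALPS can still reject the switch for other reasons, such as the memory and co-residency limits.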


Interface #5 - New mom log messages

...