https://pbspro.atlassian.net/browse/PP-864
...
- Cray systems with a Gemini interconnect do NOT support suspend/resume
- Cray systems with an Aries interconnect and newer Cray X* series systems DO support suspend/resume
- To enable suspend/resume, set suspendResume 1 in /etc/opt/cray/alps/alps.conf (using xtopview on CLE 5.2 and prior CLEs) and then restart ALPS
- Refer to Cray's System Administration Guide for more details about using suspend/resume on Cray X* series systems
- On Cray X* series systems, PBS issues a request to ALPS to switch IN (resume) or OUT (suspend) an ALPS reservation
- On a Cray X* series system, the suspended low priority job and the high priority job must both fit into the Cray compute node’s memory at the same time
- Cray X* series systems are limited to a maximum of 4 co-resident jobs on a compute node. Refer to the Cray documentation for more details.
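As an illustration of the configuration step above, the following is a minimal sketch of a check that a copy of alps.conf contains an active suspendResume 1 setting. It assumes the setting appears on its own line as a whitespace-separated key and value and that "#" starts a comment; the real alps.conf layout may differ, and the function name suspend_resume_enabled is hypothetical, not part of PBS or ALPS.

```python
def suspend_resume_enabled(conf_text: str) -> bool:
    """Return True if the text has an uncommented 'suspendResume 1' line.

    Assumptions (not confirmed by Cray documentation): settings are
    whitespace-separated 'key value' pairs and '#' begins a comment.
    """
    enabled = False
    for raw in conf_text.splitlines():
        # Drop anything after '#', then surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "suspendResume":
            # If the key appears more than once, the last occurrence wins
            # (an assumption made for this sketch).
            enabled = parts[1] == "1"
    return enabled


# Hypothetical file content used only for demonstration.
sample = "# ALPS settings\nsuspendResume 1\n"
print(suspend_resume_enabled(sample))  # prints True
```

The check only reads text, so it can be run against a copy of the file; ALPS still has to be restarted for a change to take effect, as noted above.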
Interface #1 - New error code when ALPS fails to switch reservation from suspend to resume or resume to suspend
...
Interface #5 - New mom log messages
- Change Control: StableUnstable
- Details:
- The following mom log message is logged on Cray X* series systems:
- While switching out an exclusive ALPS reservation (suspending a job with exclusive placement), the following error is logged (PBSEVENT_DEBUG):
"BASIL;ERROR: ALPS error: apsched: at least resid <ALPS reservation id> is exclusive"
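For anyone scanning mom logs for this condition, the message above can be matched with a regular expression. This is a minimal sketch: it assumes the reservation id is a single whitespace-free token, the helper name exclusive_resid is hypothetical, and the id 1234 in the sample line is made up for illustration.

```python
import re
from typing import Optional

# Pattern built from the message text documented above; the reservation id
# is assumed to be one token without whitespace.
EXCLUSIVE_RESID = re.compile(
    r"BASIL;ERROR: ALPS error: apsched: at least resid (\S+) is exclusive"
)


def exclusive_resid(log_line: str) -> Optional[str]:
    """Return the ALPS reservation id from the message, or None if absent."""
    match = EXCLUSIVE_RESID.search(log_line)
    return match.group(1) if match else None


# Hypothetical log line; real mom log lines carry additional prefix fields,
# which search() tolerates because it matches anywhere in the line.
line = "BASIL;ERROR: ALPS error: apsched: at least resid 1234 is exclusive"
print(exclusive_resid(line))  # prints 1234
```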