This design proposal describes how to configure PBS so that it releases only a limited set of resources (as specified by the admin) when a job is suspended.

In its current form, PBS releases all the consumable resources requested by a job when the job is suspended. In reality, when the system is out of swap space, a suspended job's processes hold on to the memory they have consumed and release only ncpus (because the kernel stops the processes). In some cases the admin may have configured an alternate suspend signal that makes the job release a few more resources (such as licenses) upon suspension. It would therefore be better if PBS gave admins a way to specify which resources are released from a job upon suspension.

Link to forum discussion.

Interface 1: New server attribute to specify which resources can be released.

# qmgr -c "s s restrict_res_to_release_on_suspend = 'ncpus, abcd'"

   qmgr obj=abcd svr=default: Unknown resource

   qmgr: Error (15035) returned from server


Interface 2: New Job attribute “resources_released”

A new job attribute “resources_released” is added.

This attribute is of type string and can only be read by an operator or manager. It is set internally by the server when a job is suspended. The Python type of this attribute is string.

It stores a string that describes the amount of each resource released on each node that the job was running on (provided these resources are also listed in the “restrict_res_to_release_on_suspend” server attribute). The format of the string is similar to that of exec_vnode.

Example: qstat -f 1 | grep resources_released

                        resources_released = (host1:ncpus=2)+(host2:ncpus=4:license=2)
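To illustrate the string format, the following is a minimal sketch of how a client might split a “resources_released” value into per-node amounts. The function name parse_resources_released is hypothetical and not part of PBS; the sketch assumes only the exec_vnode-like layout shown in the example above and keeps values as strings, since some resources (for example memory sizes) carry units.

```python
def parse_resources_released(value):
    """Parse an exec_vnode-like string into {vnode: {resource: value}}.

    Hypothetical helper for illustration; not a PBS API.
    """
    released = {}
    # Chunks are joined with "+"; each chunk is "vnode:res=val[:res=val...]",
    # optionally wrapped in parentheses.
    for chunk in value.split("+"):
        chunk = chunk.strip().strip("()")
        if not chunk:
            continue
        vnode, *assignments = chunk.split(":")
        resources = released.setdefault(vnode, {})
        for assignment in assignments:
            name, _, amount = assignment.partition("=")
            resources[name] = amount  # kept as a string (may carry units)
    return released


parsed = parse_resources_released("(host1:ncpus=2)+(host2:ncpus=4:license=2)")
for vnode, resources in parsed.items():
    print(vnode, resources)
```

With the example value, host1 maps to ncpus=2 and host2 maps to ncpus=4 and license=2.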

The server populates this job attribute whenever it suspends a job, and only if the “restrict_res_to_release_on_suspend” server attribute is set and lists valid resources to be released.

Interface 3: New Job attribute “resource_released_list”

A new job attribute “resource_released_list” is added.

This attribute is of type “resource_list” and can only be read by an operator or manager. It is set internally by the server when a job is suspended. The Python type of this attribute is pbs_resource.

It stores the cumulative value of all the consumable resources requested by the job (provided these resources are also part of “restrict_res_to_release_on_suspend” string).

Using the example from Interface 2: qstat -f 1 | grep resource_released_list

         resource_released_list.license = 2

         resource_released_list.ncpus = 6
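The relationship between the two examples can be sketched as a per-resource sum over all nodes. This is only an illustration under stated assumptions: the function name sum_released is hypothetical, the values are assumed to be plain integers (size-typed resources such as memory would need unit handling), and the sketch assumes the cumulative values equal the per-node sums, as they do in the example.

```python
from collections import Counter


def sum_released(value):
    """Sum integer resource amounts across all chunks of an exec_vnode-like string.

    Hypothetical helper for illustration; not a PBS API.
    """
    totals = Counter()
    for chunk in value.split("+"):
        fields = chunk.strip().strip("()").split(":")
        # fields[0] is the vnode name; the rest are "resource=value" pairs.
        for assignment in fields[1:]:
            name, _, amount = assignment.partition("=")
            totals[name] += int(amount)  # assumption: integer-valued resources
    return dict(totals)


totals = sum_released("(host1:ncpus=2)+(host2:ncpus=4:license=2)")
for name in sorted(totals):
    print(f"resource_released_list.{name} = {totals[name]}")
```

For the Interface 2 example this yields license = 2 and ncpus = 6, matching the values shown above.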

This job attribute is populated only if the “restrict_res_to_release_on_suspend” server attribute is set and lists valid resources to be released.

This job attribute is used to release consumable resources on queue/server objects.
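As a rough sketch of that accounting, the snippet below subtracts a job's released totals from a server's or queue's assigned totals. Everything here is an assumption made for illustration: the function name release_from_assigned, the dictionary representation, the mem_gb resource, and the sample numbers are all hypothetical and do not describe the actual server implementation.

```python
def release_from_assigned(assigned, released):
    """Return a copy of `assigned` with the released amounts subtracted.

    Hypothetical helper for illustration; not a PBS API.
    """
    updated = dict(assigned)
    for name, amount in released.items():
        updated[name] = updated.get(name, 0) - amount
    return updated


# Hypothetical assigned totals on a server or queue before the suspend.
server_assigned = {"ncpus": 10, "license": 3, "mem_gb": 64}
# Cumulative values from the job's resource_released_list (Interface 3 example).
released = {"ncpus": 6, "license": 2}

after_suspend = release_from_assigned(server_assigned, released)
print(after_suspend)
```

Resources that are not in the released list (mem_gb in this sketch) stay assigned, which reflects the motivation that a suspended job may still hold memory.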


Interface 4: New server log message

Unable to create resource released list


Interface 5: New error message while deleting a custom resource

Example: # qmgr -c "d r res1"

   qmgr obj=res1 svr=default: Resource busy on server

   qmgr: Error (15174) returned from server