Record for Job Suspend and Resume events in Accounting Logs

Overview:

A job passes through many events during its lifecycle. This EDD covers the accounting records logged when a job is suspended and resumed. Currently, a job's resource usage is reported only when the job ends, in the "E" record of the accounting logs. However, the resources a job requests and uses can change during its lifetime. The objective of this EDD is to make the correct resource usage available at suspend and resume events.

'z' record:

  • Upon job suspension, a 'z' record shall be written to the accounting log.
  • The record shall consist of:
    • resources_released (if available)
    • resources_used

Example:

1. restrict_res_to_release_on_suspend is set

The resources_released list shall be available only if the server attribute restrict_res_to_release_on_suspend is set, for example:

qmgr -c 's s restrict_res_to_release_on_suspend+=ncpus'

a. Submit a job requesting ncpus, then suspend it:

04/04/2020 00:56:06;z;1003.pbsserver;resources_used.cpupercent=0 resources_used.cput=00:00:00 resources_used.mem=0kb resources_used.ncpus=2 resources_used.vmem=0kb resources_used.walltime=00:00:00 resources_released=(pbsserver:ncpus=2)

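b. Submit a job requesting ncpus and memory, then suspend it. Only ncpus appears in resources_released, because in this example it is the only resource listed in restrict_res_to_release_on_suspend: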

04/04/2020 00:55:12;z;1002.pbsserver;resources_used.cpupercent=0 resources_used.cput=00:00:00 resources_used.mem=3432kb resources_used.ncpus=1 resources_used.vmem=336452kb resources_used.walltime=00:00:32 resources_released=(pbsserver:ncpus=1)


2. restrict_res_to_release_on_suspend is unset (the record carries no resources_released list):

04/04/2020 00:52:45;z;1002.pbsserver;resources_used.cpupercent=0 resources_used.cput=00:00:00 resources_used.mem=0kb resources_used.ncpus=1 resources_used.vmem=0kb resources_used.walltime=00:00:00
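A consumer of these records could split a 'z' line into its fields roughly as follows. This is only a minimal sketch, not part of the design: the function name parse_record is hypothetical, and it assumes the layout seen in the examples above (timestamp;type;job id;space-separated key=value pairs) with no spaces inside attribute values.

```python
def parse_record(line):
    """Split 'timestamp;type;job_id;attributes' into a dictionary.

    Assumption: attributes are separated by single spaces and no value
    contains a space, as in the example records in this document.
    """
    timestamp, rec_type, job_id, message = line.rstrip("\n").split(";", 3)
    attrs = {}
    for token in message.split():
        # Split on the first '=' only, so a value such as
        # '(pbsserver:ncpus=2)' is kept intact.
        key, _, value = token.partition("=")
        attrs[key] = value
    return {
        "timestamp": timestamp,
        "type": rec_type,
        "job_id": job_id,
        "attrs": attrs,
    }


line = (
    "04/04/2020 00:56:06;z;1003.pbsserver;"
    "resources_used.cpupercent=0 resources_used.cput=00:00:00 "
    "resources_used.mem=0kb resources_used.ncpus=2 "
    "resources_used.vmem=0kb resources_used.walltime=00:00:00 "
    "resources_released=(pbsserver:ncpus=2)"
)
record = parse_record(line)
# resources_released is absent when the server attribute is unset,
# so read it with .get() rather than indexing.
print(record["job_id"], record["attrs"].get("resources_released"))
```

Reading resources_released with .get() lets the same code handle both cases above, since the key is missing from the record when restrict_res_to_release_on_suspend is unset.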

'r' record:

  • Upon resuming a job, an 'r' record shall be written to the accounting log.

Example:

04/04/2020 00:54:43;r;1002.pbsserver;
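One way the 'z' and 'r' records might be used together is to work out how long a job stayed suspended. The sketch below is illustrative only: the function name suspended_seconds is hypothetical, and it assumes the timestamps are in MM/DD/YYYY HH:MM:SS form and that the records are supplied in chronological order.

```python
from datetime import datetime


def suspended_seconds(lines):
    """Return {job_id: seconds spent between each 'z' and the next 'r'}.

    Assumptions: timestamps are MM/DD/YYYY HH:MM:SS and the lines are
    in chronological order.
    """
    suspended_at = {}  # job_id -> time of the pending 'z' record
    totals = {}
    for line in lines:
        stamp, rec_type, job_id, _ = line.rstrip("\n").split(";", 3)
        when = datetime.strptime(stamp, "%m/%d/%Y %H:%M:%S")
        if rec_type == "z":
            suspended_at[job_id] = when
        elif rec_type == "r" and job_id in suspended_at:
            delta = when - suspended_at.pop(job_id)
            totals[job_id] = totals.get(job_id, 0.0) + delta.total_seconds()
    return totals


# The 'z' and 'r' example records for job 1002 from this document.
records = [
    "04/04/2020 00:52:45;z;1002.pbsserver;resources_used.cpupercent=0 "
    "resources_used.cput=00:00:00 resources_used.mem=0kb "
    "resources_used.ncpus=1 resources_used.vmem=0kb "
    "resources_used.walltime=00:00:00",
    "04/04/2020 00:54:43;r;1002.pbsserver;",
]
print(suspended_seconds(records))  # {'1002.pbsserver': 118.0}
```

A job that is suspended but has no later 'r' record is simply left out of the result; a real tool would need to decide how to report that case.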