Objective

The objective is to enhance PBS reporting of resources_used values; in particular, to have MoM accumulate resources_used values that are set in a hook, whether for a builtin resource or a custom resource.

Interface 1: For multi-node jobs, report accumulated resources_used values in accounting logs/qstat -f output, for those resources set in a hook.

  • Visibility: Public
  • Change Control: Stable
  • Synopsis: Display accumulated resources_used values in accounting logs and qstat -f output, for resources that are set in an execjob_prologue, execjob_epilogue, or exechost_periodic hook.
  • Details:
    • Resources_used resources 'cput', 'mem', 'cpupercent' will continue to be aggregated and reported as before.
    • The additional resources that can be accumulated are those that are set in a hook, which can be a builtin resource (e.g. vmem) or a custom resource.

      • Builtin resource: If a builtin resource is set in a hook, then any polling that MoM does for its value is automatically discontinued. The hook then becomes responsible for updating the value.

      • Custom resource: For a custom resource to be set in a hook, the resource must have already been added to PBS in one of 2 ways:

        1. Via qmgr:

          # qmgr -c "create resource <res_name> type=<res_type>,flag=h"

        2. Via a mom exechost_startup hook as follows: 

          # qmgr -c "create hook start event=exechost_startup"
          # qmgr -c "import hook start application/x-python default start.py" 
          # qmgr -c "export hook start application/x-python default"
          import pbs
          e=pbs.event()
          localnode=pbs.get_local_nodename()

          e.vnode_list[localnode].resources_available['foo_i'] = 7
          e.vnode_list[localnode].resources_available['foo_f'] = 5.0
          e.vnode_list[localnode].resources_available['foo_str'] = "seventyseven"

    • Aggregation of values: The resource value collected in mother superior mom is aggregated with each of the values obtained from the sister moms whose nodes are part of the job.

    • For resources of type float, long, and size, the value will be reported in accounting logs and qstat -f as:

                          resources_used.<resource_name> = <summed total>      

      If for some reason a sister node did not report back the resources_used value for the resource, then the last known value will be used.

    • For resources of type string, the values obtained from the MOMs are aggregated into a single JSON-format value, which is displayed in accounting_logs and qstat -f as follows:

      • The value obtained from each MOM must be a valid JSON object (a Python dictionary), which is an unordered set of name/value pairs, where each object begins with { (left brace) and ends with } (right brace). Each name is followed by : (colon) and the name/value pairs are separated by , (comma). The name must be wrapped in double quotes, allowing backslash escapes.

      • When all objects are found to be of valid JSON format, the resulting string resource value is a merging (i.e. union) of all dictionary items, and is shown in qstat -f and accounting_logs as:

        resources_used.<resource_name> = { <momA_JSON_item_value>, <momB_JSON_item_value>, <momC_JSON_item_value>, ..}

        Ex.   if momA returned '{ "a":1, "b":2 }', momB returned '{ "c":1 }', and momC returned '{"d":4}' for resources_used.foo_str, then we get:

          resources_used.foo_str={"a": 1, "b": 2, "c": 1, "d": 4}

        NOTE: If 2 or more values have the same 'name' as key, then only one of them will be retained; which one depends on Python's operation of merging dictionary items.

      • When at least one of the values obtained from a sister MOM is not of JSON format, the string cannot be accumulated, resulting in an unset resources_used string value. An error message is reported in mom_logs as follows:

        "Job <jobid> resources_used.<string_resource> cannot be accumulated: <exception_error_message>."

      • If it's a single node job, there'll be no accumulation of string resources. The value of the string resource need not be of JSON format.
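
The aggregation rules above can be sketched in Python. This is only an illustrative model, not MoM's actual implementation: the function names, the dictionary-of-MoMs input, and the use of dict.update() (so that the value from the MoM processed last is the one retained for a duplicate name) are assumptions made for this example. Size values are assumed to be already converted to a common unit.

```python
import json

def aggregate_numeric(last_known):
    """Sum the last known value reported by each MoM (float, long, or size)."""
    return sum(last_known.values())

def aggregate_string(values_by_mom):
    """Merge per-MoM JSON objects; return None (unset) if any value is not one."""
    merged = {}
    for mom, raw in values_by_mom.items():
        try:
            item = json.loads(raw)
        except ValueError as err:
            # Hypothetical stand-in for the mom_logs error message.
            print("value from %s cannot be accumulated: %s" % (mom, err))
            return None
        if not isinstance(item, dict):
            print("value from %s cannot be accumulated: not a JSON object" % mom)
            return None
        # Assumption: on duplicate names, the value merged last is retained.
        merged.update(item)
    return merged

# Only MoMs that actually reported a value appear in the input.
print(aggregate_numeric({"momA": 9, "momB": 10}))
print(json.dumps(aggregate_string({
    "momA": '{ "a":1, "b":2 }',
    "momB": '{ "c":1 }',
    "momC": '{"d":4}',
})))
```

With the momA, momB, and momC values from the example above, the second call produces the union of the three dictionaries; a value such as 'ten' or '[1, 2]' from any MoM would leave the result unset.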

Examples:

Given an epilogue hook that runs on all the mom nodes, setting different resources_used values based on whether executing on a MS mom or sister mom:

# qmgr -c "list hook epi"

Hook epi
type = site
enabled = true
event = execjob_epilogue
user = pbsadmin
alarm = 30
order = 1
debug = false
fail_action = none

# qmgr -c "e h epi application/x-python default"
import pbs
e=pbs.event()
pbs.logmsg(pbs.LOG_DEBUG, "executed epilogue hook")
if e.job.in_ms_mom(): # set in MS mom
    e.job.resources_used["vmem"] = pbs.size("9gb")
    e.job.resources_used["foo_i"] = 9
    e.job.resources_used["foo_f"] = 0.09
    e.job.resources_used["foo_str"] = '{"nine":9}'
    e.job.resources_used["cput"] = 10
    e.job.resources_used["foo_assn2"] = '{"vn1":1,"vn2":2,"vn3":3}'

else: # set in sister mom
    e.job.resources_used["vmem"] = pbs.size("10gb")
    e.job.resources_used["foo_i"] = 10
    e.job.resources_used["foo_f"] = 0.10
    e.job.resources_used["foo_str"] = '{"ten":10}'
    e.job.resources_used["cput"] = 20
    e.job.resources_used["foo_assn2"] = '{"vn4":4,"vn5":5,"vn6":6}'
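
The hook above writes its JSON strings by hand, which makes it easy to break the double-quoted-name requirement. One possible alternative, shown here only as a sketch, is to build the string from a Python dictionary with json.dumps(). The helper name to_resource_string is hypothetical, and the assignment to e.job.resources_used appears only in a comment because the pbs module is available only inside a hook.

```python
import json

def to_resource_string(items):
    """Serialize a dict of name/value pairs as a compact JSON object string."""
    if not isinstance(items, dict):
        raise TypeError("expected a dict of name/value pairs")
    return json.dumps(items, separators=(",", ":"))

ms_value = to_resource_string({"nine": 9})
print(ms_value)

# Inside a hook, the result could then be assigned in the usual way, e.g.:
#   e.job.resources_used["foo_str"] = to_resource_string({"nine": 9})
```

Because json.dumps() always emits double-quoted names, a value built this way parses as a JSON object.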

Now with 2 nodes: corretja (server/MS), and nadal:

Submit the following job:

% cat job.scr2
#PBS -l select=2:ncpus=1
pbsdsh -n 1 hostname
sleep 300


% qsub job.scr2
102.corretja

When the job completes, and with the server attribute job_history_enable=true, the resources_used values of the finished job can be checked:

% qstat -x -f 102

...

resources_used.cpupercent = 0
resources_used.cput = 00:00:30
resources_used.vmem = 19gb
resources_used.foo_f = 0.19
resources_used.foo_i = 19
resources_used.foo_str = {"nine": 9, "ten": 10}
resources_used.foo_assn2 = {"vn1": 1, "vn2": 2, "vn3": 3, "vn4": 4, "vn5": 5, "vn6": 6}

resources_used.mem = 0kb
resources_used.ncpus = 2
resources_used.walltime = 00:00:05


NOTE: The values of the resources set in the hook (cput, vmem, foo_f, foo_i, foo_str, foo_assn2) are accumulated from the MS value and the sister value.

The accounting_logs show the same values:
08/03/2016 18:28:13;E;102.corretja;user=alfie group=users project=_pbs_project_default jobname=job.scr2 queue=workq ctime=1470263288 qtime=1470263288 etime=1470263288 start=1470263288 exec_host=corretja/0+nadal/0 exec_vnode=(corretja:ncpus=1)+(nadal:ncpus=1) Resource_List.ncpus=2 Resource_List.nodect=2 Resource_List.place=free Resource_List.select=2:ncpus=1 session=16986 end=1470263293 Exit_status=143 resources_used.cpupercent=0 resources_used.cput=00:00:30 resources_used.vmem=19gb resources_used.foo_f=0.19 resources_used.foo_i=19 resources_used.foo_str={"nine": 9, "ten": 10} resources_used.foo_assn2={"vn1": 1, "vn2": 2, "vn3": 3, "vn4": 4, "vn5": 5, "vn6": 6} resources_used.mem=0kb resources_used.ncpus=2 resources_used.walltime=00:00:05 run_count=1
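
To check such a record programmatically, the resources_used fields can be pulled out of the 'E' record with a small script. The following is a minimal sketch, not a PBS-provided tool: the function name is hypothetical, the sample record is a shortened illustration, and the parser assumes that no value contains a space immediately followed by text of the form <name>=.

```python
import re

def parse_resources_used(record):
    """Return {resource_name: raw_value} for the resources_used.* fields."""
    # The first three ';'-separated fields are timestamp, record type, job id.
    message = record.split(";", 3)[3]
    # Split only at spaces followed by "<name>=", so that JSON values
    # containing spaces stay in one piece.
    fields = re.split(r" (?=[A-Za-z_][\w.]*=)", message)
    used = {}
    for field in fields:
        key, _, value = field.partition("=")
        if key.startswith("resources_used."):
            used[key[len("resources_used."):]] = value
    return used

# Shortened, illustrative record in the same layout as an 'E' record.
sample = ('08/03/2016 18:28:13;E;102.corretja;user=alfie group=users '
          'resources_used.cput=00:00:30 resources_used.vmem=19gb '
          'resources_used.foo_str={"nine": 9, "ten": 10} run_count=1')

for name, value in parse_resources_used(sample).items():
    print(name, "=", value)
```

String resources come back as raw text, so an aggregated value such as foo_str can be passed to json.loads() for further inspection.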

Now suppose that the execjob_epilogue hook is changed to set resources_used values only from the MS mom:

# qmgr -c "e h epi application/x-python default"
import pbs
e=pbs.event()
pbs.logmsg(pbs.LOG_DEBUG, "executed epilogue hook")
if e.job.in_ms_mom():
    e.job.resources_used["vmem"] = pbs.size("9gb")
    e.job.resources_used["foo_i"] = 9
    e.job.resources_used["foo_f"] = 0.09
    e.job.resources_used["foo_str"] = '{"nine":9}'
    e.job.resources_used["cput"] = 10

Submitting the job and then deleting it to force execjob_epilogue hook execution results in:

% qsub job.scr2
103.corretja


% qstat
Job id Name User Time Use S Queue
---------------- ---------------- ---------------- -------- - -----
103.corretja job.scr2 alfie 00:00:00 R workq


% qdel 103


% qstat -f -x 103
Job Id: 103.corretja
Job_Name = job.scr2
Job_Owner = alfie@corretja
resources_used.cpupercent = 0
resources_used.cput = 00:00:10
resources_used.vmem = 9gb
resources_used.foo_f = 0.09
resources_used.foo_i = 9
resources_used.foo_str = {"nine": 9}
resources_used.mem = 0kb
resources_used.ncpus = 2
resources_used.walltime = 00:00:06

NOTE: This is a multinode job, but the sister mom did not report a value for 'foo_str', so only the mother superior mom's value is aggregated, as is.

Accounting logs show:
08/03/2016 18:36:14;E;103.corretja;user=alfie group=users project=_pbs_project_default jobname=job.scr2 queue=workq ctime=1470263768 qtime=1470263768 etime=1470263768 start=1470263768 exec_host=corretja/0+nadal/0 exec_vnode=(corretja:ncpus=1)+(nadal:ncpus=1) Resource_List.ncpus=2 Resource_List.nodect=2 Resource_List.place=free Resource_List.select=2:ncpus=1 session=17114 end=1470263774 Exit_status=143 resources_used.cpupercent=0 resources_used.cput=00:00:10 resources_used.vmem=9gb resources_used.foo_f=0.09 resources_used.foo_i=9 resources_used.foo_str={"nine": 9} resources_used.mem=0kb resources_used.ncpus=2 resources_used.walltime=00:00:06 run_count=1