Skip to end of metadata
Go to start of metadata

You are viewing an old version of this page. View the current version.

Compare with Current View Page History

« Previous Version 3 Next »

Objective

This is to enhance PBS reporting of resources_used values, in particular, have MoM accumulate resources_used values that are set in a hook, whether builtin resource or custom resource.

Interface 1: For multi-node jobs, report accumulated resources_used values in accounting logs/qstat -f output, for those resources set in a hook.

  • Visibility: Public
  • Change Control: Stable
  • Synopsis: Display accumulated resources_used values in accounting logs and qstat -f output, for resources that are set in an execjob_prologue, execjob_epilogue, or exechost_periodic hook.
  • Details:
    • Resources_used resources 'cput', 'mem', 'cpupercent' will continue to be aggregated and reported as before.
    • The additional resources that can be accumulated are those that are set in a hook, which can be a builtin resource (e.g vmem), or a custom resource.

      • Builtin resource: If a builtin resource is set in a hook, then any polling done (if any) by MoM for its value will automatically be discontinued. The hook then becomes in charge of updating the value.

      • Custom resource: For a custom resource to be set in a hook, the resource must have already been added to PBS in one of 2 ways:
        1. Via qmgr:

          # qmgr -c "create resource <res_name> type=<res_type>,flag=h

           

        2. Via a mom exechost_startup hook as follows: 

          # qmgr -c "create hook start event=exechost_startup"
          # qmgr -c "import hook start application/x-python default start.py" 
          # qmgr -c "export hook start application/x-python default"
          import pbs
          e=pbs.event()
          localnode=pbs.get_local_nodename()

          e.vnode_list[localnode].resources_available['foo_i'] = 7
          e.vnode_list[localnode].resources_available['foo_f'] = 5.0
          e.vnode_list[localnode].resources_available['foo_str'] = "seventyseven"
          e.vnode_list[localnode].resources_available['stra'] = "pears"
          ,

    • Aggregation of values: The resource value collected in mother superior mom is aggregated with each of the values obtained from the sister moms whose nodes are part of the job.

    • For resources of type float, long, and size, the value will be reported in accounting logs and qstat -f as:

                          resources_used.<resource_name> = <summed total>      

      If for some reason a sister node did not report back the resources_used value for the resource, then the last know value will be used.

    • For resources of type string and string_array, the value is aggregated into a JSON format style, as follows:

                           resources_used.<resource_name> = {"<node1>": "<str_val>", "<node2>": "<str_val>", ...}

                           NOTE: The quotes are included to disambiguate embedded spaces, commas and brackets.

      If  one or more moms did not report on that resource, the last known value sent by that mom will be used. If the mom has not reported a value at all, then the keyword 'None' will be reported as <str_val>.

                           resources_used.<resource_name> = {"<node1>": "<str_val>", "<node2>": None, ...}

 

  • No labels