

Objective

The objective is to enhance PBS reporting of resources_used values: in particular, to have MoM accumulate resources_used values that are set in a hook, whether for a builtin resource or a custom resource.

Interface 1: For multi-node jobs, report accumulated resources_used values in accounting logs/qstat -f output, for those resources set in a hook.

  • Visibility: Public
  • Change Control: Stable
  • Synopsis: Display accumulated resources_used values in accounting logs and qstat -f output, for resources that are set in an execjob_prologue, execjob_epilogue, or exechost_periodic hook.
  • Details:
    • The resources_used values for 'cput', 'mem', and 'cpupercent' will continue to be aggregated and reported as before.
    • Additional resources can be accumulated if they are set in a hook; each can be a builtin resource (e.g. vmem) or a custom resource.

      • Builtin resource: If a builtin resource is set in a hook, any polling done by MoM for its value is automatically discontinued; the hook then becomes responsible for updating the value.

      • Custom resource: For a custom resource to be set in a hook, the resource must have already been added to PBS in one of two ways:

        1. Via qmgr:

          # qmgr -c "create resource <res_name> type=<res_type>,flag=h"

        2. Via a mom exechost_startup hook as follows: 

          # qmgr -c "create hook start event=exechost_startup"
          # qmgr -c "import hook start application/x-python default start.py" 
          # qmgr -c "export hook start application/x-python default"
          import pbs
          e=pbs.event()
          localnode=pbs.get_local_nodename()

          e.vnode_list[localnode].resources_available['foo_i'] = 7
          e.vnode_list[localnode].resources_available['foo_f'] = 5.0
          e.vnode_list[localnode].resources_available['foo_str'] = "seventyseven"

    • Aggregation of values: The resource value collected by the mother superior (MS) mom is aggregated with the values obtained from each sister mom whose node is part of the job.

    • For resources of type float, long, and size, the value will be reported in accounting logs and qstat -f as:

                          resources_used.<resource_name> = <summed total>      

      If for some reason a sister node did not report back the resources_used value for the resource, the last known value will be used.
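The numeric aggregation can be pictured as a simple sum over the per-MOM reports, falling back to the last known value for any MOM that did not report. The sketch below is only illustrative (the names `aggregate_numeric`, `reports`, and `last_known` are hypothetical; the actual MoM implementation is internal to PBS, and size-type values would additionally need unit normalization):

```python
# Illustrative sketch of summing per-MOM numeric resources_used values
# (hypothetical helper, not part of the PBS API).

def aggregate_numeric(reports, last_known):
    """Sum per-MOM values; for a MOM that did not report back,
    fall back to its last known value."""
    total = 0
    for mom in last_known:
        total += reports.get(mom, last_known[mom])
    return total

# MS mom reported 9, sister reported 10 -> 19 (cf. foo_i in the example below)
print(aggregate_numeric({"corretja": 9, "nadal": 10},
                        {"corretja": 0, "nadal": 0}))  # 19
```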

    • For resources of type string, the value is aggregated on a per-MOM basis.

      • The value obtained from each MOM must be a valid JSON object (i.e., convertible to a Python dictionary): an unordered set of name/value pairs, beginning with { (left brace) and ending with } (right brace). Each name is followed by : (colon), name/value pairs are separated by , (comma), and each name must be wrapped in double quotes, allowing backslash escapes.

      • When all values are valid JSON, the resulting string resource value is the merge (i.e., union) of all dictionary items, shown in qstat -f and accounting logs as:

        resources_used.<resource_name> = { <momA_JSON_item_value>, <momB_JSON_item_value>, <momC_JSON_item_value>, ..}

        Ex.   if momA returned '{ "a":1, "b":2 }', momB returned '{ "c":1 }', and momC returned '{"d":4}' for resources_used.foo_str, then we get:


                                          resources_used.foo_str='{"a": 1, "b": 2, "c":1,"d": 4}'


        NOTE: If 2 or more values share the same 'name' key, only one of them is retained; which one depends on Python's dictionary merge behavior. Hook writers are advised to make keys unique, which can be done by using the pbs.get_local_nodename() value as part of the key.
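
One way to follow that recommendation is to embed the local node name in every key of the per-MOM JSON value. The helper below is a hypothetical sketch (in a real hook, the node name would come from pbs.get_local_nodename(); `per_node_value` is not part of the PBS API):

```python
import json

# Sketch: build a per-MOM JSON value whose keys embed the local node
# name, so dictionaries merged from different MOMs cannot collide.
def per_node_value(nodename, data):
    return json.dumps({"%s_%s" % (nodename, k): v for k, v in data.items()})

# e.g. in an execjob_epilogue hook one might write:
#   e.job.resources_used["foo_str"] = per_node_value(nodename, {"cpu_seconds": 9})
print(per_node_value("corretja", {"cpu_seconds": 9}))
# {"corretja_cpu_seconds": 9}
```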

      • When at least one of the values obtained from a sister MOM is not in JSON format, the string cannot be accumulated, resulting in an unset resources_used string value. An error message is reported in mom_logs as follows:

        "Job <jobid> resources_used.<string_resource> cannot be accumulated: value <input value> from mom <hostname> not JSON-format: <exception_error_message>."
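
The merge-or-fail behavior described above can be modeled as JSON validation followed by a dictionary union. This is a simplified sketch (the real MoM code is not Python, and `merge_per_mom_strings` is a hypothetical name); it reproduces both the union of valid values and the "unset on invalid input" error case:

```python
import json

def merge_per_mom_strings(values_by_mom):
    """Merge per-MOM JSON object strings into one dictionary.
    If any value is not a valid JSON object, the aggregate is unset
    (None), mirroring the error case described above."""
    merged = {}
    for mom, text in values_by_mom.items():
        try:
            obj = json.loads(text)
            if not isinstance(obj, dict):
                raise ValueError("not a JSON object")
        except ValueError:
            return None  # MoM would log the "cannot be accumulated" error
        merged.update(obj)  # union; on duplicate keys only one value survives
    return merged

print(merge_per_mom_strings({
    "momA": '{ "a":1, "b":2 }',
    "momB": '{ "c":1 }',
    "momC": '{"d":4}',
}))  # {'a': 1, 'b': 2, 'c': 1, 'd': 4}
```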

Examples:

Given an epilogue hook that runs on all the mom nodes, setting different resources_used values depending on whether it executes on the MS mom or a sister mom:

# qmgr -c "list hook epi"

Hook epi
type = site
enabled = true
event = execjob_epilogue
user = pbsadmin
alarm = 30
order = 1
debug = false
fail_action = none

# qmgr -c "e h epi application/x-python default"
import pbs
e=pbs.event()
pbs.logmsg(pbs.LOG_DEBUG, "executed epilogue hook")
if e.job.in_ms_mom(): #set in MS mom
    e.job.resources_used["vmem"] = pbs.size("9gb")
    e.job.resources_used["foo_i"] = 9
    e.job.resources_used["foo_f"] = 0.09
    e.job.resources_used["foo_str"] = '{"nine":9}'
    e.job.resources_used["cput"] = 10

    e.job.resources_used["foo_assn2"] = '{"vn1":1,"vn2":2,"vn3":3}'

else: # set in sister mom
    e.job.resources_used["vmem"] = pbs.size("10gb")
    e.job.resources_used["foo_i"] = 10
    e.job.resources_used["foo_f"] = 0.10
    e.job.resources_used["foo_str"] = '{"ten":10}'
    e.job.resources_used["cput"] = 20

    e.job.resources_used["foo_assn2"] = '{"vn4":4,"vn5":5,"vn6":6}'

Now with 2 nodes:

Submit the following job:

% cat job.scr2
#PBS -l select=2:ncpus=1
pbsdsh -n 1 hostname
sleep 300


% qsub job.scr2
102.corretja

When the job completes, the following resources_used values are shown. With server job_history_enabled=true, one can check the values of the finished job:

% qstat -x -f 102

...

resources_used.cpupercent = 0
resources_used.cput = 00:00:30
resources_used.vmem = 19gb
resources_used.foo_f = 0.19
resources_used.foo_i = 19
resources_used.foo_str = '{"nine": 9, "ten": 10}'

resources_used.foo_assn2='{"vn1": 1, "vn2": 2 ,"vn3": 3 ,"vn4": 4, "vn5": 5, "vn6": 6}'

resources_used.mem = 0kb
resources_used.ncpus = 2
resources_used.walltime = 00:00:05


NOTE: The cput, vmem, foo_f, foo_i, foo_str, and foo_assn2 values above are accumulated from the MS value and the sister value.

The accounting_logs show the same values:
8/03/2016 18:28:13;E;102.corretja;user=alfie group=users project=_pbs_project_default jobname=job.scr2 queue=workq ctime=1470263288 qtime=1470263288 etime=1470263288 start=1470263288 exec_host=corretja/0+nadal/0 exec_vnode=(corretja:ncpus=1)+(nadal:ncpus=1) Resource_List.ncpus=2 Resource_List.nodect=2 Resource_List.place=free Resource_List.select=2:ncpus=1 session=16986 end=1470263293 Exit_status=143 resources_used.cpupercent=0 resources_used.cput=00:00:30 resources_used.vmem=19gb resources_used.foo_f=0.19 resources_used.foo_i=19 resources_used.foo_str='{"nine": 9, "ten": 10}'  resources_used.foo_assn2='{"vn1": 1, "vn2": 2 ,"vn3": 3 ,"vn4": 4, "vn5": 5, "vn6": 6}' resources_used.mem=0kb resources_used.ncpus=2 resources_used.walltime=00:00:05 run_count=1

Now suppose the execjob_epilogue hook is changed to set resources_used values only from the MS mom:

# qmgr -c "e h epi application/x-python default"
import pbs
e=pbs.event()
pbs.logmsg(pbs.LOG_DEBUG, "executed epilogue hook")
if e.job.in_ms_mom():
    e.job.resources_used["vmem"] = pbs.size("9gb")
    e.job.resources_used["foo_i"] = 9
    e.job.resources_used["foo_f"] = 0.09
    e.job.resources_used["foo_str"] = '{"nine":9}'
    e.job.resources_used["cput"] = 10

Submitting the job and then deleting it (to force execjob_epilogue hook execution) results in:

% qsub job.scr2
103.corretja


% qstat
Job id Name User Time Use S Queue
---------------- ---------------- ---------------- -------- - -----
103.corretja job.scr2 alfie 00:00:00 R workq


% qdel 103


% qstat -f -x 103
Job Id: 103.corretja
Job_Name = job.scr2
Job_Owner = alfie@corretja
resources_used.cpupercent = 0
resources_used.cput = 00:00:10
resources_used.vmem = 9gb
resources_used.foo_f = 0.09
resources_used.foo_i = 9
resources_used.foo_str = '{"nine":9}'
resources_used.mem = 0kb
resources_used.ncpus = 2
resources_used.walltime = 00:00:06

NOTE: Since this is a multi-node job but the sister mom did not report a value for 'foo_str', only the mother superior mom's value is reported, as-is.

Accounting logs show:
08/03/2016 18:36:14;E;103.corretja;user=alfie group=users project=_pbs_project_default jobname=job.scr2 queue=workq ctime=1470263768 qtime=1470263768 etime=1470263768 start=1470263768 exec_host=corretja/0+nadal/0 exec_vnode=(corretja:ncpus=1)+(nadal:ncpus=1) Resource_List.ncpus=2 Resource_List.nodect=2 Resource_List.place=free Resource_List.select=2:ncpus=1 session=17114 end=1470263774 Exit_status=143 resources_used.cpupercent=0 resources_used.cput=00:00:10 resources_used.vmem=9gb resources_used.foo_f=0.09 resources_used.foo_i=9 resources_used.foo_str='{"nine": 9}' resources_used.mem=0kb resources_used.ncpus=2 resources_used.walltime=00:00:06 run_count=1


NOTE: In hooks, when specifying values for string_array type resources (e.g. in Resource_List or resources_used), if the value is enclosed in single quotes ('), then everything inside the single quotes is taken as the value. For example,


pbs.event().job.Resource_List["foo_stra"] = '"glad,elated"'   ← note the outer single quotes; the 'foo_stra' resource will have a value of "glad,elated"

This change was needed to be consistent with Python's handling of string literals and to align with JSON string processing.

Interface 2: For single-node jobs, report json string resources_used values in accounting logs/qstat -f output within single quotes, for those resources set in a hook.


  • Visibility: Public
  • Change Control: Stable
  • Synopsis: Display JSON string resources_used values within single quotes in accounting logs and qstat -f output, for resources that are set in an execjob_prologue, execjob_epilogue, or exechost_periodic hook.
  • Details:
    • In the case of a single-node job, the following applies to resources of type string:

      • If the value obtained from the MOM is a valid JSON object (a Python dictionary), the resulting string resource value is shown in qstat -f and accounting_logs within single quotes as:

        resources_used.<resource_name> = '{ <mom_JSON_item_value>, <mom_JSON_item_value>, <mom_JSON_item_value>, ...}'

        Ex.   if mom returned { "a":1, "b":2, "c":1,"d": 4} for resources_used.foo_str, then we get:

                                          resources_used.foo_str='{"a": 1, "b": 2, "c":1,"d": 4}'


      • If the value obtained from the MOM is not valid JSON, it is reported as-is in qstat -f and accounting logs.

        Ex.   if mom returned "hello"  for resources_used.foo_str, then we get:

                                          resources_used.foo_str="hello"
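
The single-node rule above can be sketched as "quote if valid JSON object, otherwise pass through unchanged". The helper below is a hypothetical illustration of that rule, not PBS code:

```python
import json

def format_single_node(value):
    """Return the string as it would appear in qstat -f: wrapped in
    single quotes if it is a valid JSON object, unchanged otherwise."""
    try:
        obj = json.loads(value)
        if isinstance(obj, dict):
            return "'%s'" % value
    except ValueError:
        pass
    return value

print(format_single_node('{"a": 1}'))  # '{"a": 1}'
print(format_single_node("hello"))     # hello
```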

  • Community Discussion


  • No labels

15 Comments

  1. A few comments:

    (1) Awesome examples – thanks for including these.  It makes it really easy to see what is being proposed.

    (2) Having node names appear differently in different parts of the output seems inconsistent.  For example, in the last accounting log example, exec_host and exec_vnode use "corretja", while the new resources_used JSON format shows "correjta.pbspro.com".  It seems using either the short or full name in both places would be more consistent.

    (3) For aggregating strings and string_arrays, the design for string/string array aggregation references nodes (in the example with "<node1>": "<str_val>") and also MOMs (as in "If one or more moms did not report"), but it doesn't define how chunks relate to nodes and to MOMs.  Is the value a per-chunk, per-node, or per-MOM value? And, if it's not per-chunk (or per-node), is there a way to express a per-chunk value that will show up sensibly in the JSON output?  (There is no formatting issue for numeric values, as it doesn't matter when summing up all the values.)

    (4) It would be useful to see an example of aggregating a string_array where each "node" has multiple values, e.g., on node1 foo="a,b,c" and on node2 foo="c,d,e", etc.

    (5) When aggregating values, it would be really nice to be able to choose to report numeric values (float, long, and size) in the JSON format style too.  For example, there are many numeric "per node" values that would be nice to have reported "per node", e.g., current/max power use, maximum memory use.

    Thanks!

  2. (2) it makes sense to use the short name.

    (3) Each response is from the mom. As it stands, there is no way to get the data on a per-chunk basis.

    (4) It is left up to the hook writer which string they would like to return. For example, if the hook writer wanted to return something for node1 like foo="a,b,c", that is possible. However, I believe we need to consider the case where the hook writer would like to return a dictionary or a list (i.e. foo={"a":1,"b":4,"c":6}).

    (5) If the admin would like to see these results, they would need to create a string resource and set the numerical values in the string. This would, I believe, allow them to get what you are looking for. Does this make sense, or am I missing something?

  3. Thanks, Bill, for the comments! My response:

    (2) I agree they should be consistent.

    (3) It's reported on a per-MOM basis. The hook writer has the flexibility of returning a string or string_array resources_used value that captures a per-chunk/per-node value. So the hook script would have something like:

    pbs.event().job.resources_used["foo_assigned"] = "vn1:1,vn2:2,vn3:3"   or maybe a dictionary, as in:

    pbs.event().job.resources_used["foo_assigned"] = "{\"vn1\":1,\"vn2\":2,\"vn3\":3}"             NOTE: I need to test this out.

    (4) Let me test this and give you the output.

    (5) As Jon said, could use the string or string_array route...

  4. Anonymous

    More response to point (4), actually I have it already in the example:

    resources_used.stra={"corretja.pbspro.com":"broccoli,tomatoes","nadal.pbspro.com":"carrots,onions"}


    where string_array 'stra' showed: "corretja" reported multiple values "broccoli,tomatoes" while "nadal" reported "carrots,onions".

     


  6. Thanks.  Three comments left with respect to (3):

    3a.  The design is based on aggregating values per-MOM, but mentions "node". In PBS, "node" often means "vnode". I suggest updating the design text to make it more clear it is per-MOM (and not necessarily per-node).

    3b. Ideally, there'd be a way to return a single JSON value (in the accounting logs) that is structured as one entry per chunk.  Is there any way to do this?  I feel like this is a highly desired behavior, e.g., to easily display MPI process data.  If it's not possible, this puts an extra burden on both the hook writer (to extract per-chunk values) and then again on anyone parsing the output to extract the chunks from the MOMs.  It looks like the best one can do is something like an array (one entry per-MOM) of dictionaries (one dictionary item per-chunk).

    3c.  As a note, returning a dictionary is best done using a string (a string_array would be a bad choice).  Is it even necessary (or desirable) to support string_arrays?

     

  7. Bill Nitzberg: My response to your new comments:

     

    3a. Agree, I'll update the design doc to be explicit about per-MOM aggregation.

     

    3b. Let me see: the server assigns resources to satisfy each chunk, and these are passed to the job and to the mom via the exec_vnode attribute. exec_vnode would have the value: "<chunk1 resources>+<chunk2 resources>+...+<chunkN resources>". <chunk1 resources> and <chunk2 resources> could both be on the same MOM via vnode assignment, as in: "vn1:foo_val=a" + "vn2:foo_val=b".  This feature is about having resources_used values accumulate among the different moms, with those values specifically set in a hook by a hook writer. The hook writer is given the facility to do a per-resource assignment:

    pbs.event().job.resources_used["foo_val"] = ...

    Upon aggregation, mom would have to parse through the exec_vnode attribute value looking for "foo_val" and relate it to the appropriate chunk. But "foo_val" appears as used in chunk1 via "vn1" and in chunk2 via "vn2" (in my example), which are both managed by the reporting MOM, so a question arises as to which chunk to correlate it to. So I think it will be up to the hook writer to make the association by using a string or string_array resources_used value and displaying the format in JSON form.

     

     

     

  8. Just a reminder. We should be doing these comments on the forum instead of the wiki. Please move your comments to the forum

  9. 3c. Jon and I talked. I think it's best to just support the "string" aggregation and not "string_array". It keeps things simple.

  10. Jon Shelley – I suggest finishing this discussion, then moving it to the "right" place and/or deleting it as appropriate.

    OK, I think the only unresolved comment is 3b.  Let me rephrase what I was asking...

    Imagine a cluster with lots of nodes, each with lots of cpus, and where plenty of jobs are running (and sharing) the system.  A user submits a job with:

    qsub -l select=4:ncpus=1 -lplace=free

    Now, PBS might run this job on one, two, three, or four MOMs, depending on what else is happening on the system.  What would be ideal, is if it were possible for a clever hook writer to ensure that the aggregated output is independent of the number of MOMs, e.g., the desired output is something like:

    foo_val="{'rank0':'broccoli','rank1':'tomatoes','rank2':'carrots','rank3':'onions'}"

    Assuming the hook writer can get the per-rank (per-chunk) data – that's their responsibility, so assume they can do it – is it possible, and if so, how could they pass that data to PBS to get the above output in all cases (with 1, 2, 3, and 4 MOMs)?

    Thanks again!

  11. Anonymous

    Bill,

    It could be done as follows:

     

    Given a job running on nodes from 4 moms:

     

    momA hook:
    pbs.event().job.resources_used['foo_val'] = """{"rank0":"broccoli"}"""       ← trying triple quotes to preserve raw text.
    momB hook:
    pbs.event().job.resources_used['foo_val'] = """{"rank1":"tomatoes"}"""

    momC hook:
    pbs.event().job.resources_used['foo_val'] = """{"rank2":"carrots"}"""

    momD hook:
    pbs.event().job.resources_used['foo_val'] = """{"rank3":"onions"}"""

    And when job completes, we'll get in the accounting_logs:

    resources_used.foo_val = "{"momA":{"rank0":"broccoli"},"momB":{"rank1":"tomatoes"},"momC":{"rank2":"carrots"}, "momD":{"rank3":"onions"}}"

     

    Given a job running on nodes from 3 moms, we can have the scenario:

     

    momA hook:
    pbs.event().job.resources_used['foo_val'] = """{"rank0":"broccoli"}"""       ← trying triple quotes to preserve raw text.
    momB hook:
    pbs.event().job.resources_used['foo_val'] = """{"rank1":"tomatoes"}"""

    momC hook:
    pbs.event().job.resources_used['foo_val'] = """{"rank2":"carrots","rank3":"onions"}"""

    And when job completes, we'll get in the accounting_logs:

    resources_used.foo_val = "{"momA":{"rank0":"broccoli"},"momB":{"rank1":"tomatoes"},"momC":{"rank2":"carrots","rank3":"onions"}}"

     

    Given a job running on nodes from 2 moms, we can have the scenario:

    momA hook:
    pbs.event().job.resources_used['foo_val'] = """{"rank0":"broccoli","rank1":"tomatoes"}""" ← trying triple quotes to preserve raw text.
    momB hook:
    pbs.event().job.resources_used['foo_val'] = """{"rank2":"carrots","rank3":"onions"}"""

    And when job completes, we'll get in the accounting_logs:
    resources_used.foo_val = "{"momA":{"rank0":"broccoli","rank1":"tomatoes"},"momB":{"rank2":"carrots","rank3":"onions"}}"

     

    Given a job running on 1 node:

    momA hook:

    pbs.event().job.resources_used['foo_val'] = """{"rank0":"broccoli","rank1":"tomatoes","rank2":"carrots","rank3":"onions"}""" ← trying triple quotes to preserve raw text.

     

    And when job completes, we'll get in the accounting_logs:

    resources_used.foo_val = "{"momA":{"rank0":"broccoli","rank1":"tomatoes","rank2":"carrots","rank3":"onions"}}"

     


    ~


  13. Jon and I discussed the following:

    • The method a hook writer uses to avoid key name collisions must be well documented with examples.
    • I have proposed that we add <JSON></JSON> markups to the string as an indicator that the enclosed string contains JSON formatted data. This will make it easier to identify JSON formatted data within overloaded string elements. It also allows us to encapsulate other formats if we decide to adopt them in the future. Without these markups, it would be difficult to distinguish between other formats that might resemble JSON.

    Jon has agreed that the second item should be added if it can be achieved in the remaining time allotted for this feature.

  14. In an ideal world, PBS would introduce a new type (with new semantics) to cover the use case of returning multiple values, especially for parallel jobs.  For example, a future idea would be to support JSON native types, and specifically, a JSON object type that aggregates via "union" as well as a JSON array type that aggregates via concatenation.  With native JSON types, the PBS plugin framework would be able to also verify types (and throw appropriate exceptions) directly in the hooks (and lots of other "good stuff").

    Due to real-world constraints (time and effort) a trade-off design has been proposed that overloads the existing PBS string resource type.  However, the thought is to make it behave (from within hooks) as if it was a JSON object type.  That means having it be 100% JSON syntax.  And, to support flexibility for future designs, the edge cases (cases that do not support the main use case) are defined as erroneous, i.e., not returning a valid JSON object is an error (and results in an error in the logs and no aggregated output).

    I suggest not adding the <JSON></JSON>.  Adding <JSON></JSON> to the strings means the "type" would no longer be JSON, but would be XML wrapped JSON...which I feel goes too far (in the direction of flexibility**).  It also does two things I don't like in designs:  it puts type information in the data stream (when there is already type metadata) and It burdens every hook writer with the task of unpacking and verifying the type.  I feel standardizing on JSON objects (for now) is a good tradeoff, with the possibility to add support for any JSON value in the future.

    As to the issue with having key collisions, I suggest one of two solutions (for this edge case):  either make it erroneous (and log an error and produce zero output) as with non JSON objects, or do not define which of the multiple keys wins (as in, "if there are multiple matching keys, only one of the key:value pairs will be included in the aggregation, but it is undefined which one").

    1. Thanks, Bill. The current EDD is in line with the points you made. For key collisions, I opted for the second: do not define which of the multiple keys wins.