mom execjob_begin hook alarms when running a large multi-node job

Description

In a customer case, a ~1000-node job was timing out during execjob_begin hook execution. The cause appears to be an inefficiency in how pbs_populate_svrattrl_from_python_class gets called, as illustrated in the mom log:

05/30/2017 23:32:31;0400;pbs_python;Hook;print_svrattrl_list;pbs_populate_svrattrl_from_python_class==>
05/30/2017 23:32:31;0400;pbs_python;Hook;print_svrattrl_list;al_name=vn[654].resources_assigned al_resc=ncpus,long al_value=10 al_flags=0
05/30/2017 23:32:31;0400;pbs_python;Hook;print_svrattrl_list;al_name=vn[654].resources_assigned al_resc=mem,size al_value=0kb al_flags=0
05/30/2017 23:32:31;0400;pbs_python;Hook;print_svrattrl_list;al_name=vn[654].name al_resc=null al_value=vn[654] al_flags=0
05/30/2017 23:32:31;0400;pbs_python;Hook;print_svrattrl_list;pbs_populate_svrattrl_from_python_class==>
05/30/2017 23:32:31;0400;pbs_python;Hook;print_svrattrl_list;al_name=vn[654].resources_assigned al_resc=ncpus,long al_value=10 al_flags=0
05/30/2017 23:32:31;0400;pbs_python;Hook;print_svrattrl_list;al_name=vn[654].resources_assigned al_resc=mem,size al_value=0kb al_flags=0
05/30/2017 23:32:31;0400;pbs_python;Hook;print_svrattrl_list;al_name=vn[654].name al_resc=null al_value=vn[654] al_flags=0
05/30/2017 23:32:31;0400;pbs_python;Hook;print_svrattrl_list;al_name=vn[446].resources_assigned al_resc=ncpus,long al_value=10 al_flags=0
05/30/2017 23:32:31;0400;pbs_python;Hook;print_svrattrl_list;al_name=vn[446].resources_assigned al_resc=mem,size al_value=0kb al_flags=0
05/30/2017 23:32:31;0400;pbs_python;Hook;print_svrattrl_list;al_name=vn[446].name al_resc=null al_value=vn[446] al_flags=0
05/30/2017 23:32:31;0400;pbs_python;Hook;print_svrattrl_list;pbs_populate_svrattrl_from_python_class==>
05/30/2017 23:32:31;0400;pbs_python;Hook;print_svrattrl_list;al_name=vn[654].resources_assigned al_resc=ncpus,long al_value=10 al_flags=0
05/30/2017 23:32:31;0400;pbs_python;Hook;print_svrattrl_list;al_name=vn[654].resources_assigned al_resc=mem,size al_value=0kb al_flags=0
05/30/2017 23:32:31;0400;pbs_python;Hook;print_svrattrl_list;al_name=vn[654].name al_resc=null al_value=vn[654] al_flags=0
05/30/2017 23:32:31;0400;pbs_python;Hook;print_svrattrl_list;al_name=vn[446].resources_assigned al_resc=ncpus,long al_value=10 al_flags=0
05/30/2017 23:32:31;0400;pbs_python;Hook;print_svrattrl_list;al_name=vn[446].resources_assigned al_resc=mem,size al_value=0kb al_flags=0
05/30/2017 23:32:31;0400;pbs_python;Hook;print_svrattrl_list;al_name=vn[446].name al_resc=null al_value=vn[446] al_flags=0
05/30/2017 23:32:31;0400;pbs_python;Hook;print_svrattrl_list;al_name=vn[762].resources_assigned al_resc=ncpus,long al_value=10 al_flags=0
05/30/2017 23:32:31;0400;pbs_python;Hook;print_svrattrl_list;al_name=vn[762].resources_assigned al_resc=mem,size al_value=0kb al_flags=0
05/30/2017 23:32:31;0400;pbs_python;Hook;print_svrattrl_list;al_name=vn[762].name al_resc=null al_value=vn[762] al_flags=0
05/30/2017 23:32:31;0400;pbs_python;Hook;print_svrattrl_list;pbs_populate_svrattrl_from_python_class==>
05/30/2017 23:32:31;0400;pbs_python;Hook;print_svrattrl_list;al_name=vn[654].resources_assigned al_resc=ncpus,long al_value=10 al_flags=0
05/30/2017 23:32:31;0400;pbs_python;Hook;print_svrattrl_list;al_name=vn[654].resources_assigned al_resc=mem,size al_value=0kb al_flags=0
05/30/2017 23:32:31;0400;pbs_python;Hook;print_svrattrl_list;al_name=vn[654].name al_resc=null al_value=vn[654] al_flags=0
05/30/2017 23:32:31;0400;pbs_python;Hook;print_svrattrl_list;al_name=vn[446].resources_assigned al_resc=ncpus,long al_value=10 al_flags=0
05/30/2017 23:32:31;0400;pbs_python;Hook;print_svrattrl_list;al_name=vn[446].resources_assigned al_resc=mem,size al_value=0kb al_flags=0
05/30/2017 23:32:31;0400;pbs_python;Hook;print_svrattrl_list;al_name=vn[446].name al_resc=null al_value=vn[446] al_flags=0
05/30/2017 23:32:31;0400;pbs_python;Hook;print_svrattrl_list;al_name=vn[762].resources_assigned al_resc=ncpus,long al_value=10 al_flags=0
05/30/2017 23:32:31;0400;pbs_python;Hook;print_svrattrl_list;al_name=vn[762].resources_assigned al_resc=mem,size al_value=0kb al_flags=0
05/30/2017 23:32:31;0400;pbs_python;Hook;print_svrattrl_list;al_name=vn[762].name al_resc=null al_value=vn[762] al_flags=0
05/30/2017 23:32:31;0400;pbs_python;Hook;print_svrattrl_list;al_name=vn[557].resources_assigned al_resc=ncpus,long al_value=10 al_flags=0
05/30/2017 23:32:31;0400;pbs_python;Hook;print_svrattrl_list;al_name=vn[557].resources_assigned al_resc=mem,size al_value=0kb al_flags=0
05/30/2017 23:32:31;0400;pbs_python;Hook;print_svrattrl_list;al_name=vn[557].name al_resc=null al_value=vn[557] al_flags=0
05/30/2017 23:32:31;0400;pbs_python;Hook;print_svrattrl_list;pbs_populate_svrattrl_from_python_class==>
05/30/2017 23:32:31;0400;pbs_python;Hook;print_svrattrl_list;al_name=vn[654].resources_assigned al_resc=ncpus,long al_value=10 al_flags=0
05/30/2017 23:32:31;0400;pbs_python;Hook;print_svrattrl_list;al_name=vn[654].resources_assigned al_resc=mem,size al_value=0kb al_flags=0
05/30/2017 23:32:31;0400;pbs_python;Hook;print_svrattrl_list;al_name=vn[654].name al_resc=null al_value=vn[654] al_flags=0
05/30/2017 23:32:31;0400;pbs_python;Hook;print_svrattrl_list;al_name=vn[446].resources_assigned al_resc=ncpus,long al_value=10 al_flags=0
05/30/2017 23:32:31;0400;pbs_python;Hook;print_svrattrl_list;al_name=vn[446].resources_assigned al_resc=mem,size al_value=0kb al_flags=0
05/30/2017 23:32:31;0400;pbs_python;Hook;print_svrattrl_list;al_name=vn[446].name al_resc=null al_value=vn[446] al_flags=0
05/30/2017 23:32:31;0400;pbs_python;Hook;print_svrattrl_list;al_name=vn[762].resources_assigned al_resc=ncpus,long al_value=10 al_flags=0
05/30/2017 23:32:31;0400;pbs_python;Hook;print_svrattrl_list;al_name=vn[762].resources_assigned al_resc=mem,size al_value=0kb al_flags=0
05/30/2017 23:32:31;0400;pbs_python;Hook;print_svrattrl_list;al_name=vn[762].name al_resc=null al_value=vn[762] al_flags=0
05/30/2017 23:32:31;0400;pbs_python;Hook;print_svrattrl_list;al_name=vn[557].resources_assigned al_resc=ncpus,long al_value=10 al_flags=0
05/30/2017 23:32:31;0400;pbs_python;Hook;print_svrattrl_list;al_name=vn[557].resources_assigned al_resc=mem,size al_value=0kb al_flags=0
05/30/2017 23:32:31;0400;pbs_python;Hook;print_svrattrl_list;al_name=vn[557].name al_resc=null al_value=vn[557] al_flags=0
05/30/2017 23:32:31;0400;pbs_python;Hook;print_svrattrl_list;al_name=vn[853].resources_assigned al_resc=ncpus,long al_value=10 al_flags=0
05/30/2017 23:32:31;0400;pbs_python;Hook;print_svrattrl_list;al_name=vn[853].resources_assigned al_resc=mem,size al_value=0kb al_flags=0
05/30/2017 23:32:31;0400;pbs_python;Hook;print_svrattrl_list;al_name=vn[853].name al_resc=null al_value=vn[853] al_flags=0
05/30/2017 23:32:31;0400;pbs_python;Hook;print_svrattrl_list;pbs_populate_svrattrl_from_python_class==>
05/30/2017 23:32:31;0400;pbs_python;Hook;print_svrattrl_list;al_name=vn[654].resources_assigned al_resc=ncpus,long al_value=10 al_flags=0
05/30/2017 23:32:31;0400;pbs_python;Hook;print_svrattrl_list;al_name=vn[654].resources_assigned al_resc=mem,size al_value=0kb al_flags=0
05/30/2017 23:32:31;0400;pbs_python;Hook;print_svrattrl_list;al_name=vn[654].name al_resc=null al_value=vn[654] al_flags=0
05/30/2017 23:32:31;0400;pbs_python;Hook;print_svrattrl_list;al_name=vn[446].resources_assigned al_resc=ncpus,long al_value=10 al_flags=0

...

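It keeps doing this for every node in the job. Each call re-emits the entries for all previously processed vnodes plus one new vnode (vn[654]; then vn[654], vn[446]; then vn[654], vn[446], vn[762]; and so on), so the total work appears to grow quadratically with the node count. This can take a long time even when the messages are not logged.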
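To put rough numbers on the growth, here is a hypothetical cost model of the pattern visible in the log excerpt. It is not PBS source: the function names repopulate_each_time and populate_once are invented for illustration, and the figure of three entries per vnode (two resources_assigned entries plus name) is taken from the logged lines. The model counts svrattrl entries only; it does not measure actual mom time per entry.

```python
from typing import List, Sequence

# Entries logged per vnode in the excerpt: resources_assigned (ncpus),
# resources_assigned (mem), and name.
ENTRIES_PER_VNODE = 3


def repopulate_each_time(vnodes: Sequence[str]) -> int:
    """Hypothetical model of the logged pattern: every time a vnode is added,
    the entries for all vnodes seen so far are populated again."""
    seen: List[str] = []
    total = 0
    for vnode in vnodes:
        seen.append(vnode)
        total += len(seen) * ENTRIES_PER_VNODE  # re-emit the whole list so far
    return total


def populate_once(vnodes: Sequence[str]) -> int:
    """Hypothetical alternative: populate each vnode's entries exactly once."""
    return len(vnodes) * ENTRIES_PER_VNODE


if __name__ == "__main__":
    logged = ["vn[654]", "vn[446]", "vn[762]", "vn[557]", "vn[853]"]
    print(repopulate_each_time(logged))  # 3 + 6 + 9 + 12 + 15 = 45
    nodes = [f"vn[{i}]" for i in range(1000)]
    print(repopulate_each_time(nodes), populate_once(nodes))  # 1501500 3000
```

Under these assumptions, a 1000-node job produces 3 × 1000 × 1001 / 2 = 1,501,500 entries instead of 3,000, which would be consistent with the hook timing out even with logging disabled.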

Acceptance Criteria

None

Status

Assignee

Al Bayucan

Reporter

Scott Campbell

Severity

3-High

OS

None

Start Date

None

Pull Request URL

None

Story Points

1

Components

Fix versions

Affects versions

14.1.1

Priority

Critical