PP-578: server memory leak for hooks load test

Overview:

Ticket PP-228 was addressed by removing the code that restarts the Python interpreter. That change introduced a significant memory leak when queuejob, modifyjob, and movejob hooks are present. The restart code will be restored in a way that gives the administrator more control and visibility over Python restarts. The change control is defined as experimental because the long-term plan is to fix the memory leak in the pbs Python extension itself. Once that fix is complete, all of these interfaces become unnecessary.

Three server attributes are being added to control the frequency of interpreter restarts (documented below). The server maintains two counters: the number of hooks serviced and the number of Python objects created since the last restart. A restart is triggered when either counter exceeds its limit (python_restart_max_hooks or python_restart_max_objects) AND the minimum interval (python_restart_min_interval) has elapsed. A restart will NOT be triggered before the interval has elapsed. A restart will NOT be triggered if the interval has elapsed but neither counter has exceeded its limit.
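The trigger rule above can be sketched as a small predicate. This is only an illustration of the described logic, not the server's implementation: the names RestartPolicy and should_restart are hypothetical, the defaults are taken from the interface descriptions below, and the exact boundary comparisons (strictly greater for the counters, greater-or-equal for the interval) are assumptions.

```python
from dataclasses import dataclass


@dataclass
class RestartPolicy:
    """Hypothetical container for the three server attributes."""
    max_hooks: int = 100      # python_restart_max_hooks default
    max_objects: int = 1000   # python_restart_max_objects default
    min_interval: int = 30    # python_restart_min_interval default, in seconds


def should_restart(policy, hooks_serviced, objects_created, seconds_since_restart):
    """Return True when a counter exceeds its limit AND the interval has elapsed."""
    # Either counter going over its limit is enough to request a restart.
    limit_exceeded = (hooks_serviced > policy.max_hooks
                      or objects_created > policy.max_objects)
    # The request is honored only once the minimum interval has passed
    # (assumption: reaching the interval exactly counts as elapsed).
    interval_elapsed = seconds_since_restart >= policy.min_interval
    return limit_exceeded and interval_elapsed
```

For example, with the default policy, 101 hooks serviced 10 seconds after the last restart does not trigger a restart, while the same count 30 seconds after it does.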

Interfaces:

  • Interface 1: python_restart_max_hooks
    • Visibility: Public
    • Change Control: Experimental
    • Synopsis: Add server attribute python_restart_max_hooks
    • Detail: In the past there were macros defined to control how often the Python interpreter was restarted in the server. This attribute allows the administrator to override the hook count limit. It may be set by a manager through qmgr. The default value is 100 when the server attribute is unset, as it was previously. The value assigned must be a positive integer, internally represented as a long.
  • Interface 2: python_restart_max_objects
    • Visibility: Public
    • Change Control: Experimental
    • Synopsis: Add server attribute python_restart_max_objects
    • Detail: In the past there were macros defined to control how often the Python interpreter was restarted in the server. This attribute allows the administrator to override the object count limit. It may be set by a manager through qmgr. The default value is 1000 when the server attribute is unset, as it was previously. The value assigned must be a positive integer, internally represented as a long.
  • Interface 3: python_restart_min_interval
    • Visibility: Public
    • Change Control: Experimental
    • Synopsis: Add server attribute python_restart_min_interval
    • Detail: This is a server attribute that controls the minimum interval between Python restarts, expressed in seconds (pbs.duration). When this attribute is unset, the default behavior is to avoid restarting the Python interpreter more than once every 30 seconds. Administrators may increase this value at the cost of additional server memory usage. This attribute may be set by a manager through qmgr. The value assigned must be a positive integer (number of seconds) or a string of the format [[HH:]MM:]SS, which is converted to a long internally.
  • Interface 4: Log message
    • Visibility: Public
    • Change Control: Experimental
    • Synopsis: python_restart_max_hooks is now 100
    • Detail: A DEBUG3 message is being added to log the value of python_restart_max_hooks when it changes. The message will be printed to the server log when a server hook is called and the value has changed from the previous call. This may occur when the attribute is assigned a new value or is unset in qmgr.
  • Interface 5: Log message
    • Visibility: Public
    • Change Control: Experimental
    • Synopsis: python_restart_max_objects is now 1000
    • Detail: A DEBUG3 message is being added to log the value of python_restart_max_objects when it changes. The message will be printed to the server log when a server hook is called and the value has changed from the previous call. This may occur when the attribute is assigned a new value or is unset in qmgr.
  • Interface 6: Log message
    • Visibility: Public
    • Change Control: Experimental
    • Synopsis: python_restart_min_interval is now 30
    • Detail: A DEBUG3 message is being added to log the value of python_restart_min_interval when it changes. The message will be printed to the server log when a server hook is called and the value has changed from the previous call. This may occur when the attribute is assigned a new value or is unset in qmgr.
  • Interface 7: Log message
    • Visibility: Public
    • Change Control: Experimental
    • Synopsis: Restarting Python interpreter to reduce mem usage
    • Detail: A DEBUG2 message is being restored to indicate the Python interpreter is being restarted. The message will be printed to the server log. This message existed prior to this design document, but was never properly documented.
  • Interface 8: Log message
    • Visibility: Public
    • Change Control: Experimental
    • Synopsis: Current memory usage: VmSize=388200kB, VmRSS=98824kB
    • Detail: A DEBUG2 message is being added to log the contents of /proc/self/statm prior to the Python interpreter restart. The message will be printed to the server log. For systems not supporting /proc/self/statm, the string "unknown" will be printed in place of the memory information.
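Two of the details above can be illustrated with short sketches: the [[HH:]MM:]SS conversion described for python_restart_min_interval, and the memory usage message with its "unknown" fallback. Both functions are hypothetical and are not the server's code. In particular, the exact text printed in the "unknown" case and the conversion of statm page counts to kB are assumptions.

```python
import os


def parse_duration(value):
    """Convert an integer or a [[HH:]MM:]SS string to a number of seconds."""
    if isinstance(value, int):
        seconds = value
    else:
        parts = value.strip().split(":")
        # Accept one to three numeric fields: SS, MM:SS, or HH:MM:SS.
        if not 1 <= len(parts) <= 3 or not all(p.isdigit() for p in parts):
            raise ValueError("expected seconds or [[HH:]MM:]SS")
        seconds = 0
        for part in parts:
            seconds = seconds * 60 + int(part)
    if seconds <= 0:
        raise ValueError("interval must be positive")
    return seconds


def memory_usage_message(statm_path="/proc/self/statm"):
    """Build the memory message; fall back to "unknown" if statm is unusable."""
    try:
        with open(statm_path) as f:
            fields = f.read().split()
        # statm reports sizes in pages; the first two fields are the
        # total program size and the resident set size.
        page_kb = os.sysconf("SC_PAGE_SIZE") // 1024
        vm_size = int(fields[0]) * page_kb
        vm_rss = int(fields[1]) * page_kb
    except (OSError, ValueError, IndexError, AttributeError):
        # Assumed wording for systems without /proc/self/statm.
        return "Current memory usage: unknown"
    return f"Current memory usage: VmSize={vm_size}kB, VmRSS={vm_rss}kB"
```

With this sketch, parse_duration("01:30") yields 90 and parse_duration("1:00:00") yields 3600, while a zero or malformed value raises ValueError.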


