Comment: hook is enabled on Cray X* series

Jira: PP-610 (pbspro.atlassian.net)

Forum discussion (EDD review).

Overview:
PBS and ALPS can sometimes get out of sync. The synchronization hook checks whether the information that PBS holds
matches what ALPS is reporting. When the hook detects that PBS and ALPS are out of sync, it will HUP the Mom.
The hook does its work only on Cray X-series Moms.
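
The check-then-HUP flow described in the overview can be sketched as follows. This is a minimal illustration, not the hook's actual code: the function name resync_if_needed, its parameters, and the way the two node lists and the Mom's process ID are obtained are all assumptions made for the example. Only the overall behavior (compare the two views, HUP the Mom on a mismatch) comes from the text above.

```python
import os
import signal


def resync_if_needed(pbs_nodes, alps_nodes, mom_pid, send_signal=os.kill):
    """Send SIGHUP to the Mom when the PBS and ALPS node views differ.

    All names here are hypothetical. pbs_nodes and alps_nodes are iterables
    of node names; mom_pid is the Mom's process ID, however it was obtained.
    send_signal defaults to os.kill and can be replaced for testing.
    Returns True if a HUP was sent, False if the views already agree.
    """
    if set(pbs_nodes) == set(alps_nodes):
        return False  # in sync: nothing to do
    send_signal(mom_pid, signal.SIGHUP)
    return True
```

Passing the signal sender as a parameter keeps the comparison logic testable without a running Mom.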


Interface 1: PBS hook PBS_alps_inventory_check

  • Visibility: Public
  • Change Control: Experimental
  • Details: 
    • This is a periodic hook that runs on the execution host.
    • The hook is enabled by default when run on a Cray X* series machine.
      • The hook is disabled by default on all other platforms.
    • The hook runs as the Administrator and executes every 300 seconds.
    • The timeout for the Hook is 90 seconds.
        

Interface 2: Mom log entry: ALPS Inventory Check: apstat command cannot be found at <path>

...

  • Visibility: PBS Private
  • Change Control: Experimental
  • Details: 
    • The mom installed on a login node reports inventory; additional moms, if any, do not.
    • The first instance of 'name' is the hostname of the login node responsible for performing the inventory query. The second 
      instance of 'name' is the hostname of the current/local mom.
    • Log level: PBSEVENT_ADMIN.

...

Interface 10: Mom log entry: ALPS Inventory Check: Compute node(s) defined in ALPS, but not in PBS: <list of nodes>

  • Visibility: PBS Private
  • Change Control: Experimental
  • Details: 
    • Recorded when PBS and ALPS are out of sync, i.e., ALPS reports nodes that PBS does not.
    • Log level: PBSEVENT_ADMIN.

...

Interface 11: Mom log entry: ALPS Inventory Check: Compute node(s) defined in PBS, but not in ALPS: <list of nodes>

  • Visibility: PBS Private
  • Change Control: Experimental
  • Details: 
    • Recorded when PBS and ALPS are out of sync, i.e., PBS reports nodes that ALPS does not.
    • Log level: PBSEVENT_ADMIN.
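
The two mismatch conditions in Interfaces 10 and 11 amount to a pair of set differences. The sketch below shows one way to derive both log messages from two lists of node names. The function name, the "node(s)" wording, the sorted order, and the comma-separated node list are assumptions for illustration. The source specifies only the general message text and a list of nodes.

```python
def inventory_mismatch_messages(pbs_nodes, alps_nodes):
    """Build the out-of-sync log messages for two node-name collections.

    Hypothetical helper: returns a list with zero, one, or two messages,
    ALPS-only nodes first, then PBS-only nodes.
    """
    pbs, alps = set(pbs_nodes), set(alps_nodes)
    prefix = "ALPS Inventory Check: Compute node(s) defined in "
    messages = []

    only_in_alps = sorted(alps - pbs)  # nodes ALPS knows but PBS does not
    if only_in_alps:
        messages.append(prefix + "ALPS, but not in PBS: " + ",".join(only_in_alps))

    only_in_pbs = sorted(pbs - alps)  # nodes PBS knows but ALPS does not
    if only_in_pbs:
        messages.append(prefix + "PBS, but not in ALPS: " + ",".join(only_in_pbs))

    return messages
```

An empty result means the two inventories agree and no entry would be logged.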

...

  • Visibility: PBS Private
  • Change Control: Experimental
  • Details: 
    • Recorded when the hook is unable to HUP the Mom and successfully refresh nodes.
    • Log level: PBSEVENT_ADMIN.