Jira issue: PP-832

...

The solutions proposed below address only the default scheduler and do not cover the multi-sched scenario.

Solution 1:

Follow these steps:

  1. At takeover, the secondary server checks whether it can communicate with the scheduler on the primary host.
  2. If it can, it proceeds to use the scheduler on the primary host.
  3. If it cannot, it spawns a local scheduler process.
  4. If the scheduler on the primary goes down while the secondary is active, the secondary server spawns a local scheduler.
  5. The PBS init script should always restart the scheduler on the primary host.
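
The decision logic in steps 1 to 4 can be sketched as follows. This is only an illustrative model, not PBS code: the class SecondaryServer and the callables can_reach_primary_sched and spawn_local_sched are hypothetical names standing in for whatever the server actually uses to probe the primary's scheduler and to start a local one.

```python
from typing import Callable, Optional


class SecondaryServer:
    """Hypothetical model of the secondary server's scheduler choice (Solution 1)."""

    def __init__(self,
                 can_reach_primary_sched: Callable[[], bool],
                 spawn_local_sched: Callable[[], None]) -> None:
        # Both callables are assumptions made for this sketch.
        self._can_reach = can_reach_primary_sched
        self._spawn_local = spawn_local_sched
        self.scheduler: Optional[str] = None  # "primary" or "local" once active

    def take_over(self) -> None:
        # Steps 1-3: prefer the scheduler on the primary host if reachable,
        # otherwise spawn a local scheduler process.
        if self._can_reach():
            self.scheduler = "primary"
        else:
            self._spawn_local()
            self.scheduler = "local"

    def check_scheduler(self) -> None:
        # Step 4: if the primary's scheduler goes down while the secondary
        # is active, fall back to a local scheduler.
        if self.scheduler == "primary" and not self._can_reach():
            self._spawn_local()
            self.scheduler = "local"
```

Once the secondary has fallen back to a local scheduler, this sketch does not switch back to the primary's scheduler; the steps above do not say whether it should.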

Solution 2:

Follow these steps:

  1. In a failover setup, keep the fairshare "usage" file on a shared filesystem.
  2. At takeover, the secondary server checks whether it can communicate with the scheduler on the primary host.
  3. If it can, it sends the SCH_QUIT signal to the scheduler on the primary and then spawns a local scheduler process.
  4. If it cannot, it spawns a local scheduler process. The secondary server then sends SCH_SCHEDULE_FIRST to the scheduler.
  5. When PBS on the primary comes up, the primary server should send SCH_SCHEDULE_FIRST to its local scheduler so that the scheduler re-reads the usage data.
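
The message flow in steps 2 to 5 can be sketched as follows. This is a rough model under stated assumptions, not PBS code: the functions secondary_take_over and primary_comes_up, the callables passed to them, and the target names "primary_sched" and "secondary_sched" are all hypothetical. SCH_QUIT and SCH_SCHEDULE_FIRST are represented as plain strings rather than their real values. The sketch also assumes that the secondary sends SCH_SCHEDULE_FIRST to its newly spawned scheduler in both branches.

```python
from typing import Callable

# Placeholder values for this sketch only; not the real PBS definitions.
SCH_QUIT = "SCH_QUIT"
SCH_SCHEDULE_FIRST = "SCH_SCHEDULE_FIRST"

SendCmd = Callable[[str, str], None]  # (target scheduler, command)


def secondary_take_over(can_reach_primary_sched: Callable[[], bool],
                        send_cmd: SendCmd,
                        spawn_local_sched: Callable[[], None]) -> None:
    """Steps 2-4: stop the primary's scheduler if reachable, then run a local one."""
    if can_reach_primary_sched():
        # Step 3: ask the scheduler on the primary host to quit.
        send_cmd("primary_sched", SCH_QUIT)
    # Steps 3 and 4: in either case the secondary runs its own scheduler.
    spawn_local_sched()
    # Assumed here: have the new scheduler read the shared usage file first.
    send_cmd("secondary_sched", SCH_SCHEDULE_FIRST)


def primary_comes_up(send_cmd: SendCmd) -> None:
    """Step 5: the primary server makes its local scheduler re-read usage data."""
    send_cmd("primary_sched", SCH_SCHEDULE_FIRST)
```

Because only one scheduler runs at a time in this flow, the shared fairshare "usage" file from step 1 has a single writer at any moment.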