

PP-832

Forum Discussion

Issue Description:

Currently, in a failover setup, the secondary server checks at take-over whether it can communicate with the scheduler on the primary host. This check is done only at take-over and is never repeated.

If it cannot communicate, the secondary server spawns a scheduler process on the secondary host; otherwise, it continues to use the scheduler on the primary host.

In the latter case, if the scheduler process on the primary host later stops communicating (because of a crash, the host going down, etc.), the secondary server has no scheduler process to communicate with and scheduling halts.
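The failure mode described above can be illustrated with a small simulation. This is only a sketch, not PBS code: the `Scheduler` and `SecondaryServer` classes, the `alive` flag, and the `run_cycle` method are hypothetical names introduced here to model the one-time check.

```python
class Scheduler:
    """Hypothetical stand-in for a scheduler process on a given host."""

    def __init__(self, host):
        self.host = host
        self.alive = True  # models whether the process still responds


class SecondaryServer:
    """Hypothetical model of the secondary server's current behaviour."""

    def __init__(self, primary_sched):
        self.primary_sched = primary_sched
        self.sched = None

    def take_over(self):
        # The reachability check happens here and nowhere else.
        if self.primary_sched.alive:
            self.sched = self.primary_sched
        else:
            self.sched = Scheduler("secondary")

    def run_cycle(self):
        # No re-check: if the chosen scheduler is gone, the cycle cannot run.
        return self.sched.alive


primary_sched = Scheduler("primary")
secondary = SecondaryServer(primary_sched)
secondary.take_over()                # primary scheduler reachable, so it is reused

ok_before = secondary.run_cycle()    # scheduling works
primary_sched.alive = False          # primary scheduler crashes after take-over
ok_after = secondary.run_cycle()     # scheduling halts, nothing replaces it
```

Because the choice of scheduler is fixed at take-over, the later crash leaves the secondary server pointing at a scheduler that no longer responds.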

The solution proposed below covers only the default scheduler; it does not cover the multi-sched scenario.

Solution:

Follow the steps below:

  1. At take-over, the secondary server spawns a local scheduler process.
  2. The secondary server sends SCH_SCHEDULE_FIRST to that scheduler.
  3. When PBS on the primary comes back up, the primary server should send SCH_SCHEDULE_FIRST to its local scheduler so that the scheduler re-reads the usage data.
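The three steps can be sketched as follows. This is a minimal model under stated assumptions, not the PBS implementation: `Server`, `Scheduler`, `spawn_local_scheduler`, and `send` are hypothetical names, and `SCH_SCHEDULE_FIRST` is represented by a placeholder string rather than its real value. The sketch also assumes the primary starts its own local scheduler when it comes up.

```python
# Command name taken from the text; the string value is only a placeholder.
SCH_SCHEDULE_FIRST = "SCH_SCHEDULE_FIRST"


class Scheduler:
    """Hypothetical scheduler process that records the commands it receives."""

    def __init__(self, host):
        self.host = host
        self.received = []

    def send(self, cmd):
        self.received.append(cmd)


class Server:
    """Hypothetical server that owns at most one local scheduler."""

    def __init__(self, host):
        self.host = host
        self.sched = None

    def spawn_local_scheduler(self):
        self.sched = Scheduler(self.host)
        return self.sched


def secondary_take_over(secondary):
    # Step 1: always spawn a local scheduler, with no reachability check.
    secondary.spawn_local_scheduler()
    # Step 2: tell the new scheduler to run its first cycle.
    secondary.sched.send(SCH_SCHEDULE_FIRST)


def primary_comes_up(primary):
    # Assumption: the primary starts its local scheduler if none is running.
    if primary.sched is None:
        primary.spawn_local_scheduler()
    # Step 3: the same command makes the scheduler re-read the usage data.
    primary.sched.send(SCH_SCHEDULE_FIRST)


primary = Server("primary")
secondary = Server("secondary")

secondary_take_over(secondary)
primary_comes_up(primary)
```

In this model the secondary never depends on the scheduler on the primary host, so a later failure of that host cannot stall scheduling on the secondary.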