  1. PP-832

PBSPro failover secondary server fails to continuously check whether it needs to start a scheduler locally

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Low
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 19.1.1
    • Component/s: None
    • Labels:
      None

      Description

      We've noticed that if you stop the primary server (but not its scheduler), the secondary keeps using the scheduler on the primary after failover.

      But if you then reboot the primary, the secondary will not start a scheduler locally, since it only decides whether it needs a local scheduler once, immediately after it has taken over, and not in the main server loop.

      You end up with a secondary that will never schedule at all. Of course customers consider that failover mechanism "broken".

      The sequence to trigger this:

      - qterm -t quick on the primary
      - Secondary takes over and schedules using the primary's scheduler
      - Kill the primary's scheduler

      What does work:

      - qterm -t quick -s on the primary (or /etc/init.d/pbs stop, or a reboot, or yanking the power, ...)
      - Secondary takes over and, when its initial attempt to contact the scheduler fails, decides to start a local scheduler

      On EVERY failure to contact the scheduler in the main server loop, we should consider starting a new scheduler locally if we see that we are the secondary and that we were using the scheduler on the primary.
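
The proposed check could look roughly like the sketch below. It is only an illustration of the decision logic described above, not the actual fix: every name in it (`server_state`, `sched_location`, `start_local_scheduler`, `on_sched_contact_failure`) is hypothetical and does not come from the PBS Pro source, and the function that starts the scheduler is a stub.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical types and names for illustration; not PBS Pro source code. */
enum sched_location { SCHED_NONE, SCHED_ON_PRIMARY, SCHED_LOCAL };

struct server_state {
    bool is_secondary;          /* this host is the failover secondary */
    bool is_active;             /* the secondary has taken over */
    enum sched_location sched;  /* which scheduler the server is using */
    int local_starts;           /* how many times a local scheduler was started */
};

/* Stub: a real server would fork/exec the scheduler daemon here. */
static bool start_local_scheduler(struct server_state *s)
{
    s->local_starts++;
    return true;
}

/*
 * Assumed to be called from the main server loop on EVERY failed attempt
 * to contact the scheduler, not only once right after takeover.
 */
static void on_sched_contact_failure(struct server_state *s)
{
    assert(s != NULL);

    /* Only an active secondary should consider a local scheduler. */
    if (!s->is_secondary || !s->is_active)
        return;

    /* Only switch if we were relying on the primary's scheduler. */
    if (s->sched != SCHED_ON_PRIMARY)
        return;

    if (start_local_scheduler(s))
        s->sched = SCHED_LOCAL;
}
```

Under these assumptions, an active secondary that was using the primary's scheduler switches to a local one on the first contact failure, and later failures do not start a second local scheduler because `sched` is no longer `SCHED_ON_PRIMARY`.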

      Actually, it would be better to simply always start a scheduler on the secondary and connect to that, even though that is not the documented behaviour. It will always be the faster scheduler, since it is only going to be used if we're the active server.

      Critical, since it can lead to situations in which a failover server doesn't correctly take over scheduling services.

              People

              • Assignee: Prakash Varandani (prakashcv13)
              • Reporter: Sam Goosen (smgoosen)