PP-1110: Support overlay upgrade on Cray Linux Environment (CLE) 6.0

The community discussion can be found at: http://community.pbspro.org/t/pp-1110-support-for-overlay-upgrade-on-cray-linux-environment-cle-6-0/775



Overlay upgrade when installed version is 13.0.40*

Prior to version 18.x, PBS Pro used a script named INSTALL to install and configure the RPMs. The INSTALL script supported multiple installed versions simultaneously, using a symbolic link (/opt/pbs/default) to select the "active" version on the system. As of version 18, the INSTALL script is no longer supported. The new package allows the administrator to install and upgrade PBS Pro as they would any other RPM-based package.

The following steps assume PBS_HOME is already persistent.

-       Drain the system of running jobs.  Queued jobs will be retained.

-       Remove the existing vnodes.
(Prior to 18.x, PBS Pro created one vnode per NUMA node on a Cray; from 18.x it creates one vnode per compute node. If the older vnodes are left in place, they will all be marked as stale after the upgrade and may be difficult to remove later.)

# qmgr -c "delete node @default"

-       Shut down PBS Pro on all the nodes where a PBS Pro daemon is running.

# /etc/init.d/pbs stop
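
The drain step at the start of this list can be sketched as a few small shell functions. This is only an illustration, assuming the standard PBS Pro client commands qmgr and qselect are on the PATH and are run with manager privileges; the function names are hypothetical and not part of PBS Pro. It turns scheduling off so that queued jobs stay queued, then polls until no job is in the running state.

```shell
#!/bin/sh
# Illustrative helpers only; the function names are not part of PBS Pro.

# Stop the scheduler from starting queued jobs. Queued jobs stay queued.
stop_scheduling() {
    qmgr -c "set server scheduling = false"
}

# Turn scheduling back on once the upgrade is finished.
resume_scheduling() {
    qmgr -c "set server scheduling = true"
}

# Print the number of jobs currently in the running (R) state.
running_job_count() {
    qselect -s R | grep -c .
}

# Poll until no jobs are running. $1 is the poll interval in seconds.
wait_for_drain() {
    interval="${1:-60}"
    while [ "$(running_job_count)" -gt 0 ]; do
        sleep "$interval"
    done
}
```

A typical sequence would be stop_scheduling followed by wait_for_drain, and resume_scheduling only after the upgraded PBS Pro has been started again.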


Follow the PBS Pro version 18.x on CLE 6.0 install instructions. Use the same configuration set that was used with the prior PBS Pro installation, that is, the one that made PBS_HOME and /etc/pbs.conf persistent.

In those instructions, at step 5, be sure that PBS_EXEC=/opt/pbs and PBS_HOME is set to the path that was set before the upgrade.

Because PBS_HOME is already persistent, step 9, which makes PBS_HOME persistent, can be skipped.
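
As a sanity check on the two settings named above, the following sketch reads a pbs.conf-style file and confirms that PBS_EXEC is /opt/pbs and that PBS_HOME matches the value used before the upgrade. It assumes the file consists of plain KEY=value lines; the function names are hypothetical, and the PBS_HOME path shown in the usage comment is only a placeholder for the site's actual value.

```shell
#!/bin/sh
# Illustrative helpers only; the function names are not part of PBS Pro.

# Print the value assigned to a key in a KEY=value file.
# $1 = key name, $2 = path to the file. The last assignment wins.
pbs_conf_value() {
    sed -n "s/^$1=//p" "$2" | tail -n 1
}

# Succeed only if PBS_EXEC is /opt/pbs and PBS_HOME matches the expected path.
# $1 = PBS_HOME used before the upgrade, $2 = conf file (default /etc/pbs.conf).
check_pbs_conf() {
    expected_home="$1"
    conf="${2:-/etc/pbs.conf}"
    [ "$(pbs_conf_value PBS_EXEC "$conf")" = "/opt/pbs" ] || {
        echo "PBS_EXEC is not /opt/pbs in $conf" >&2
        return 1
    }
    [ "$(pbs_conf_value PBS_HOME "$conf")" = "$expected_home" ] || {
        echo "PBS_HOME does not match $expected_home in $conf" >&2
        return 1
    }
}

# Usage (placeholder path): check_pbs_conf /path/to/previous/pbs_home
```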
 

Overlay upgrade when installed version is 18.2 or higher

The following instructions assume PBS_HOME is already persistent.

-       Drain the system of running jobs.  Queued jobs will be retained.

-       Shut down PBS Pro on all the nodes where a PBS Pro daemon is running.
# /etc/init.d/pbs stop

The overlay upgrade uses many of the same IMPS commands and steps that are outlined in the PBS Pro CLE 6.0 install instructions and referred to in Cray's S-2559 documentation. For the steps below, use the original repo, pkgcoll, and recipes that were used for the currently installed PBS Pro.

  1. Follow Cray's instructions for updating a repository (refer to the section on "Install Third-Party Software with a Custom Image" in the Software Installation and Configuration Guide, Cray document S-2559). Use the same repo that was used for the existing installed image.
    For example, if the repo was called my_repo the command might look something like:
    smw# repo update -a "./pbspro-server-18.2.0.20171109010636-0.x86_64.rpm" my_repo
  2. Follow Cray's instructions for updating a package collection in Cray document S-2559.
    1. For example, if the original package collection was called my_collection, the command might look something like:
      smw# pkgcoll update -p pbspro-server-18.2.0.20171109010636-0.x86_64 my_collection
       
    2. Use pkgcoll show to see what is in the package collection my_collection. For example:

      smw# pkgcoll show my_collection
      my_collection:
      name: my_collection
      packages:
      pbspro-server-18.2.0.20171018010818-0.x86_64
      pbspro-server-18.2.0.20171109010636-0.x86_64

    3. Use the remove option (-P) of pkgcoll update to remove the older PBS Pro version from the package collection. For example:
      smw# pkgcoll update -P pbspro-server-18.2.0.20171018010818-0.x86_64 my_collection

    4. Running pkgcoll show again should now show only one PBS Pro RPM version.

  3. Even though changes have been made to the repo and the package collection, the original recipe should still validate at this point.
  4. Follow Cray's instructions (in Cray document S-2559) to build and push/package the image using the original recipe.  Use a new image name in order to keep the original image untouched.  For example, if the original recipe name was my_recipe and the original image created was my_recipe_image, then the new image might be my_new_image and the command might look something like:
    smw# image create -r my_recipe my_new_image
    smw# image export my_new_image 
    Do this for each type of image that is needed (e.g. server, login, etc.).
  5. Follow Cray's instructions (in Cray document S-2559) to assign the new boot image to the nodes where PBS Professional should be installed. This includes all nodes running PBS Professional services (e.g. pbs_server, pbs_mom, etc.) and those requiring access to PBS Pro commands (e.g. qsub, qstat, etc.). Include the original configuration set that contains the persistent information. For example, the command might look something like:
    smw# cnode update -i /var/opt/cray/imps/boot_images/<image name>.cpio -c <configuration set name> <cname-of-node-to-update or group-to-update>
  6. Reboot the system.
  7. Log on to the nodes that are hosting PBS Professional daemons and start PBS.
    # /etc/init.d/pbs start
  8. At this point PBS Professional will be up and running, and connected to ALPS. Jobs that were queued before the upgrade will still be queued.
    Remember to turn on scheduling so PBS Pro can resume normal operation.
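
The package collection cleanup in step 2 can be made less error-prone with a small filter. The sketch below reads pkgcoll show output on standard input and prints every pbspro package except the one to keep. It assumes the output has one package name per line, as in the sample shown in step 2 (optionally indented or prefixed with a dash); the function name is hypothetical.

```shell
#!/bin/sh
# Illustrative helper only; the function name is not part of IMPS or PBS Pro.

# Read `pkgcoll show` output on stdin and print every pbspro-* package
# except the one named in $1.
stale_pbspro_packages() {
    keep="$1"
    # Strip leading whitespace or list dashes, keep only pbspro packages,
    # then drop the exact package name that should stay in the collection.
    sed 's/^[[:space:]-]*//' | grep '^pbspro-' | grep -v -x -F -- "$keep"
}

# Usage on the SMW (names taken from the example in step 2):
#   pkgcoll show my_collection | \
#       stale_pbspro_packages pbspro-server-18.2.0.20171109010636-0.x86_64
```

Each name printed is a candidate for the pkgcoll update -P command shown in step 2. Review the list before removing anything, since the filter matches on the pbspro- prefix only.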



