
Community discussion is located here:  http://community.pbspro.org/t/pp-702-tests-for-installation-and-upgrades-on-cray-x-series-cle-5-2-systems/508

Overview:

These are the tests to verify the installation or upgrade of PBS on Cray X-series CLE 5.2 systems.


Pre:

    Use a real Cray machine running CLE 5.2. Use the sdb node as the host for the PBS server, scheduler, and comm. Use the login nodes as hosts for the PBS MoMs.

    If PBS is currently installed, uninstall it using the steps in the UNINSTALLATION section below before running the INSTALLATION and UPGRADE sections.


1. INSTALLATION


  1.1. Install PBS Pro using the default data service account pbsdata.


    1.1.1 Determine the NID of the node that will run the PBS Pro server and scheduler. In this example, the sdb node will be used.

      boot# ssh sdb cat /proc/cray_xt/nid

      5


    1.1.2 If it does not already exist, create the user pbsdata on the server host.

      For example:

      boot# xtopview -n 5 -e "useradd -c '#Altair' -d /home/users/pbsdata -g 14901 -m -u 12796 pbsdata"

      Note: The user ID and group ID above are examples. Use values appropriate to your system, avoiding non-existent group IDs and conflicting user or group IDs.


    1.1.3 Follow the fresh install instructions below using pbsdata as the dataservice user.

      The message below should not appear; if it does appear during installation, it is an error:

      "NOTE: /etc/pbs.conf and the PBS_HOME directory must be deleted manually".


      1.1.3.1. Log in to the boot node and create the /rr/current/software/pbspro directory if it is not present.


      1.1.3.2. Copy the PBS Pro server RPM to the /rr/current/software/pbspro directory on the boot node. Ensure it is the only RPM file present in the directory.


      1.1.3.3. Set the compute nodes to batch:

        boot# xtprocadmin -k m batch


      1.1.3.4. Install the PBS Pro RPM by running the following command on the boot node. Adjust the value of PBS_SERVER to the name of the node where the scheduler/server/comm services will run. For a fresh install run:


        boot# xtopview -d /rr/current/software/pbspro -m "Installing PBS Pro" -e "PBS_SERVER=sdb rpm -i /mnt/pbspro-server-*.rpm"


      1.1.3.5. Check to ensure PBS_SERVER is set correctly in /etc/pbs.conf and edit if necessary. All PBS Pro services should be disabled.


        boot# xtopview -e "cat /etc/pbs.conf"

        PBS_EXEC=/opt/pbs

        PBS_SERVER=sdb

        PBS_START_SERVER=0

        PBS_START_SCHED=0

        PBS_START_COMM=0

        PBS_START_MOM=0

        PBS_HOME=/var/spool/pbs

        PBS_CORE_LIMIT=unlimited

        PBS_SCP=/usr/bin/scp


      1.1.3.6. Update /etc/pbs.conf for the login nodes. Set the value of PBS_START_MOM to 1.

        boot# xtopview -c login -e "xtspec /etc/pbs.conf"

        boot# xtopview -c login -e "vi /etc/pbs.conf"


        ***File /etc/pbs.conf was MODIFIED

        boot# xtopview -c login -e "cat /etc/pbs.conf"

        PBS_EXEC=/opt/pbs

        PBS_SERVER=sdb

        PBS_START_SERVER=0

        PBS_START_SCHED=0

        PBS_START_COMM=0

        PBS_START_MOM=1

        PBS_HOME=/var/spool/pbs

        PBS_CORE_LIMIT=unlimited

        PBS_SCP=/usr/bin/scp


      1.1.3.7. Determine the NID of the node that will run the PBS Pro server and scheduler. In this example, the sdb node will be used.

        boot# ssh sdb cat /proc/cray_xt/nid

        5


      1.1.3.8. Use the value returned as the argument to the -n parameter and update the /etc/pbs.conf settings for the PBS Pro server.

        The values of PBS_START_SERVER, PBS_START_SCHED, and PBS_START_COMM should all be set to 1.


        boot# xtopview -n 5 -e "xtspec /etc/pbs.conf"

        boot# xtopview -n 5 -e "vi /etc/pbs.conf"


        ***File /etc/pbs.conf was MODIFIED

        boot# xtopview -n 5 -e "cat /etc/pbs.conf"

        PBS_EXEC=/opt/pbs

        PBS_SERVER=sdb

        PBS_START_SERVER=1

        PBS_START_SCHED=1

        PBS_START_COMM=1

        PBS_START_MOM=0

        PBS_HOME=/var/spool/pbs

        PBS_CORE_LIMIT=unlimited

        PBS_SCP=/usr/bin/scp
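
        The manual "cat /etc/pbs.conf and inspect" checks above can be scripted. The helper below is a sketch of ours, not part of PBS Pro: it compares expected KEY=VALUE pairs against a pbs.conf-style file and reports mismatches.

        ```shell
        # check_pbs_conf FILE KEY=VALUE...
        # Verify that each KEY in FILE has the expected VALUE.
        # Hypothetical helper -- not shipped with PBS Pro.
        check_pbs_conf() {
            conf="$1"; shift
            rc=0
            for kv in "$@"; do
                key=${kv%%=*}
                want=${kv#*=}
                # Extract the value assigned to $key in the conf file.
                got=$(sed -n "s/^${key}=//p" "$conf")
                if [ "$got" != "$want" ]; then
                    echo "MISMATCH: $key is '$got', expected '$want'"
                    rc=1
                fi
            done
            return $rc
        }

        # Example: verify the server-node settings from step 1.1.3.8:
        #   check_pbs_conf /etc/pbs.conf PBS_SERVER=sdb \
        #       PBS_START_SERVER=1 PBS_START_SCHED=1 PBS_START_COMM=1 PBS_START_MOM=0
        ```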


      1.1.3.9. The PBS Pro dataservice runs on the PBS Pro server node. The dataservice processes must be owned by an account other than root (e.g., pbsdata or postgres).

        Start PBS by:


        boot# ssh sdb

        === Welcome to sdb ===

        sdb# /etc/init.d/pbs start


      1.1.3.10. Start the PBS Pro service on each execution host.


        sdb# ssh nid00030 /etc/init.d/pbs start


      1.1.3.11. Enable flatuid on the server:


        sdb# qmgr -c "set server flatuid = true"


      1.1.3.12. Install the appropriate PBS Pro license. For the open source release, it is not necessary to configure a license.

        For the commercial release, define the pbs_license_info parameter via qmgr. For example:


        sdb# qmgr -c "set server pbs_license_info = 6200@licenseserver"


      1.1.3.13. With licensing now configured, restart PBS Pro on the sdb node.


        sdb# /etc/init.d/pbs restart


      1.1.3.14. Configure the execution hosts on the server. Create a node in PBS Pro for each login/service node that will be running pbs_mom.


        sdb# qmgr -c "create node nid00030"
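
        With several login/service nodes, the "create node" commands can be generated and piped into qmgr in one pass. This is a sketch; the host names are examples, and the helper name is ours:

        ```shell
        # make_node_cmds HOST...
        # Emit one "create node" qmgr command per MoM host.
        make_node_cmds() {
            for host in "$@"; do
                printf 'create node %s\n' "$host"
            done
        }

        # Usage on the server node (host names are examples):
        #   make_node_cmds nid00030 nid00031 | qmgr
        ```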


        Installation is complete.


  1.2  Optional: After installing the PBS MoMs, add the line below to the PBS_HOME/mom_priv/config file of each MoM node:

      $usecp *:/home /home


    and HUP the MoM after changing PBS_HOME/mom_priv/config:

      login# pkill -HUP pbs_mom
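
    To make this step safe to re-run, the append can be guarded so the line is only added once. A minimal sketch (the helper name and config path are ours, assuming PBS_HOME=/var/spool/pbs):

    ```shell
    # add_usecp FILE
    # Append the $usecp mapping to a mom_priv/config file only if it is
    # not already present, so repeated runs do not duplicate the line.
    # Hypothetical helper -- not part of PBS Pro.
    add_usecp() {
        conf="$1"
        line='$usecp *:/home /home'
        # -x: whole-line match, -F: fixed string (no regex interpretation).
        grep -qxF -- "$line" "$conf" 2>/dev/null || printf '%s\n' "$line" >> "$conf"
    }

    # Usage on each MoM node, then HUP the MoM to re-read the config:
    #   add_usecp /var/spool/pbs/mom_priv/config
    #   pkill -HUP pbs_mom
    ```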


  1.3 After installation, follow the steps in the POST UPGRADE section.


  1.4. Submit jobs

    Follow the job submission steps in JOBS section.


  1.5. Cleanup

    Uninstall PBS using steps in UNINSTALLATION section.


  1.6. Repeat steps 1.1.3 through 1.5 above using crayadm as the data service account.

    In step 1.1.3.9 start PBS Pro for the first time by:


    boot# ssh sdb

    === Welcome to sdb ===

    sdb# PBS_DATA_SERVICE_USER=crayadm /etc/init.d/pbs start



2. UPGRADE


  2.1. Install 17.2.x and then overlay upgrade to 17.2.y (where y > x).

    Use pbsdata as the dataservice account.


    2.1.1 Install 17.2.x using the installation procedure in section 1.1.


    2.1.2 Follow the overlay upgrade instructions below.


      The message below should not appear; if it does appear during upgrade, it is an error:

      "NOTE: /etc/pbs.conf and the PBS_HOME directory must be deleted manually".


      2.1.2.1 Log in to the boot node and create the /rr/current/software/pbspro directory if it is not present.


      2.1.2.2 Copy the new PBS Pro server RPM to the /rr/current/software/pbspro directory on the boot node. Ensure it is the only RPM file present in the directory.


      2.1.2.3 Drain the system of running jobs. Queued jobs will be retained.
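
        One way to drain is to disable scheduling (so queued jobs stay queued) and then poll until no running jobs remain. This is a sketch, not the only valid procedure; the 60-second interval and the use of "qstat -r" are assumptions to adjust to site policy:

        ```shell
        # drain_wait CMD [ARGS...]
        # Poll until the given command (one that lists running jobs)
        # produces no output. Hypothetical helper -- not part of PBS Pro.
        drain_wait() {
            while [ -n "$("$@" 2>/dev/null)" ]; do
                sleep 60
            done
        }

        # Usage on the sdb node:
        #   qmgr -c "set server scheduling = false"
        #   drain_wait qstat -r
        ```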


      2.1.2.4 Shut down PBS Pro on all login nodes.


        login# /etc/init.d/pbs stop


      2.1.2.5 Shut down PBS Pro on the server.


        sdb# /etc/init.d/pbs stop



      2.1.2.6 On the boot node, run:


        boot# xtopview -d /rr/current/software/pbspro -m "Upgrading PBS Pro" -e "rpm -U /mnt/pbspro-server-*.rpm"


      2.1.2.7 Start the PBS Pro server.


        sdb# /etc/init.d/pbs start


      2.1.2.8 Start the PBS Pro mom(s).


        login# /etc/init.d/pbs start


    2.1.3 After the upgrade, log in to the server and MoM nodes and:


      2.1.3.1 Perform the checks in the POST UPGRADE section.


      2.1.3.2 Check that PBS_HOME in /etc/pbs.conf is still /var/spool/pbs

          sdb# grep PBS_HOME /etc/pbs.conf

          login# grep PBS_HOME /etc/pbs.conf


      2.1.3.3 Check that /var/spool/pbs exists and is populated, as root:

          sdb# ls -R /var/spool/pbs

          login# ls -R /var/spool/pbs


      2.1.3.4 Submit jobs as shown in JOBS section.


      2.1.3.5 Clean up PBS installation using steps in UNINSTALLATION section.



  2.2 Install 13.0.40x and then overlay upgrade to 17.2.x or higher.

    Use crayadm as the dataservice account.


    2.2.1 Install 13.0.40x using the installation procedure in the PBS Pro 13.0 Installation Guide.

      Add the lines below to the PBS_HOME/mom_priv/config on each MoM node:

        $alps_client /opt/cray/alps/default/bin/apbasil

        $usecp *:/home /home  (optional)


      and HUP the MoM after changing the PBS_HOME/mom_priv/config:

        login# pkill -HUP pbs_mom


    2.2.2 Follow the overlay upgrade instructions below.


      The message below should not appear; if it does appear during upgrade, it is an error:

      "NOTE: /etc/pbs.conf and the PBS_HOME directory must be deleted manually".


      Prior to version 17, PBS Pro used a script named INSTALL to install and configure the RPMs. The INSTALL script provided

      support for installing multiple versions simultaneously, using a symbolic link (/opt/pbs/default) to select the

      "active" version on the system. As of version 17, the INSTALL script is no longer supported. The new package

      allows the administrator to install and upgrade PBS Pro as they would any other RPM-based package.


      Due to the significant packaging changes, it is recommended that the administrator uninstall the old version of PBS Pro

      prior to installing the new version. Uninstalling the old version after installing the new version will prevent PBS

      Pro from starting automatically at boot. The /etc/pbs.conf file and contents of the PBS_HOME directories will not be

      affected when PBS Pro is uninstalled.


      2.2.2.1 Follow the steps in uninstallation section 5.1.


        expected:

        PBS Pro is now completely uninstalled, but the /etc/pbs.conf file and PBS_HOME directories remain. Leave these files in place and follow the steps to perform a fresh install of PBS Pro in section 1.1.


    2.2.3 After the upgrade, log in to the server and MoM nodes and:


      2.2.3.1 Perform the checks in the POST UPGRADE section below.


      2.2.3.2 Check that PBS_HOME in /etc/pbs.conf is still /var/spool/PBS

          sdb# grep PBS_HOME /etc/pbs.conf

          login# grep PBS_HOME /etc/pbs.conf


      2.2.3.3 Check that /var/spool/PBS exists and is populated, as root:

          sdb# ls -R /var/spool/PBS

          login# ls -R /var/spool/PBS


      2.2.3.4 Submit jobs as shown in JOBS section.


      2.2.3.5 Clean up PBS installation using steps in UNINSTALLATION section.



3. POST UPGRADE

    Log in to the server and MoM nodes and check the following:


    3.1 Check that PBS_EXEC in /etc/pbs.conf is /opt/pbs

       sdb# grep PBS_EXEC /etc/pbs.conf

       login# grep PBS_EXEC /etc/pbs.conf


    3.2 Check that /opt/pbs exists and is populated, as root:

      sdb# ls -R /opt/pbs

      login# ls -R /opt/pbs


    3.3 PATH includes /opt/pbs/bin

      sdb# echo $PATH | grep pbs | grep -v grep

      login# echo $PATH | grep pbs | grep -v grep


        expect to find /opt/pbs/bin


    3.4 MANPATH includes /opt/pbs/man

      sdb# echo $MANPATH | grep pbs | grep -v grep

      login# echo $MANPATH | grep pbs | grep -v grep


        expect to find /opt/pbs/man
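
    The PATH and MANPATH checks above can be made exact (a plain grep for "pbs" also matches unrelated entries). A minimal sketch with a hypothetical helper of ours:

    ```shell
    # path_has LIST DIR
    # Return 0 if DIR appears as a component of the colon-separated LIST.
    path_has() {
        case ":$1:" in
            *":$2:"*) return 0 ;;
            *)        return 1 ;;
        esac
    }

    # Usage:
    #   path_has "$PATH" /opt/pbs/bin    || echo "PATH is missing /opt/pbs/bin"
    #   path_has "$MANPATH" /opt/pbs/man || echo "MANPATH is missing /opt/pbs/man"
    ```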


    3.5 'module' includes pbs.

      sdb# module list

      login# module list


        expect to find information about the pbs module.


    3.6 PBS init script is intact

      sdb# ls -l /etc/init.d/pbs

      login# ls -l /etc/init.d/pbs


        expect to find the pbs init script


    3.7 PBS is enabled in chkconfig.

      sdb# chkconfig pbs

      login# chkconfig pbs


        expect pbs to be 'on'


    3.8 Log in to the MoM hosts and check that PBS_HOME/mom_priv/config contains these lines:

        $vnodedef_additive 0

        $alps_client /opt/cray/alps/default/bin/apbasil

        $usecp *:/home /home  (optional)
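
    The required-lines check can be scripted per MoM host. A sketch (helper name is ours; the optional $usecp line is deliberately not checked):

    ```shell
    # mom_config_ok FILE
    # Verify that a mom_priv/config file contains the required lines.
    # Hypothetical helper -- not part of PBS Pro.
    mom_config_ok() {
        conf="$1"
        grep -qxF '$vnodedef_additive 0' "$conf" &&
        grep -qF '$alps_client ' "$conf"
    }

    # Usage on each MoM host:
    #   mom_config_ok /var/spool/pbs/mom_priv/config || echo "mom config incomplete"
    ```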



4. JOBS

  Submit jobs as a regular user, such as crayadm, from the login node.


  4.1 Configuration for MoMs, on each node running a MoM:

    Add the line below to PBS_HOME/mom_priv/config if not there:

        $usecp *:/home /home  (optional)


    and HUP the MoM after changing PBS_HOME/mom_priv/config:


      login# pkill -HUP pbs_mom


  4.2 As a regular user, request a Cray compute node in a job.


    login$ qsub -l select=1:ncpus=1:vntype=cray_compute

      aprun -B sleep 10

      ^D

    login$ apstat -rn

    login$ qstat -f


    expect that:

      - There is a reservation for a compute node. For example, for NID 28 below we see that it has ApId 26949 in State "conf,claim".


      login$ apstat -rn

      NID Arch State CU Rv Pl  PgSz     Avl   Conf Placed PEs Apids

        2   XT UP  B  8  -  -    4K 8388608      0      0   0

        <...snip...>

       28   XT UP  B  8  1  1    4K 8388608 262144 262144   1 26949

       29   XT UP  B  8  -  -    4K 8388608      0      0   0

      Compute node summary

          arch config     up   resv    use  avail   down

            XT     24     24      1      1     23      0


        ResId  ApId From    Arch PEs N d Memory State

        49096 26948 batch:2   XT   1 1 1   1024 NID list,conf,claim

      A 49096 26949 batch:2   XT   1 - -   1024 conf,claim


      - The job's exec_vnode is on a compute node.

      - The job terminates normally without errors.


  4.3 Request a Cray login node

    login$ qsub -l select=1:ncpus=1:vntype=cray_login

      sleep 10

      ^D

    login$ apstat -rn


    login$ qstat -f


      expect that:

      - There are no reservations on the compute nodes. For example,

      login$ apstat -rn

      NID Arch State CU Rv Pl  PgSz      Avl Conf Placed PEs Apids

        8   XT UP  B 72  -  -    4K 25165824    0      0   0

        <...snip...>

      106   XT UP  B 72  -  -    4K 27262976    0      0   0

      107   XT UP  I 72  -  -    4K 27262976    0      0   0

      Compute node summary

          arch config     up   resv    use  avail   down

            XT     69     69      0      0     69      0


      No resource reservations are present


      - The job's exec_vnode is on a login node.

      - The job terminates normally without errors.


  4.4 Submit an interactive job

    login$ qsub -I -l select=1:ncpus=1:vntype=cray_compute


      inside interactive job:

        - type 'hostname' and 'aprun /bin/hostname'

        - expect that different hostnames are returned

      The steps below may be done inside the interactive job or outside the interactive job (e.g. in another terminal):

        - Check 'apstat -rn' output.

            expect that there is a reservation for a compute node

            (see section 4.2 for an example).

        - Check 'qstat -f' output.

            expect that job's exec_vnode is on a compute node.


      Exit out of the interactive job and check 'apstat -rn' output.

      Expect that there are no reservations on any compute node.

      (see section 4.3 for an example).



5. UNINSTALLATION


  5.1 Shut down and uninstall PBS on the server and MoM nodes


    5.1.1 Drain the system of jobs, from sdb node

      sdb# qdel `qselect`

        expect that all the jobs are gone.


    5.1.2 Shut down PBS Pro on all login nodes.

      login# /etc/init.d/pbs stop


    5.1.3 Shut down PBS Pro on the server.

      sdb# /etc/init.d/pbs stop


    5.1.4 Determine the NID of the node running the PBS Pro server and scheduler. In this example, the sdb node is used.

        boot# ssh sdb cat /proc/cray_xt/nid

        5


    5.1.5 There may be more than one version of PBS Pro installed. Obtain the list of all currently installed PBS Pro RPMs.

        boot# xtopview -e "rpm -qa" | grep pbs


    5.1.6 Remove each installed version of PBS Pro.

        boot# xtopview -e "rpm -e pbspro-server"


      During uninstallation, check that the message below appears:

      "NOTE: /etc/pbs.conf and the PBS_HOME directory must be deleted manually".


    5.1.7 Remove the /opt/pbs directory:

        boot# xtopview -e "rm -rf /opt/pbs"


  5.2. After uninstallation, log in to the server and MoM nodes and check that:


    5.2.1 PBS init script has been deleted

      sdb# ls -l /etc/init.d/pbs

      login# ls -l /etc/init.d/pbs


        expect to find that the pbs init script does not exist.


    5.2.2 PBS is disabled in chkconfig.

      sdb# chkconfig pbs

      login# chkconfig pbs


        expected: pbs: unknown service


  5.3. Delete /etc/pbs.conf, as root on boot node:


    Remove specialization of the existing /etc/pbs.conf file:

      boot# xtopview -e "xtunspec -N /etc/pbs.conf"

      boot# xtopview -e "xtunspec -C /etc/pbs.conf"


     and delete /etc/pbs.conf:

      boot# xtopview -e "rm /etc/pbs.conf"


  5.4. Delete PBS_HOME from the server and MoM nodes.


    sdb# rm -rf /var/spool/{pbs,PBS}

    login# rm -rf /var/spool/{pbs,PBS}


  5.5. Remove the user pbsdata from the server host (e.g., NID 5) if it still exists:

       boot# xtopview -e "userdel -r pbsdata"

     Remove the home dir of pbsdata from the server host if it still exists.

       sdb# rm -rf /home/users/pbsdata


