Analysis

  • As part of the startup process, PBS registers the PostgreSQL processes of the data service with systemd so that they can be tracked when PBS is stopped.
  • This registration was introduced as part of the fix for a server panic reported on systems that support systemd.
  • The processes are currently registered with systemd in the PBS init script.
  • However, as described in PP-368, the init script is not the best place for this registration.
  • The following changes are needed to fix this issue:
    • Remove the following logic from the init script:

          if [ $is_systemd -eq 1 ] ; then
              SYSTEMD_CGROUP=`grep ^cgroup /proc/mounts | grep systemd | head -1 | cut -d ' ' -f2`
              if [ ! -d $SYSTEMD_CGROUP/system.slice/pbs.service ] ; then
                  mkdir -p $SYSTEMD_CGROUP/system.slice/pbs.service
              fi
              if [ -f ${PBS_HOME}/datastore/postmaster.pid ] ; then
                  P_PID=`head -n 1 ${PBS_HOME}/datastore/postmaster.pid`
                  if [ -n "$P_PID" ] ; then
                      echo $P_PID >> $SYSTEMD_CGROUP/system.slice/pbs.service/tasks
                      pidlist=`pgrep -P $P_PID`
                      if [ -n "$pidlist" ] ; then
                          for PID in $pidlist; do
                              echo $PID >> $SYSTEMD_CGROUP/system.slice/pbs.service/tasks
                          done
                      fi
                  fi
              fi
          fi

    • Add the following logic to the pbs_dataservice script. It should run only after the script has confirmed that the data service started successfully:

          # Register the data service processes only when systemd is running.
          if pidof systemd > /dev/null 2>&1 ; then
              SYSTEMD_CGROUP=`grep ^cgroup /proc/mounts | grep systemd | head -1 | cut -d ' ' -f2`
              # Skip registration if no systemd cgroup mount was found.
              if [ -n "$SYSTEMD_CGROUP" ] ; then
                  if [ ! -d "$SYSTEMD_CGROUP/system.slice/pbs.service" ] ; then
                      mkdir -p "$SYSTEMD_CGROUP/system.slice/pbs.service"
                  fi
                  if [ -f "${PBS_HOME}/datastore/postmaster.pid" ] ; then
                      # The first line of postmaster.pid holds the postmaster PID.
                      P_PID=`head -n 1 "${PBS_HOME}/datastore/postmaster.pid"`
                      if [ -n "$P_PID" ] ; then
                          echo $P_PID >> "$SYSTEMD_CGROUP/system.slice/pbs.service/tasks"
                          # Register each child process of the postmaster as well.
                          pidlist=`pgrep -P $P_PID`
                          if [ -n "$pidlist" ] ; then
                              for PID in $pidlist; do
                                  echo $PID >> "$SYSTEMD_CGROUP/system.slice/pbs.service/tasks"
                              done
                          fi
                      fi
                  fi
              fi
          fi

  • These changes have been tested and work as expected.
  • However, the failover scenario mentioned in the ticket still needs to be reproduced.
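
When reproducing the failover scenario, it may help to confirm that the data service processes really are listed in the pbs.service tasks file after startup. The following is only a rough sketch of such a check, not part of PBS: the function names find_systemd_cgroup and pids_tracked and their arguments are hypothetical, and the cgroup lookup simply reuses the match from the snippet above.

```shell
#!/bin/sh
# Sketch only: the function names and arguments are hypothetical helpers
# for manual verification; they do not exist in PBS.

# Print the mount point of the systemd cgroup hierarchy, using the same
# match as the registration snippet. MOUNTS_FILE defaults to /proc/mounts.
find_systemd_cgroup() {
    awk '/^cgroup/ && /systemd/ { print $2; exit }' "${1:-/proc/mounts}"
}

# Usage: pids_tracked TASKS_FILE PID...
# Returns 0 if every given PID is listed in TASKS_FILE, 1 if any PID is
# missing, and 2 if TASKS_FILE does not exist.
pids_tracked() {
    tasks_file=$1
    shift
    [ -f "$tasks_file" ] || return 2
    missing=0
    for pid in "$@"; do
        # -x matches whole lines only, so PID 10 does not match 100.
        if ! grep -qxF -- "$pid" "$tasks_file"; then
            echo "PID $pid is not listed in $tasks_file" >&2
            missing=1
        fi
    done
    return $missing
}

# Possible use after the data service has started (assumes P_PID was read
# from ${PBS_HOME}/datastore/postmaster.pid as in the snippet above):
#   pids_tracked "$(find_systemd_cgroup)/system.slice/pbs.service/tasks" \
#       $P_PID $(pgrep -P $P_PID)
```

Both functions take their input files as arguments so that the check can be tried against copies of /proc/mounts and the tasks file without touching a live system.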