Analysis
- As part of the startup process, PBS registers the PostgreSQL processes of the data service with systemd so that they can be tracked when PBS is stopped.
- This registration was introduced as part of a fix for a server panic reported on systems that support systemd.
- The processes are currently registered with systemd in the PBS init script.
- However, as described in PP-368, the init script is not the best place to do this.
- The following changes are needed to fix this issue:
- Remove the logic below from the init script:
    if [ $is_systemd -eq 1 ] ; then
        SYSTEMD_CGROUP=`grep ^cgroup /proc/mounts | grep systemd | head -1 | cut -d ' ' -f2`
        if [ ! -d $SYSTEMD_CGROUP/system.slice/pbs.service ] ; then
            mkdir -p $SYSTEMD_CGROUP/system.slice/pbs.service
        fi
        if [ -f ${PBS_HOME}/datastore/postmaster.pid ] ; then
            P_PID=`head -n 1 ${PBS_HOME}/datastore/postmaster.pid`
            if [ -n "$P_PID" ] ; then
                echo $P_PID >> $SYSTEMD_CGROUP/system.slice/pbs.service/tasks
                pidlist=`pgrep -P $P_PID`
                if [ -n "$pidlist" ] ; then
                    for PID in $pidlist; do
                        echo $PID >> $SYSTEMD_CGROUP/system.slice/pbs.service/tasks
                    done
                fi
            fi
        fi
    fi
- Add the logic below to the pbs_dataservice script, after confirming that the data service has started successfully:
    # Register the data service processes only when systemd is running.
    pidof systemd > /dev/null 2>&1
    ret=$?
    if [ ${ret} -eq 0 ]; then
        # Mount point of the systemd cgroup hierarchy.
        SYSTEMD_CGROUP=`grep ^cgroup /proc/mounts | grep systemd | head -1 | cut -d ' ' -f2`
        if [ ! -d "$SYSTEMD_CGROUP/system.slice/pbs.service" ] ; then
            mkdir -p "$SYSTEMD_CGROUP/system.slice/pbs.service"
        fi
        if [ -f "${PBS_HOME}/datastore/postmaster.pid" ] ; then
            # The first line of postmaster.pid holds the postmaster PID.
            P_PID=`head -n 1 "${PBS_HOME}/datastore/postmaster.pid"`
            if [ -n "$P_PID" ] ; then
                echo "$P_PID" >> "$SYSTEMD_CGROUP/system.slice/pbs.service/tasks"
                # Also register the direct children of the postmaster.
                pidlist=`pgrep -P "$P_PID"`
                if [ -n "$pidlist" ] ; then
                    for PID in $pidlist; do
                        echo "$PID" >> "$SYSTEMD_CGROUP/system.slice/pbs.service/tasks"
                    done
                fi
            fi
        fi
    fi
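The registration logic above could also be wrapped in a function that takes the cgroup directory and the PID file as arguments, so that it can be exercised against a scratch directory before it touches the real cgroup hierarchy. The following is only a sketch under that assumption; the function name register_datastore_pids and its arguments are hypothetical and are not part of pbs_dataservice.

```shell
# Hypothetical helper (not part of PBS): append the postmaster PID and its
# direct children to the tasks file under the given cgroup directory.
#   $1 = cgroup directory, e.g. $SYSTEMD_CGROUP/system.slice/pbs.service
#   $2 = path to postmaster.pid
register_datastore_pids() {
    cgroup_dir=$1
    pid_file=$2

    # Nothing to register if the data service left no PID file.
    [ -f "$pid_file" ] || return 1

    # The first line of postmaster.pid holds the postmaster PID.
    p_pid=$(head -n 1 "$pid_file")
    [ -n "$p_pid" ] || return 1

    mkdir -p "$cgroup_dir" || return 1

    # One PID per echo, matching the original logic.
    echo "$p_pid" >> "$cgroup_dir/tasks"
    for pid in $(pgrep -P "$p_pid"); do
        echo "$pid" >> "$cgroup_dir/tasks"
    done
    return 0
}
```

With such a helper, the call in pbs_dataservice would reduce to something like register_datastore_pids "$SYSTEMD_CGROUP/system.slice/pbs.service" "${PBS_HOME}/datastore/postmaster.pid", guarded by the same systemd check as above.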
- These changes have been tested and work as expected.
- However, the failover scenario mentioned in the ticket still needs to be reproduced.
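When reproducing the scenario, one way to check that a data service process actually ended up in the pbs.service cgroup is to inspect its /proc/<pid>/cgroup file. The sketch below assumes the cgroup v1 layout implied by the logic above, where systemd's named hierarchy appears as name=systemd; the function name pid_in_pbs_service and its optional second argument are hypothetical and exist only to make the check easy to try out.

```shell
# Hypothetical helper (not part of PBS): succeed if the process is listed
# in the systemd pbs.service cgroup.
#   $1 = PID to check
#   $2 = optional path to a cgroup membership file
#        (defaults to /proc/<pid>/cgroup)
# Returns 0 if registered, 1 if not, 2 if the file cannot be read.
pid_in_pbs_service() {
    cgroup_file=${2:-/proc/$1/cgroup}
    [ -r "$cgroup_file" ] || return 2

    # cgroup v1 lines have the form "hierarchy-ID:controller-list:cgroup-path".
    grep -q '^[0-9]*:name=systemd:/system\.slice/pbs\.service$' "$cgroup_file"
}

# Example usage (commented out; requires a running data service):
#   P_PID=$(head -n 1 "${PBS_HOME}/datastore/postmaster.pid")
#   pid_in_pbs_service "$P_PID" && echo "postmaster is tracked by pbs.service"
```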