Revision to PBS_MOM_NODE_NAME configuration variable


Overview

This design is a revision of PP-277: Multinode jobs may fail to start.

Technical Details

PBS_MOM_NODE_NAME configuration variable

PBS_MOM_NODE_NAME is a configuration variable that may be defined in the pbs.conf configuration file. It ensures that MoM, at startup, uses a name for the natural vnode that is consistent with the name used when the node was created on the server. The value is used when MoM builds its list of local vnodes at startup. The list consists of either the natural vnode alone or a list of local vnodes (configured either with a v2 configuration file or with an exechost_startup or exechost_periodic hook). MoM cannot check what the value is on the server because the server may not be running at the time MoM is started.

If PBS_MOM_NODE_NAME is defined in the pbs.conf configuration file, MoM sets the name of the natural vnode to the value of PBS_MOM_NODE_NAME verbatim, without any checks. If PBS_MOM_NODE_NAME is not defined, MoM assumes that the name of the natural vnode is the (non-canonicalized) hostname returned by gethostname(), truncated after the first dot.
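
The naming rule above can be sketched in a few lines of Python. This is an illustration only, not MoM's actual C implementation: the function name natural_vnode_name, the conf dictionary standing in for parsed pbs.conf entries, and the injectable gethostname parameter are all assumptions made for the example.

```python
import socket


def natural_vnode_name(conf, gethostname=socket.gethostname):
    """Pick the natural vnode name as described in the text.

    conf is a hypothetical dict of parsed pbs.conf entries.
    gethostname is injectable so the rule can be exercised without
    depending on the local machine's hostname.
    """
    if "PBS_MOM_NODE_NAME" in conf:
        # Used verbatim: no truncation, no validation.
        return conf["PBS_MOM_NODE_NAME"]
    # Otherwise use the non-canonicalized hostname, cut at the first dot.
    return gethostname().split(".", 1)[0]
```

Note that a configured value containing dots is kept whole, while a hostname obtained from gethostname() is shortened to its first label.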

PBS_MOM_NODE_NAME also serves as a fallback for the hostname when MoM's call to gethostname() fails. In this use case, PBS_MOM_NODE_NAME must be defined and must comply with RFCs 952 and 1123.
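
As a rough illustration of what compliance with RFCs 952 and 1123 means for a value, the following Python sketch checks the commonly cited rules: dot-separated labels of 1 to 63 characters, each made of letters, digits, and hyphens, and not starting or ending with a hyphen. The function name is_valid_hostname and the 255-character overall limit are assumptions for this example; the check MoM actually performs may differ in detail.

```python
import re

# One label: starts and ends with a letter or digit (RFC 1123 permits a
# leading digit), hyphens allowed only in the interior, 1 to 63 chars.
_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def is_valid_hostname(name):
    """Return True if name looks like an RFC 952/1123 hostname."""
    # Assumed overall limit of 255 characters for this sketch.
    if not name or len(name) > 255:
        return False
    # Every dot-separated label must match; empty labels are rejected.
    return all(_LABEL.fullmatch(label) for label in name.split("."))
```

Under these rules "node01.example.com" and "3com" are accepted, while "-node", "node-", and "my_node" are rejected.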

Log messages when MoM fails to identify its hostname

If the call to gethostname() fails and PBS_MOM_NODE_NAME is either undefined, or defined but the value does not conform to RFCs 952 and 1123, the following message will be printed to the log:

Unable to obtain my host name

Once the hostname is obtained, MoM will ensure the hostname resolves properly by calling get_fullhostname(). If the hostname fails to resolve, the following message will be printed to the log:

Unable to resolve my host name
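
The two failure paths can be summarized in a short Python sketch. It is a model of the behavior described in this section, not the MoM source: identify_hostname and its parameters are hypothetical, the RFC check is passed in as a callable, and resolve stands in for get_fullhostname(). Failures are modeled as OSError, which is an assumption of the example.

```python
def identify_hostname(conf, gethostname, is_valid, resolve, log):
    """Model the hostname lookup and its two log messages.

    conf        -- hypothetical dict of parsed pbs.conf entries
    gethostname -- callable returning the hostname, raising OSError on failure
    is_valid    -- callable implementing the RFC 952/1123 check
    resolve     -- stand-in for get_fullhostname(), raising OSError on failure
    log         -- callable that receives each log message
    """
    try:
        host = gethostname()
    except OSError:
        # Fall back to PBS_MOM_NODE_NAME, which must be defined and valid.
        host = conf.get("PBS_MOM_NODE_NAME")
        if host is None or not is_valid(host):
            log("Unable to obtain my host name")
            return None
    try:
        # Confirm that the chosen hostname actually resolves.
        return resolve(host)
    except OSError:
        log("Unable to resolve my host name")
        return None
```

Passing the lookups in as callables keeps the sketch independent of the local machine, so each failure path can be triggered deliberately.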


