pbs_mpirun failing in the ssh environment for MPICH-P4 integration

Description

I ran the command 'pbs_mpirun -np 2 -machinefile machfile /tmp/mpihw' in an ssh environment.
The command failed with the following error:
[user1@machine1~]$ pbs_mpirun -np 2 -machinefile machfile /tmp/mpihw
pbs_mpirun: Warning, not running under PBS
pbs_attach: tm_attach: no matching job found (17006)
p0_13650: p4_error: Child process exited while making connection to remote process on machine2: 0
p0_13650: (2.144531) net_send: could not write to fd=4, errno = 32

The P4_RSHCOMMAND environment variable was set in the user's .bashrc file, as recommended.
[user1@machine1~]$ cat .bashrc

# .bashrc

# Source global definitions
if [ -f /etc/bashrc ]; then
    . /etc/bashrc
fi

# User specific aliases and functions
export PATH=/usr/local/mpich-1.2.7/bin/:/opt/pbs/default/bin:$PATH
export P4_RSHCOMMAND=ssh

The equivalent command succeeded when invoked with mpirun directly:
[user1@machine1~]$ mpirun -np 2 -machinefile machfile /tmp/mpihw
hithere machine1
hithere machine2
[user1@machine1~]$ cat machfile
machine1
machine2

On further investigation, I found that the pbs_mpirun script exports a hard-coded value for P4_RSHCOMMAND,
which overrides the value set by the user:
bash-4.1# cat /opt/pbs/default/bin/pbs_mpirun | grep RSHCOMMAND
export PBS_RSHCOMMAND=${P4_RSHCOMMAND:-rsh}
export P4_RSHCOMMAND=${PBS_EXEC}/bin/pbs_remsh
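
The effect of these two lines can be reproduced in isolation. The snippet below is only an illustration, not part of pbs_mpirun: the value of PBS_EXEC is an assumption inferred from the script's install path in this report, and P4_RSHCOMMAND is set to ssh as in the .bashrc above.

```shell
# Assumed value, inferred from /opt/pbs/default/bin/pbs_mpirun in this report.
PBS_EXEC=/opt/pbs/default
# The user's setting from .bashrc.
P4_RSHCOMMAND=ssh

# The two assignments found in pbs_mpirun:
# 1. remember the user's remote shell, falling back to rsh if unset or empty
PBS_RSHCOMMAND=${P4_RSHCOMMAND:-rsh}
# 2. unconditionally replace the remote shell with pbs_remsh
P4_RSHCOMMAND=${PBS_EXEC}/bin/pbs_remsh

echo "PBS_RSHCOMMAND=$PBS_RSHCOMMAND"
echo "P4_RSHCOMMAND=$P4_RSHCOMMAND"
```

After the second assignment, MPICH-P4 no longer sees ssh in P4_RSHCOMMAND. It sees pbs_remsh, which appears to be the step that calls pbs_attach in the failing run.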

When I commented out the line 'export P4_RSHCOMMAND=${PBS_EXEC}/bin/pbs_remsh'
and ran the same command again, it worked:
bash-4.1# cat /opt/pbs/default/bin/pbs_mpirun | grep RSHCOMMAND
export PBS_RSHCOMMAND=${P4_RSHCOMMAND:-rsh}
#export P4_RSHCOMMAND=${PBS_EXEC}/bin/pbs_remsh
[user1@machine1~]$ pbs_mpirun -np 2 -machinefile machfile /tmp/mpihw
pbs_mpirun: Warning, not running under PBS
hithere machine1
hithere machine2
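
A less invasive alternative to commenting the line out might be to override P4_RSHCOMMAND only when the script is actually running inside a PBS job. The sketch below is a suggestion, not the shipped pbs_mpirun code: the function name select_p4_rshcommand is hypothetical, and it assumes that PBS_JOBID is set only inside a PBS job.

```shell
# Hypothetical helper (not part of pbs_mpirun): pick the remote shell
# command that MPICH-P4 should use.
select_p4_rshcommand() {
    # Remember the user's choice, falling back to rsh as pbs_mpirun does.
    PBS_RSHCOMMAND=${P4_RSHCOMMAND:-rsh}
    export PBS_RSHCOMMAND

    # Assumption: PBS_JOBID is non-empty only when running under PBS.
    if [ -n "${PBS_JOBID:-}" ]; then
        # Inside a job: route remote launches through pbs_remsh.
        P4_RSHCOMMAND=${PBS_EXEC}/bin/pbs_remsh
    else
        # Outside a job: keep the user's remote shell (ssh in this report).
        P4_RSHCOMMAND=${PBS_RSHCOMMAND}
    fi
    export P4_RSHCOMMAND
}
```

With this guard, the run shown above (which prints "Warning, not running under PBS") would keep P4_RSHCOMMAND=ssh, while runs inside a job would still go through pbs_remsh.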

Acceptance Criteria

None

Status

Assignee

Unassigned

Reporter

Former user

Severity

None

OS

None

Start Date

None

Pull Request URL

None

Components

Priority

Critical