PBS Pro / PP-1307

Test "test_sister_mom_crash" of TestSisterMom fails because it cannot find the pbsdsh path when submitting the job

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Low
    • Resolution: Done
    • Affects versions: None
    • Fix versions: None
    • Components: PTL Tests
    • Labels:
      None
    • Sprint:
    • Story Points:
      1

      Description

      The TestSisterMom tests need to be updated to use the full path of the pbsdsh command.
      Currently the test fails with the following error:

      2018-08-14 05:26:03,341 INFO submit to x90-u16 as pbsuser: job 20.x90-u16 OrderedDict([('Resource_List.place', 'scatter'), ('Resource_List.select', '2')])
      2018-08-14 05:26:03,342 INFOCLI job script /tmp/PtlPbsJobScriptB5EYV3

      pbsdsh dd if=/dev/zero of=/dev/null

      2018-08-14 05:26:03,343 INFOCLI x90-u16: /opt/pbs/bin/qstat -f 20.x90-u16
      2018-08-14 05:26:03,386 INFO expect on server x90-u16: job_state = R && substate = 42 job 20.x90-u16 got: job_state = Q
      2018-08-14 05:26:03,888 INFOCLI x90-u16: /opt/pbs/bin/qmgr -c set server scheduling=True
      2018-08-14 05:26:03,937 INFOCLI x90-u16: /opt/pbs/bin/qstat -f 20.x90-u16
      2018-08-14 05:26:04,493 INFO expect on server x90-u16: no data for job_state = R && substate = 42 job 20.x90-u16 attempt: 2
      2018-08-14 05:26:04,494 INFOCLI x90-u16: /opt/pbs/bin/qstat -f 20.x90-u16

      Tracejob:

      pbsroot@x90-u16:~/TEST/tmp/PBSPro_18.2.2/tests/functional$ tracejob 20

      Job: 20.x90-u16

      08/14/2018 05:26:03 L Considering job to run
      08/14/2018 05:26:03 S Job Queued at request of pbsuser@x90-u16.pbspro.com, owner = pbsuser@x90-u16.pbspro.com, job name = PtlPbsJobScriptB5EYV3, queue = workq
      08/14/2018 05:26:03 S Job Run at request of Scheduler@x90-u16.pbspro.com on exec_vnode (x90-u16:ncpus=1)+(x91-u16:ncpus=1)
      08/14/2018 05:26:03 S Obit received momhop:1 serverhop:1 state:4 substate:42
      08/14/2018 05:26:03 S Exit_status=127 resources_used.cpupercent=0 resources_used.cput=00:00:00 resources_used.mem=712kb resources_used.ncpus=2 resources_used.vmem=6532kb resources_used.walltime=00:00:00
      08/14/2018 05:26:03 L Job run
      08/14/2018 05:26:03 M Type 5 request received from root@10.8.6.203:15001, sock=2
      08/14/2018 05:26:03 M Started, pid = 26140
      08/14/2018 05:26:03 M task 00000001 terminated
      08/14/2018 05:26:03 M Terminated
      08/14/2018 05:26:03 M task 00000001 cput= 0:00:00
      08/14/2018 05:26:03 M kill_job
      08/14/2018 05:26:03 M x90-u16 cput= 0:00:00 mem=712kb
      08/14/2018 05:26:03 M x91-u16.pbspro.com cput= 0:00:00 mem=0kb
      08/14/2018 05:26:03 M no active tasks
      08/14/2018 05:26:03 M Obit sent
      08/14/2018 05:26:03 M copy file request received
      08/14/2018 05:26:03 M staged 2 items out over 0:00:00
      08/14/2018 05:26:03 M no active tasks
      08/14/2018 05:26:03 M delete job request received
      08/14/2018 05:26:03 S enqueuing into workq, state 1 hop 1
      08/14/2018 05:26:03 S dequeuing from workq, state 5
      08/14/2018 05:26:03 M kill_job

      Analysis:

      The test submits the job as pbsuser, using the following job script:
      pbsdsh dd if=/dev/zero of=/dev/null
      I reproduced the failure manually by submitting the same script as pbsuser. The details are below.
      Log snippet:

      Job script will be read from standard input. Submit with CTRL+D.
      pbsdsh dd if=/dev/zero of=/dev/null
      17.x90-u16
      pbsuser@x90-u16:~$ qstat
      Job id            Name             User              Time Use S Queue
      ----------------  ---------------- ----------------  -------- - -----
      16.x90-u16        PtlPbsJobScript  pbsroot           00:07:50 R workq
      17.x90-u16        STDIN            pbsuser                  0 Q workq
      pbsuser@x90-u16:~$ qdel 16
      qdel: Unauthorized Request 16.x90-u16
      pbsuser@x90-u16:~$ qstat -sw
      pbsuser@x90-u16:~$ tracejob 17

      Job: 17.x90-u16

      08/14/2018 04:08:57 L Considering job to run
      08/14/2018 04:08:57 S enqueuing into workq, state 1 hop 1
      08/14/2018 04:08:57 S Job Queued at request of pbsuser@x90-u16.pbspro.com, owner = pbsuser@x90-u16.pbspro.com, job name = STDIN, queue = workq
      08/14/2018 04:08:57 S Job Modified at request of Scheduler@x90-u16.pbspro.com
      08/14/2018 04:08:57 L Not enough free nodes available
      08/14/2018 04:09:31 L Considering job to run
      08/14/2018 04:09:31 S Job Run at request of Scheduler@x90-u16.pbspro.com on exec_vnode (x90-u16:ncpus=1)+(x91-u16:ncpus=1)
      08/14/2018 04:09:31 S Exit_status=127 resources_used.cpupercent=0 resources_used.cput=00:00:00 resources_used.mem=728kb resources_used.ncpus=2 resources_used.vmem=6532kb resources_used.walltime=00:00:00
      08/14/2018 04:09:31 L Job run
      08/14/2018 04:09:31 S Obit received momhop:1 serverhop:1 state:4 substate:42
      08/14/2018 04:09:31 M Type 5 request received from root@10.8.6.203:15001, sock=2
      08/14/2018 04:09:31 M Started, pid = 18624
      08/14/2018 04:09:31 M task 00000001 terminated
      08/14/2018 04:09:31 M Terminated
      08/14/2018 04:09:31 M task 00000001 cput= 0:00:00
      08/14/2018 04:09:31 M kill_job
      08/14/2018 04:09:31 M x90-u16 cput= 0:00:00 mem=728kb
      08/14/2018 04:09:31 M x91-u16.pbspro.com cput= 0:00:00 mem=0kb
      08/14/2018 04:09:31 M no active tasks
      08/14/2018 04:09:31 M Obit sent
      08/14/2018 04:09:31 M copy file request received
      08/14/2018 04:09:31 M staged 2 items out over 0:00:00
      08/14/2018 04:09:31 M no active tasks
      08/14/2018 04:09:31 M delete job request received
      08/14/2018 04:09:31 S dequeuing from workq, state 5
      08/14/2018 04:09:31 M kill_job
      pbsuser@x90-u16:~$ cat STDIN.e17
      /var/spool/pbs/mom_priv/jobs/17.x90-u16.SC: 1: /var/spool/pbs/mom_priv/jobs/17.x90-u16.SC: pbsdsh: not found
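      Exit_status=127 in both tracejob outputs is the status a POSIX shell returns when it cannot find a command, which matches the "pbsdsh: not found" message in STDIN.e17. The Python sketch below reproduces that behavior outside PBS by running the same one-line script under /bin/sh with a PATH that does not contain pbsdsh. The "/nonexistent" PATH value is an illustrative assumption standing in for pbsuser's environment on the execution host; it is not taken from the test.

```python
import subprocess

# Run the job script's command line under /bin/sh with a PATH that cannot
# contain pbsdsh. "/nonexistent" is an illustrative stand-in for a user
# PATH that lacks the PBS bin directory.
result = subprocess.run(
    ["/bin/sh", "-c", "pbsdsh dd if=/dev/zero of=/dev/null"],
    env={"PATH": "/nonexistent"},
    capture_output=True,
    text=True,
)

# POSIX shells report "command not found" with exit status 127,
# the same value recorded as Exit_status in the tracejob output.
print("exit status:", result.returncode)
print("stderr:", result.stderr.strip())
```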

      Solution Description:
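      The job's shell looks up pbsdsh in pbsuser's PATH, which does not include the PBS bin directory on this setup, so the script fails with "pbsdsh: not found" and the job exits immediately with Exit_status=127. The test therefore never sees the job in job_state = R with substate = 42.
      The fix is to reference pbsdsh by its full path ($PBS_EXEC/bin/pbsdsh) in the job script the test submits.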
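      A minimal sketch of the fix, assuming the PBS installation prefix (PBS_EXEC) is read from a pbs.conf-style file at /etc/pbs.conf and defaults to /opt/pbs, which matches the /opt/pbs/bin paths in the logs above. The helpers read_pbs_exec and build_job_script are hypothetical; in the real test the prefix would come from PTL's own configuration handling, which this sketch does not model.

```python
import os


def read_pbs_exec(conf_path="/etc/pbs.conf", default="/opt/pbs"):
    """Return PBS_EXEC from a KEY=VALUE config file.

    Hypothetical helper: falls back to `default` when the file
    cannot be read or does not define PBS_EXEC.
    """
    try:
        with open(conf_path) as conf:
            for line in conf:
                key, sep, value = line.strip().partition("=")
                if sep and key.strip() == "PBS_EXEC":
                    return value.strip()
    except OSError:
        pass
    return default


def build_job_script(pbs_exec):
    """Build the test's job script with pbsdsh referenced by absolute
    path, so it no longer depends on the submitting user's PATH."""
    pbsdsh = os.path.join(pbs_exec, "bin", "pbsdsh")
    return f"{pbsdsh} dd if=/dev/zero of=/dev/null\n"


script = build_job_script(read_pbs_exec())
print(script, end="")
```

      Using an absolute path keeps the job script independent of whatever PATH the submitting user happens to have, so the test does not need to modify pbsuser's shell environment.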


            People

            • Assignee: Unassigned
            • Reporter: zulekha mahalty

              Dates

              • Created:
                Updated:
                Resolved: