PTL should clean up job folders

Description

Job temporary directories were left behind for some jobs. According to the logs, this happens when the job is in substate 41 and qdel -W force is issued. The failure is seen intermittently on a few test systems.

drwx------. 2 pbsuser tstgrp00 4096 Nov 21 14:01 pbs.5.x53-c6p6
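A minimal sketch of how a test could check for leftover directories like the one listed above. It assumes the temporary directory is named "pbs." followed by the job ID, as in the listing; the function name find_leftover_job_dirs and its parameters are hypothetical and not part of PTL.

```python
import os
import re
import tempfile

# Assumed naming convention, taken from the listing above:
# "pbs." + job ID, where the job ID is "<sequence number>.<server name>".
JOB_DIR_RE = re.compile(r"^pbs\.(\d+\..+)$")


def find_leftover_job_dirs(tmp_root, finished_job_ids):
    """Return paths under tmp_root that look like job temporary
    directories belonging to jobs that are already gone."""
    leftovers = []
    for entry in sorted(os.listdir(tmp_root)):
        match = JOB_DIR_RE.match(entry)
        full_path = os.path.join(tmp_root, entry)
        if (match and os.path.isdir(full_path)
                and match.group(1) in finished_job_ids):
            leftovers.append(full_path)
    return leftovers


if __name__ == "__main__":
    # Simulate the situation from this report in a scratch directory.
    root = tempfile.mkdtemp()
    os.mkdir(os.path.join(root, "pbs.5.x53-c6p6"))
    print(find_leftover_job_dirs(root, {"5.x53-c6p6"}))
```

A cleanup step could pass the IDs of all jobs deleted during the test and remove whatever paths are returned.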

From mom_logs:
[root@x53-c6p6 tmp]# grep 5.x53 /var/spool/pbs/mom_logs/20171121
11/21/2017 14:01:03;0080;pbs_mom;Job;5.x53-c6p6;signal job request received
11/21/2017 14:01:03;0004;pbs_mom;Job;5.x53-c6p6;signal job with SIGKILL
11/21/2017 14:01:03;0008;pbs_mom;Job;5.x53-c6p6;kill_job
11/21/2017 14:01:03;0001;pbs_mom;Job;5.x53-c6p6;Job recycled into exiting on signal from substate 41
11/21/2017 14:01:03;0001;pbs_mom;Job;5.x53-c6p6;Job discarded at request of Server
11/21/2017 14:01:03;0008;pbs_mom;Job;5.x53-c6p6;kill_job
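The sequence above can be detected mechanically. The following is a rough sketch, assuming the semicolon-separated layout of the mom log lines shown here (timestamp, event code, daemon, object type, object ID, message). The helper names are hypothetical, and the two message strings are copied from this excerpt.

```python
RECYCLED_MSG = "Job recycled into exiting on signal from substate 41"
DISCARDED_MSG = "Job discarded at request of Server"


def parse_mom_line(line):
    """Split one log line into its six fields, or return None
    if the line does not have the expected layout."""
    parts = line.rstrip("\n").split(";", 5)
    if len(parts) != 6:
        return None
    timestamp, event, daemon, obj_type, obj_id, message = parts
    return {"timestamp": timestamp, "event": event, "daemon": daemon,
            "type": obj_type, "job_id": obj_id, "message": message}


def jobs_force_deleted_in_substate_41(lines):
    """Return job IDs that were recycled from substate 41 and then
    discarded at the server's request, in order of first occurrence."""
    recycled = set()
    hits = []
    for line in lines:
        rec = parse_mom_line(line)
        if rec is None or rec["type"] != "Job":
            continue
        if rec["message"] == RECYCLED_MSG:
            recycled.add(rec["job_id"])
        elif (rec["message"] == DISCARDED_MSG
              and rec["job_id"] in recycled
              and rec["job_id"] not in hits):
            hits.append(rec["job_id"])
    return hits
```

Jobs reported by this check are candidates for having a leftover temporary directory, so a cleanup routine could inspect them first.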

[root@x53-c6p6 tmp]# grep 5.x53 /var/spool/pbs/server_logs/20171121
11/21/2017 14:00:53;0100;Server@x53-c6p6;Job;5.x53-c6p6;enqueuing into R2, state 1 hop 1
11/21/2017 14:00:53;0008;Server@x53-c6p6;Job;5.x53-c6p6;Job Queued at request of pbsuser@x53-c6p6.pbspro.com, owner = pbsuser@x53-c6p6.pbspro.com, job name = STDIN, queue = R2
11/21/2017 14:00:53;0008;Server@x53-c6p6;Job;5.x53-c6p6;Job Modified at request of Scheduler@x53-c6p6.pbspro.com
11/21/2017 14:01:02;0008;Server@x53-c6p6;Job;5.x53-c6p6;Job Modified at request of Scheduler@x53-c6p6.pbspro.com
11/21/2017 14:01:03;0008;Server@x53-c6p6;Job;5.x53-c6p6;Job Run at request of Scheduler@x53-c6p6.pbspro.com on exec_vnode (x53-c6p6:ncpus=1)
11/21/2017 14:01:03;0080;Server@x53-c6p6;Job;5.x53-c6p6;delete job request received
11/21/2017 14:01:03;0008;Server@x53-c6p6;Job;5.x53-c6p6;Delete forced
11/21/2017 14:01:03;0008;Server@x53-c6p6;Job;5.x53-c6p6;Job to be deleted at request of pbsroot@x53-c6p6.pbspro.com
11/21/2017 14:01:03;0008;Server@x53-c6p6;Job;5.x53-c6p6;Discard running job, Forced Delete
11/21/2017 14:01:03;0100;Server@x53-c6p6;Job;5.x53-c6p6;dequeuing from R2, state 5

[root@x53-c6p6 tmp]# grep 5.x53 /var/spool/pbs/sched_logs/20171121
11/21/2017 14:00:53;0040;pbs_sched;Job;5.x53-c6p6;Queue not started
11/21/2017 14:00:53;0040;pbs_sched;Job;5.x53-c6p6;Queue not started
11/21/2017 14:01:02;0080;pbs_sched;Job;5.x53-c6p6;Considering job to run
11/21/2017 14:01:02;0040;pbs_sched;Job;5.x53-c6p6;Insufficient amount of queue resource: ncpus (R: 1 A: 0 T: 1)
11/21/2017 14:01:02;0080;pbs_sched;Job;5.x53-c6p6;Considering job to run
11/21/2017 14:01:02;0040;pbs_sched;Job;5.x53-c6p6;Insufficient amount of queue resource: ncpus (R: 1 A: 0 T: 1)
11/21/2017 14:01:03;0080;pbs_sched;Job;5.x53-c6p6;Considering job to run
11/21/2017 14:01:03;0040;pbs_sched;Job;5.x53-c6p6;Job run

[root@x53-c6p6 tmp]# tracejob 5

Job: 5.x53-c6p6

11/21/2017 14:00:53 S Job Queued at request of pbsuser@x53-c6p6.pbspro.com, owner = pbsuser@x53-c6p6.pbspro.com, job name = STDIN, queue = R2
11/21/2017 14:00:53 S Job Modified at request of Scheduler@x53-c6p6.pbspro.com
11/21/2017 14:00:53 L Queue not started
11/21/2017 14:00:53 L Queue not started
11/21/2017 14:00:53 S enqueuing into R2, state 1 hop 1
11/21/2017 14:00:53 A queue=R2
11/21/2017 14:01:02 L Considering job to run
11/21/2017 14:01:02 L Insufficient amount of queue resource: ncpus (R: 1 A: 0 T: 1)
11/21/2017 14:01:02 L Considering job to run
11/21/2017 14:01:02 L Insufficient amount of queue resource: ncpus (R: 1 A: 0 T: 1)
11/21/2017 14:01:02 S Job Modified at request of Scheduler@x53-c6p6.pbspro.com
11/21/2017 14:01:03 L Considering job to run
11/21/2017 14:01:03 S Job Run at request of Scheduler@x53-c6p6.pbspro.com on exec_vnode (x53-c6p6:ncpus=1)
11/21/2017 14:01:03 S Delete forced
11/21/2017 14:01:03 S Discard running job, Forced Delete
11/21/2017 14:01:03 M Job recycled into exiting on signal from substate 41
11/21/2017 14:01:03 M Job discarded at request of Server
11/21/2017 14:01:03 L Job run
11/21/2017 14:01:03 S delete job request received
11/21/2017 14:01:03 S Job to be deleted at request of pbsroot@x53-c6p6.pbspro.com
11/21/2017 14:01:03 S dequeuing from R2, state 5
11/21/2017 14:01:03 M kill_job
11/21/2017 14:01:03 M kill_job
11/21/2017 14:01:03 A requestor=pbsroot@x53-c6p6.pbspro.com
11/21/2017 14:01:03 M signal job request received
11/21/2017 14:01:03 M signal job with SIGKILL

Acceptance Criteria

None

Status

Assignee

Kumar Jakkali

Reporter

anamika upadhyay

Severity

3-High

OS

None

Start Date

None

Pull Request URL

None

Story Points

1

Components

Affects versions

18.1.0

Priority

High