Expect function should validate server_state when scheduling is turned off

Description

There is a race condition where PBS runs a job even after the 'scheduling' attribute has been set to False.

Scenario:

1. Set a server max_run limit of 1 for pbsuser1.
2. Submit a job as pbsuser1 and check that it is in R state.
3. Turn off scheduling.
4. Submit another 10 jobs as pbsuser1.
5. Turn on scheduling.
6. Turn off scheduling, submit one job as pbsuser2, and check the job states.
The expected behaviour at the last step is that only the initially submitted pbsuser1 job should be in R state, but it has been observed that, occasionally, the job submitted as pbsuser2 is also in R state despite scheduling being turned off.
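
The scenario can be sketched as a PTL test. This is only an illustrative sketch, not the exact reproducer used here: the class and test names are made up, and it assumes PTL's standard TestFunctional/Server.manager()/submit()/expect() interfaces and the TEST_USER/TEST_USER2 test accounts.

from tests.functional import *


class TestRunLimitWithSchedulingOff(TestFunctional):

    def test_job_run_with_scheduling_off(self):
        # Step 1: limit pbsuser1 (TEST_USER) to one running job server-wide
        a = {'max_run': '[u:' + str(TEST_USER) + '=1]'}
        self.server.manager(MGR_CMD_SET, SERVER, a)

        # Step 2: submit a job as pbsuser1 and verify it runs
        jid1 = self.server.submit(Job(TEST_USER))
        self.server.expect(JOB, {'job_state': 'R'}, id=jid1)

        # Steps 3-4: turn scheduling off and queue ten more pbsuser1 jobs
        self.server.manager(MGR_CMD_SET, SERVER, {'scheduling': 'False'})
        for _ in range(10):
            self.server.submit(Job(TEST_USER))

        # Step 5: turn scheduling back on
        self.server.manager(MGR_CMD_SET, SERVER, {'scheduling': 'True'})

        # Step 6: turn scheduling off again, submit a job as pbsuser2
        # (TEST_USER2) and check its state; it should stay queued, but the
        # scheduling cycle already in flight can still run it (the race
        # reported here).
        self.server.manager(MGR_CMD_SET, SERVER, {'scheduling': 'False'})
        jid2 = self.server.submit(Job(TEST_USER2))
        self.server.expect(JOB, {'job_state': 'Q'}, id=jid2)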

Logs:

2-mom01:/home/zulekam # qmgr -c"s s scheduling=1"; qmgr -c"s s scheduling=0"; sudo -u pbsuser2 qsub -- /bin/sleep 10000;qstat
948.pbspro-master
Job id Name User Time Use S Queue
---------------- ---------------- ---------------- -------- - -----
935.pbspro-master STDIN pbsuser1 00:00:00 R workq
936.pbspro-master STDIN pbsuser1 0 Q workq
937.pbspro-master STDIN pbsuser1 0 Q workq
938.pbspro-master STDIN pbsuser1 0 Q workq
939.pbspro-master STDIN pbsuser1 0 Q workq
940.pbspro-master STDIN pbsuser1 0 Q workq
941.pbspro-master STDIN pbsuser1 0 Q workq
942.pbspro-master STDIN pbsuser1 0 Q workq
943.pbspro-master STDIN pbsuser1 0 Q workq
944.pbspro-master STDIN pbsuser1 0 Q workq
945.pbspro-master STDIN pbsuser1 0 Q workq
948.pbspro-master STDIN pbsuser2 0 Q workq
pbspro-master:/home/zulekam # qstat -sw

pbspro-master:
Req'd Req'd Elap
Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time
------------------------------ --------------- --------------- --------------- -------- ---- ----- ------ ----- - -----
935.pbspro-master pbsuser1 workq STDIN 9249 1 1 -- -- R 00:05:49
Job run at Thu Jan 18 at 06:08 on (vnode[0]:ncpus=1)
936.pbspro-master pbsuser1 workq STDIN -- 1 1 -- -- Q --
Not Running: Server job limit reached for user pbsuser1
937.pbspro-master pbsuser1 workq STDIN -- 1 1 -- -- Q --
Not Running: Server job limit reached for user pbsuser1
938.pbspro-master pbsuser1 workq STDIN -- 1 1 -- -- Q --
Not Running: Server job limit reached for user pbsuser1
939.pbspro-master pbsuser1 workq STDIN -- 1 1 -- -- Q --
Not Running: Server job limit reached for user pbsuser1
940.pbspro-master pbsuser1 workq STDIN -- 1 1 -- -- Q --
Not Running: Server job limit reached for user pbsuser1
941.pbspro-master pbsuser1 workq STDIN -- 1 1 -- -- Q --
Not Running: Server job limit reached for user pbsuser1
942.pbspro-master pbsuser1 workq STDIN -- 1 1 -- -- Q --
Not Running: Server job limit reached for user pbsuser1
943.pbspro-master pbsuser1 workq STDIN -- 1 1 -- -- Q --
Not Running: Server job limit reached for user pbsuser1
944.pbspro-master pbsuser1 workq STDIN -- 1 1 -- -- Q --
Not Running: Server job limit reached for user pbsuser1
945.pbspro-master pbsuser1 workq STDIN -- 1 1 -- -- Q --
Not Running: Server job limit reached for user pbsuser1
948.pbspro-master pbsuser2 workq STDIN 9778 1 1 -- -- R 00:00:48
Job run at Thu Jan 18 at 06:13 on (vnode[0]:ncpus=1)
pbspro-master:/home/zulekam # qstat -Bf
Server: pbspro-master
server_state = Idle
server_host = pbspro-master.pbspro.com
scheduling = False
max_run = [u:pbsuser1=1]
total_jobs = 12
state_count = Transit:0 Queued:10 Held:0 Waiting:0 Running:2 Exiting:0 Begun:0
managers = pbsroot@*
default_queue = workq
log_events = 511
mail_from = adm
query_other_jobs = True
resources_default.ncpus = 1
default_chunk.ncpus = 1
resources_assigned.ncpus = 2
resources_assigned.nodect = 2
scheduler_iteration = 600
flatuid = True
FLicenses = 3322143
resv_enable = True
node_fail_requeue = 310
max_array_size = 10000
pbs_license_info = 6200@x80-lmx
pbs_license_linger_time = 3600
license_count = Avail_Global:3322141 Avail_Local:2 Used:2 High_Use:4 Avail_Sockets:0 Unused_Sockets:0
pbs_version = 18.2.0.20180117010901
job_sort_formula = (accrue_type%2)(accrue_type%3)-1000
eligible_time_enable = True
max_concurrent_provision = 5

Sched-logs:

01/18/2018 06:13:28;0400;pbs_sched;Node;948.pbspro-master;Allocated one subchunk: ncpus=1
01/18/2018 06:13:28;0040;pbs_sched;Job;948.pbspro-master;Job run

01/18/2018 06:13:28;0080;pbs_sched;Job;936.pbspro-master;Considering job to run
01/18/2018 06:13:28;0040;pbs_sched;Job;936.pbspro-master;Server job limit reached for user pbsuser1
01/18/2018 06:13:28;0080;pbs_sched;Job;937.pbspro-master;Considering job to run
01/18/2018 06:13:28;0040;pbs_sched;Job;937.pbspro-master;Server job limit reached for user pbsuser1
01/18/2018 06:13:28;0080;pbs_sched;Job;938.pbspro-master;Considering job to run
01/18/2018 06:13:28;0040;pbs_sched;Job;938.pbspro-master;Server job limit reached for user pbsuser1
01/18/2018 06:13:28;0080;pbs_sched;Job;939.pbspro-master;Considering job to run
01/18/2018 06:13:28;0040;pbs_sched;Job;939.pbspro-master;Server job limit reached for user pbsuser1
01/18/2018 06:13:28;0080;pbs_sched;Job;940.pbspro-master;Considering job to run
01/18/2018 06:13:28;0040;pbs_sched;Job;940.pbspro-master;Server job limit reached for user pbsuser1
01/18/2018 06:13:28;0080;pbs_sched;Job;941.pbspro-master;Considering job to run
01/18/2018 06:13:28;0040;pbs_sched;Job;941.pbspro-master;Server job limit reached for user pbsuser1
01/18/2018 06:13:28;0080;pbs_sched;Job;942.pbspro-master;Considering job to run
01/18/2018 06:13:28;0040;pbs_sched;Job;942.pbspro-master;Server job limit reached for user pbsuser1
01/18/2018 06:13:28;0080;pbs_sched;Job;943.pbspro-master;Considering job to run
01/18/2018 06:13:28;0040;pbs_sched;Job;943.pbspro-master;Server job limit reached for user pbsuser1
01/18/2018 06:13:28;0080;pbs_sched;Job;944.pbspro-master;Considering job to run
01/18/2018 06:13:28;0040;pbs_sched;Job;944.pbspro-master;Server job limit reached for user pbsuser1
01/18/2018 06:13:28;0080;pbs_sched;Job;945.pbspro-master;Considering job to run
01/18/2018 06:13:28;0040;pbs_sched;Job;945.pbspro-master;Server job limit reached for user pbsuser1
01/18/2018 06:13:28;0080;pbs_sched;Req;;Leaving Scheduling Cycle

Server-logs:

01/18/2018 06:13:28;0004;Server@pbspro-master;Svr;Server@pbspro-master;attributes set: scheduling = 1
01/18/2018 06:13:28;0040;Server@pbspro-master;Svr;pbspro-master;Scheduler sent command 5
01/18/2018 06:13:28;0040;Server@pbspro-master;Svr;pbspro-master;Scheduler sent command 0
01/18/2018 06:13:28;0100;Server@pbspro-master;Req;;Type 21 request received from Scheduler@pbspro-master.pbspro.com, sock=15
01/18/2018 06:13:28;0100;Server@pbspro-master;Req;;Type 0 request received from root@pbspro-master.pbspro.com, sock=14
01/18/2018 06:13:28;0100;Server@pbspro-master;Req;;Type 49 request received from root@pbspro-master.pbspro.com, sock=18
01/18/2018 06:13:28;0100;Server@pbspro-master;Req;;Type 9 request received from root@pbspro-master.pbspro.com, sock=14
01/18/2018 06:13:28;0004;Server@pbspro-master;Svr;Server@pbspro-master;attributes set: at request of root@pbspro-master.pbspro.com
01/18/2018 06:13:28;0004;Server@pbspro-master;Svr;Server@pbspro-master;attributes set: scheduling = 0
01/18/2018 06:13:28;0100;Server@pbspro-master;Req;;Type 81 request received from Scheduler@pbspro-master.pbspro.com, sock=15
01/18/2018 06:13:28;0100;Server@pbspro-master;Req;;Type 71 request received from Scheduler@pbspro-master.pbspro.com, sock=15
01/18/2018 06:13:28;0100;Server@pbspro-master;Req;;Type 58 request received from Scheduler@pbspro-master.pbspro.com, sock=15
01/18/2018 06:13:28;0100;Server@pbspro-master;Req;;Type 0 request received from pbsuser2@pbspro-master.pbspro.com, sock=14
01/18/2018 06:13:28;0100;Server@pbspro-master;Req;;Type 49 request received from pbsuser2@pbspro-master.pbspro.com, sock=18
01/18/2018 06:13:28;0100;Server@pbspro-master;Req;;Type 21 request received from pbsuser2@pbspro-master.pbspro.com, sock=14
01/18/2018 06:13:28;0100;Server@pbspro-master;Req;;Type 1 request received from pbsuser2@pbspro-master.pbspro.com, sock=14
01/18/2018 06:13:28;0100;Server@pbspro-master;Req;;Type 5 request received from pbsuser2@pbspro-master.pbspro.com, sock=14
01/18/2018 06:13:28;0100;Server@pbspro-master;Job;948.pbspro-master;enqueuing into workq, state 1 hop 1
01/18/2018 06:13:28;0008;Server@pbspro-master;Job;948.pbspro-master;Job Queued at request of pbsuser2@pbspro-master.pbspro.com, owner = pbsuser2@pbspro-master.pbspro.com, job name = STDIN, queue = workq
01/18/2018 06:13:28;0100;Server@pbspro-master;Req;;Type 0 request received from root@pbspro-master.pbspro.com, sock=18
01/18/2018 06:13:28;0100;Server@pbspro-master;Req;;Type 49 request received from root@pbspro-master.pbspro.com, sock=19
01/18/2018 06:13:28;0100;Server@pbspro-master;Req;;Type 21 request received from root@pbspro-master.pbspro.com, sock=18
01/18/2018 06:13:28;0100;Server@pbspro-master;Req;;Type 19 request received from root@pbspro-master.pbspro.com, sock=18
01/18/2018 06:13:28;0100;Server@pbspro-master;Req;;Type 20 request received from Scheduler@pbspro-master.pbspro.com, sock=15
01/18/2018 06:13:28;0100;Server@pbspro-master;Req;;Type 51 request received from Scheduler@pbspro-master.pbspro.com, sock=15
01/18/2018 06:13:28;0100;Server@pbspro-master;Req;;Type 23 request received from Scheduler@pbspro-master.pbspro.com, sock=15
01/18/2018 06:13:28;0008;Server@pbspro-master;Job;948.pbspro-master;Job Run at request of Scheduler@pbspro-master.pbspro.com on exec_vnode (vnode[0]:ncpus=1)
01/18/2018 06:14:20;0100;Server@pbspro-master;Req;;Type 0 request received from root@pbspro-master.pbspro.com, sock=15
01/18/2018 06:14:20;0100;Server@pbspro-master;Req;;Type 49 request received from root@pbspro-master.pbspro.com, sock=16
01/18/2018 06:14:20;0100;Server@pbspro-master;Req;;Type 21 request received from root@pbspro-master.pbspro.com, sock=15
01/18/2018 06:14:20;0100;Server@pbspro-master;Req;;Type 19 request received from root@pbspro-master.pbspro.com, sock=15
01/18/2018 06:14:54;0100;Server@pbspro-master;Req;;Type 0 request received from root@pbspro-master.pbspro.com, sock=14
01/18/2018 06:14:54;0100;Server@pbspro-master;Req;;Type 49 request received from root@pbspro-master.pbspro.com, sock=15
01/18/2018 06:14:54;0100;Server@pbspro-master;Req;;Type 21 request received from root@pbspro-master.pbspro.com, sock=14

In order to handle this scenario in PTL, the expect function should always check that server_state is 'Idle' after the scheduling attribute is set to False, so that a test does not proceed while a scheduling cycle started before the change is still in progress.
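
A minimal sketch of what that check would look like in a test, assuming PTL's Server.manager() and Server.expect() interfaces (the attribute names match the qstat -Bf output above):

# Turn scheduling off ...
self.server.manager(MGR_CMD_SET, SERVER, {'scheduling': 'False'})

# ... and do not proceed (e.g. submit the pbsuser2 job) until the server
# reports that no scheduling cycle is in progress.
self.server.expect(SERVER, {'server_state': 'Idle'})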

Acceptance Criteria

None

Status

Assignee

zulekha mahalty

Reporter

zulekha mahalty

Severity

2-Medium

OS

None

Start Date

None

Pull Request URL

None

Story Points

1

Components

Fix versions

Affects versions

18.1.1

Priority

Medium