pbs_mom dumped core in tpp_em_destroy
PP-1272

Description

pbs_mom dumped core on a CentOS 7 system running a mainline build from April 20th. The trigger is unknown; the core file was found in mom_priv. The RPMs and the core file are attached.

(gdb) where
#0 0x00007f950dcd11f7 in raise () from /lib64/libc.so.6
#1 0x00007f950dcd28e8 in abort () from /lib64/libc.so.6
#2 0x00007f950dd10f47 in __libc_message () from /lib64/libc.so.6
#3 0x00007f950dd18619 in _int_free () from /lib64/libc.so.6
#4 0x00000000004828f4 in tpp_em_destroy (em_ctx=0x25ffd30)
    at ../../../../src/lib/Libtpp/tpp_em.c:180
#5 0x00000000004854f6 in tpp_transport_terminate ()
    at ../../../../src/lib/Libtpp/tpp_transport.c:2428
#6 0x000000000047db6c in tpp_terminate () at ../../../../src/lib/Libtpp/tpp_client.c:1591
#7 0x00007f950dd5b3bf in fork () from /lib64/libc.so.6
#8 0x0000000000451ec3 in rmtmpdir (jobid=jobid@entry=0x266c998 "273.swdev")
    at ../../../src/resmom/start_exec.c:1048
#9 0x0000000000420420 in job_purge (pjob=0x266c810) at ../../../src/server/job_func.c:823
#10 0x000000000041df25 in main (argc=1, argv=<optimized out>)
    at ../../../src/resmom/mom_main.c:9982
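
Frames #6 and #7 show tpp_terminate() being entered from inside fork(), which suggests it runs as a fork handler (for example, one registered with pthread_atfork()). The abort raised from _int_free() is the kind glibc reports for a double free or an invalid pointer, although the trace does not say which. The sketch below is only an illustration of that mechanism, not the PBS code and not a proposed fix: struct em_ctx, em_destroy(), child_after_fork(), and run_demo() are hypothetical names. It shows a child-side fork handler that frees an inherited context and clears the pointers, so that a second destroy call in the child is harmless.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for an event-monitor context; not the real TPP type. */
struct em_ctx {
    int *events;
};

static struct em_ctx *g_ctx = NULL;

/* Free the context once; repeat calls are no-ops because pointers are cleared. */
static void em_destroy(struct em_ctx **pctx)
{
    assert(pctx != NULL);
    if (*pctx == NULL)
        return;
    free((*pctx)->events);
    (*pctx)->events = NULL;
    free(*pctx);
    *pctx = NULL;
}

/* Child-side fork handler: called from inside fork(), in the child only. */
static void child_after_fork(void)
{
    em_destroy(&g_ctx);
}

/* Returns 0 if the child exits cleanly after its fork handler has run. */
int run_demo(void)
{
    pid_t pid;
    int status = 0;

    g_ctx = malloc(sizeof(*g_ctx));
    if (g_ctx == NULL)
        return -1;
    g_ctx->events = calloc(16, sizeof(int));
    if (g_ctx->events == NULL) {
        em_destroy(&g_ctx);
        return -1;
    }

    /* Register only a child handler; no prepare or parent handler. */
    if (pthread_atfork(NULL, NULL, child_after_fork) != 0) {
        em_destroy(&g_ctx);
        return -1;
    }

    pid = fork();
    if (pid < 0) {
        em_destroy(&g_ctx);
        return -1;
    }
    if (pid == 0) {
        /* Child: the handler already freed the context, so this is a no-op. */
        em_destroy(&g_ctx);
        _exit(g_ctx == NULL ? 0 : 1);
    }

    /* Parent: its copy of the context is untouched by the child handler. */
    if (waitpid(pid, &status, 0) < 0) {
        em_destroy(&g_ctx);
        return -1;
    }
    em_destroy(&g_ctx);
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : 1;
}
```

Calling run_demo() from a small main() and building with -pthread is enough to try it. Removing the lines that clear the pointers in em_destroy() turns the second call in the child into a double free, which glibc typically reports with an abort through _int_free() much like frames #0 to #3.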

Acceptance Criteria

None

Status:
Assignee: Václav Chlumský
Reporter: Michael Karo
Severity: None
OS: None
Start Date: None
Pull Request URL:
Story Points: 1
Components:
Fix versions:
Affects versions: 18.1.0
Priority: Highest