|
This tool is meant to replace the 'pbs_diag' script, which is currently the means of capturing diagnostic data from PBS.
"pbs_snapshot" will be written in Python and will use the PTL libraries to interact with the PBS system that it is capturing. Because PTL is now updated in tandem with PBS, major changes to PBS should require very little (if any) refactoring of pbs_snapshot, so it should keep working with the latest version of PBSPro.
Also, a new set of utilities (PBSSnapUtils) will be added to PTL itself for this tool. These will be directly available to PTL test writers and developers who write PTL tests or debugging tools that need the ability to take snapshots of PBS.
The first version of the tool will also be able to anonymize/obfuscate PBS data, so that users with sensitive data can obfuscate snapshots before sharing them for bug reporting and debugging.
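As an illustration of the obfuscation idea, the sketch below replaces each sensitive value with a stable placeholder and records the mapping so it can be saved to a file. It is a minimal sketch and not the pbs_snapshot implementation: the Obfuscator class, its method names, the placeholder format and the JSON map file are all assumptions made for this example.

```python
import json
from collections import defaultdict


class Obfuscator:
    """Hypothetical helper: maps sensitive values to stable placeholders."""

    def __init__(self):
        # kind (e.g. "euser") -> {original value: placeholder}
        self.mapping = defaultdict(dict)

    def obfuscate(self, kind, value):
        """Return the placeholder for value, creating one on first use."""
        known = self.mapping[kind]
        if value not in known:
            # Placeholder format is an assumption: "<kind>_<counter>"
            known[value] = f"{kind}_{len(known) + 1}"
        return known[value]

    def dump_map(self, path):
        """Write the original-to-placeholder mapping to a JSON file."""
        with open(path, "w") as fh:
            json.dump(self.mapping, fh, indent=2, sort_keys=True)
```

Because the same input always yields the same placeholder, relationships in the data (for example, two jobs owned by the same user) remain visible after obfuscation, while the map file stays with the user who took the snapshot.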
A 'snapshot', the output produced by the pbs_snapshot tool, will be a tarball (.tgz file) containing the following directory structure and files:
mom_priv/
Copies of the following files: 'config', 'prologue', 'epilogue', 'mom.lock'
config.d/: contains a copy of every vnode definition file from PBS_HOME/mom_priv/config.d/
Name: <resource name>
type = <resource type attribute>
flag = <resource flag attribute>
Name: <resource name>
type = <resource type attribute>
flag = <resource flag attribute>
...
...
datastore/
pg_log/: a copy of the "PBS_HOME/datastore/pg_log" directory
pbs/
pbs.conf: a copy of the pbs.conf file for the PBS system
pbs_probe_v.out: output of "pbs_probe -v"
core_file_bt/ (stack backtrace from core files)
sched_priv/: files containing the output of "thread apply all backtrace full" on all core files captured from PBS_HOME/sched_priv
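Since a snapshot is a gzip-compressed tarball, its layout can be inspected with Python's standard tarfile module. The following is a minimal sketch; the function name top_level_entries is hypothetical, and no assumption is made about whether the tool wraps the listed directories in an enclosing folder.

```python
import tarfile


def top_level_entries(snapshot_path):
    """Return the sorted top-level names found inside a .tgz snapshot."""
    names = set()
    with tarfile.open(snapshot_path, "r:gz") as tar:
        for member in tar.getnames():
            # Drop empty and "." components so "./pbs/pbs.conf" yields "pbs"
            parts = [p for p in member.split("/") if p not in ("", ".")]
            if parts:
                names.add(parts[0])
    return sorted(names)
```

A caller could compare the returned names against the directories listed above (mom_priv, datastore, pbs, core_file_bt) to check that a snapshot looks complete.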
The interface for pbs_snapshot will be as follows:
sudo pbs_snapshot -o <path to output tar file> [OPTION]
-d <pbs_diag>: diag directory to use as input
-H <hostname>: hostname to operate on. Defaults to the value of PBS_SERVER
-l <loglevel>: set log level to one of INFO, INFOCLI, INFOCLI2, DEBUG, DEBUG2,
WARNING, ERROR, FATAL
--service-logs=<num days>: number of days of service logs to collect
--accounting-logs=<num days>: number of days of accounting logs to collect
--additional_hosts=<hostname>: capture additional logs from the hosts specified
'hostname' is a comma-separated list of hosts to take logs from
--map=<file>: path to filename to store the mapping of obfuscated data
--obfuscate: obfuscates euser, egroup, project, account_name, hostnames,
IP addresses, PBS dataservice username
Deletes mail endpoints, owner, managers, operators, variable_list,
ACLs, group_list, job name, jobdir
--version: print version number and exit
sudo - Currently, pbs_snapshot must be run by a user with sudo privileges because it needs to access protected PBS information (e.g., information inside the PBS_HOME/ _priv directories)
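The interface above could be parsed with Python's standard argparse module, as sketched below. This is a hedged illustration and not the tool's actual code: the build_parser function, the dest names, the version string, and the choice to parse day counts as integers and split the host list on commas are all assumptions made for this example.

```python
import argparse

# Log levels exactly as listed for the -l option
LOG_LEVELS = ["INFO", "INFOCLI", "INFOCLI2", "DEBUG", "DEBUG2",
              "WARNING", "ERROR", "FATAL"]


def build_parser():
    """Build a parser mirroring the documented pbs_snapshot options."""
    parser = argparse.ArgumentParser(prog="pbs_snapshot")
    parser.add_argument("-o", dest="out_file", required=True,
                        help="path to output tar file")
    parser.add_argument("-d", dest="diag_dir",
                        help="diag directory to use as input")
    # The documented default (value of PBS_SERVER) is not resolved here
    parser.add_argument("-H", dest="hostname",
                        help="hostname to operate on")
    parser.add_argument("-l", dest="loglevel", choices=LOG_LEVELS,
                        help="log level")
    parser.add_argument("--service-logs", type=int,
                        help="number of days of service logs to collect")
    parser.add_argument("--accounting-logs", type=int,
                        help="number of days of accounting logs to collect")
    # Split the comma-separated host list into a Python list
    parser.add_argument("--additional_hosts",
                        type=lambda s: s.split(","),
                        help="comma-separated hosts to take logs from")
    parser.add_argument("--map", dest="map_file",
                        help="file to store the mapping of obfuscated data")
    parser.add_argument("--obfuscate", action="store_true",
                        help="obfuscate sensitive data in the snapshot")
    # Version string is a placeholder; the real value is not given here
    parser.add_argument("--version", action="version",
                        version="%(prog)s (version placeholder)")
    return parser
```

For example, build_parser().parse_args(["-o", "/tmp/snap.tgz", "--obfuscate"]) yields a namespace with out_file set to the given path and obfuscate set to True.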
Interface: Input -o <path to output tar file>
Interface: Option -d <pbs_diag>
Interface: Option -H <hostname>
Interface: Option -l <loglevel>
Interface: Option --service-logs=<num days>
Interface: Option --accounting-logs=<num days>
Interface: Option --additional_hosts=<hostname>
Interface: Option --map=<file>
Interface: Option --obfuscate
Interface: Option --version