
Page Properties


Target release: 17.1.1
Epic: PP-758 (JIRA, pbspro.atlassian.net)
Document status: complete
Forum Discussion/Review: http://community.pbspro.org/t/pp-758-add-pbs-snapshot-tool-to-capture-state-logs-from-pbs/520/22
Document owner
Designer
Developers
QA


...

A 'snapshot', the output produced by the pbs_snapshot tool, will be a tarball (.tgz file) named "snapshot_<timestamp>.tgz" containing the following directory structure and files:

  • server/
    • qstat_B.out: output of "qstat -B"
    • qstat_Bf.out: output of "qstat -Bf"
    • qmgr_ps.out: output of "qmgr print server"
    • qstat_Q.out: output of "qstat -Q"
    • qstat_Qf.out: output of "qstat -Qf"
    • qmgr_pr.out: output of "qmgr print resource"
    • rscs_all (derived from the resourcedef file): lists built-in as well as custom resources in the following format:

          Name: <resource name>
               type = <resource type attribute>
               flag = <resource flag attribute>

          Name: <resource name>
               type = <resource type attribute>
               flag = <resource flag attribute>

          ...
          ...
    • qmgr_pq.out: output of "qmgr print queue @default"
  • server_priv/: a copy of the 'server_priv' directory inside PBS_HOME, core files are captured separately (see core_file_bt/)
    • accounting/: contains accounting logs from PBS_HOME/server_priv/accounting/ directory for the number of days specified by --accounting-logs option
  • server_logs/: contains server logs from the PBS_HOME/server_logs directory for the number of days specified by --daemon-logs option
  • job/
    • qstat.out: output of "qstat"
    • qstat_f.out: output of "qstat -f"
    • qstat_t.out: output of "qstat -t"
    • qstat_tf.out: output of "qstat -tf"
    • qstat_x.out: output of "qstat -x"
    • qstat_xf.out: output of "qstat -xf"
    • qstat_ns.out: output of "qstat -ns"
    • qstat_fx_F_dsv.out: output of "qstat -fx -F dsv"
    • qstat_f_F_dsv.out: output of "qstat -f -F dsv"
    • qstat_f_F_json.out: output of "qstat -f -F json"
  • node/
    • pbsnodes_va.out: output of "pbsnodes -va"
    • pbsnodes_a.out: output of "pbsnodes -a"
    • pbsnodes_avSj.out: output of "pbsnodes -avSj"
    • pbsnodes_aSj.out: output of "pbsnodes -aSj"
    • pbsnodes_avS.out: output of "pbsnodes -avS"
    • pbsnodes_aS.out: output of "pbsnodes -aS"
    • pbsnodes_aFdsv.out: output of "pbsnodes -aFdsv"
    • pbsnodes_avFdsv.out: output of "pbsnodes -avFdsv"
    • pbsnodes_avFjson.out: output of "pbsnodes -avFjson"
    • qmgr_pn_default.out: output of "qmgr print node @default"

  • mom_priv/: a copy of the 'mom_priv' directory inside PBS_HOME, core files are captured separately (see core_file_bt/)

  • mom_logs/: contains mom logs from the PBS_HOME/mom_logs directory for the number of days specified by --daemon-logs option
  • comm_logs/: contains comm logs from the PBS_HOME/comm_logs directory for the number of days specified by --daemon-logs option
  • sched_priv/: a copy of the 'sched_priv' directory inside PBS_HOME with all the files; core files are not copied, their backtraces are captured under core_file_bt/
  • sched_logs/: contains scheduler logs from the PBS_HOME/sched_logs directory for the number of days specified by --daemon-logs option
  • reservation/
    • pbs_rstat_f.out: output of "pbs_rstat -f"
    • pbs_rstat.out: output of "pbs_rstat"
  • scheduler/
    • qmgr_lsched.out: output of "qmgr list sched"
    • qmgr_psched.out: output of "qmgr print sched"
  • hook/

    • qmgr_ph_default.out: output of "qmgr print hook @default"
    • qmgr_lpbshook.out: output of "qmgr list pbshook"

  • datastore/

    • pg_log/: a copy of the "PBS_HOME/datastore/pg_log" directory for the number of days specified by --daemon-logs option


  • core_file_bt/ (stack backtrace from core files)

    • sched_priv/: files containing the output of "thread apply all backtrace full" on all core files captured from PBS_HOME/sched_priv

    • server_priv/: files containing the output of "thread apply all backtrace full" on all core files captured from PBS_HOME/server_priv
    • mom_priv/: files containing the output of "thread apply all backtrace full" on all core files captured from PBS_HOME/mom_priv
    • misc/: files containing the output of "thread apply all backtrace full" on any other core files found inside PBS_HOME
  • system/
    • pbs_probe_v.out: output of "pbs_probe -v"
    • pbs_hostn_v.out: output of "pbs_hostn -v $(hostname)"
    • pbs_environment: copy of PBS_HOME/pbs_environment file
    • os_info: Information about the OS
    • process_info: List of processes running on the system when the snapshot was taken (output of "ps -aux | grep [p]bs" on Linux systems and "tasklist /v" on Windows systems)
    • ps_leaf.out: output of "ps -leaf", only on Linux systems
    • lsof_pbs.out: output of "lsof | grep [p]bs", only on Linux systems
    • etc_hosts: copy of the "/etc/hosts" file, only on Linux systems
    • etc_nsswitch_conf: copy of the "/etc/nsswitch.conf" file, only on Linux systems
    • vmstat.out: output of "vmstat", only on Linux systems
    • df_h.out: output of "df -h", only on Linux systems
    • dmesg.out: output of "dmesg", only on Linux systems
  • pbs.conf: a copy of the pbs.conf file for the PBS system
  • ctime: this will log the time (since epoch) when the snapshot was taken.
  • pbs_snapshot.log: captures the logs generated by pbs_snapshot.
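
As a rough illustration of how a consumer might inspect a snapshot laid out as above, the following Python sketch opens the tarball and groups its files by the directory they fall under. It assumes only that the archive is a gzip-compressed tar file. Whether the members sit under a single top-level directory is an assumption, so the helper strips such a common prefix when one is present. The function name summarize_snapshot is hypothetical and is not part of pbs_snapshot.

```python
import posixpath
import tarfile
from collections import defaultdict


def summarize_snapshot(tgz_path):
    """Return {section: [relative file paths]} for a snapshot tarball.

    Assumption: if every file shares one leading directory, that
    directory is treated as the snapshot root and stripped.
    """
    with tarfile.open(tgz_path, "r:gz") as tar:
        names = [posixpath.normpath(m.name)
                 for m in tar.getmembers() if m.isfile()]

    # Strip a single common top-level directory, if there is one.
    first_parts = {n.split("/", 1)[0] for n in names}
    if len(first_parts) == 1 and all("/" in n for n in names):
        names = [n.split("/", 1)[1] for n in names]

    sections = defaultdict(list)
    for name in sorted(names):
        head, _, rest = name.partition("/")
        # Files directly at the root (pbs.conf, ctime, ...) go under ".".
        key = head if rest else "."
        sections[key].append(name)
    return dict(sections)
```

Calling summarize_snapshot on a snapshot file would then give a quick view of which sections (server, job, node, and so on) were actually captured.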

...

The interface for pbs_snapshot will be as follows:

pbs_snapshot -o <path to target directory> [OPTION]

-H <hostname>: hostname to operate on

-l <loglevel>: set the log level

--daemon-logs=<num days>: number of days of daemon logs to collect

--accounting-logs=<num days>: number of days of accounting logs to collect

--additional-hosts=<hostname>: capture additional information from the hosts specified

--map=<file>: path to the file that stores the mapping of obfuscated data

--obfuscate: obfuscates sensitive data

...

Output of "pbs_snapshot --help":

Usage: pbs_snapshot -o <path to existing output directory> [OPTION]

    Take snapshot of a PBS system and optionally capture logs for diagnostics

    -H <hostname>                     primary hostname to operate on
                                      Defaults to local host
    -l <loglevel>                     set log level to one of INFO, INFOCLI,
                                      INFOCLI2, DEBUG, DEBUG2, WARNING, ERROR
                                      or FATAL
    -h, --help                        display this usage message
    --daemon-logs=<num days>          number of daemon logs to collect
    --accounting-logs=<num days>      number of accounting logs to collect
    --additional-hosts=<hostname>     collect data from additional hosts
                                      'hostname' is a comma separated list
    --map=<file>                      file to store the map of obfuscated data
    --obfuscate                       obfuscates sensitive data
    --with-sudo                       Uses sudo to capture privileged data
    --version                         print version number and exit



Caveat: Currently pbs_snapshot needs to be run as root because it accesses protected PBS information (e.g., information inside the _priv directories under PBS_HOME, such as server_priv, mom_priv and sched_priv). It can be run either with sudo or as the root user. If it is run with restricted privileges, it will not be able to query all of the data.
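
For scripted use, the options listed in the usage message can be assembled programmatically. The Python sketch below only builds an argument list from the flags shown in the usage text. The helper name build_snapshot_cmd and its parameters are hypothetical, it assumes pbs_snapshot is on PATH, and it does not run the tool.

```python
def build_snapshot_cmd(out_dir, daemon_logs=None, accounting_logs=None,
                       additional_hosts=(), obfuscate=False, map_file=None,
                       with_sudo=False):
    """Build an argv list for pbs_snapshot from the documented options."""
    cmd = ["pbs_snapshot", "-o", out_dir]
    if daemon_logs is not None:
        cmd.append(f"--daemon-logs={daemon_logs}")
    if accounting_logs is not None:
        cmd.append(f"--accounting-logs={accounting_logs}")
    if additional_hosts:
        # The usage text describes 'hostname' as a comma separated list.
        cmd.append("--additional-hosts=" + ",".join(additional_hosts))
    if obfuscate:
        cmd.append("--obfuscate")
    if map_file is not None:
        cmd.append(f"--map={map_file}")
    if with_sudo:
        cmd.append("--with-sudo")
    return cmd
```

On a host where PBS is installed, the resulting list could be passed to subprocess.run(cmd, check=True) by a suitably privileged user.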

...

  • Synopsis: Option to obfuscate/anonymize the PBS data captured
  • Details:
    • This option will instruct pbs_snapshot to obfuscate euser, egroup, project, Account_Name, operators, managers, group_list, Mail_Users, User_List, server_host, acl_groups, acl_users, acl_resv_groups, acl_resv_users, sched_host, acl_resv_hosts, acl_hosts, Job_Owner, exec_host, Host, Mom, resources_available.host and resources_available.vnode.
    • It will also delete Variable_List, Error_Path, Output_Path, mail_from, Mail_Points, Job_Name, jobdir, Submit_arguments, Shell_Path_List.
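
The split between obfuscated and deleted attributes can be illustrated with a small Python sketch. This is not the pbs_snapshot implementation: the function name, the token format, and the treatment of each attribute value as a single opaque string are assumptions made for illustration, and the two attribute sets are abbreviated from the lists above. The mapping dictionary plays the role that the file given to --map might play.

```python
# Abbreviated, illustrative subsets of the attribute lists above.
OBFUSCATE = {"euser", "egroup", "project", "Account_Name",
             "Job_Owner", "server_host"}
DELETE = {"Variable_List", "Error_Path", "Output_Path",
          "Job_Name", "jobdir"}


def obfuscate_attrs(attrs, mapping):
    """Return a sanitized copy of attrs.

    Values of attributes in OBFUSCATE are replaced by opaque tokens;
    each original value -> token pair is recorded in `mapping` so the
    same value always maps to the same token. Attributes in DELETE are
    dropped. All other attributes are copied unchanged.
    """
    clean = {}
    for name, value in attrs.items():
        if name in DELETE:
            continue
        if name in OBFUSCATE:
            if value not in mapping:
                # Hypothetical token format, chosen for readability.
                mapping[value] = f"obf_{len(mapping) + 1:04d}"
            value = mapping[value]
        clean[name] = value
    return clean
```

Reusing one mapping across all captured objects keeps relationships intact (for example, the same user appears under the same token in Job_Owner and euser) without exposing the original value.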

...