
Page Properties

Target release: 17.1.1
Epic: PP-758 (JIRA, pbspro.atlassian.net)
Document status: DRAFT
Forum Discussion/Review: http://community.pbspro.org/t/pp-758-add-pbs-snapshot-tool-to-capture-state-logs-from-pbs/520/22
Document owner
Designer
Developers
QA


...

  • server/
    • qstat_B.out: output of "qstat -B"
    • qstat_Bf.out: output of "qstat -Bf"
    • qmgr_ps.out: output of "qmgr print server"
    • server_priv/: a copy of the 'server_priv' directory inside PBS_HOME. Accounting logs may or may not be included (see the -L option under "Interface Documentation"). Core files are not captured (see core_file_bt/).
    • server_logs/: contains server logs from the PBS_HOME/server_logs directory for the number of days specified by the -L option
  • job/
    • qstat.out: output of "qstat"
    • qstat_f.out: output of "qstat -f"
    • qstat_t.out: output of "qstat -t"
    • qstat_tf.out: output of "qstat -tf"
    • qstat_x.out: output of "qstat -x"
    • qstat_xf.out: output of "qstat -xf"
    • qstat_ns.out: output of "qstat -ns"
    • qstat_fx_F_dsv.out: output of "qstat -fx -F dsv"
    • qstat_f_F_dsv.out: output of "qstat -f -F dsv"
  • node/
    • pbsnodes_va.out: output of "pbsnodes -va"
    • pbsnodes_a.out: output of "pbsnodes -a"
    • pbsnodes_avSj.out: output of "pbsnodes -avSj"
    • pbsnodes_aSj.out: output of "pbsnodes -aSj"
    • pbsnodes_avS.out: output of "pbsnodes -avS"
    • pbsnodes_aS.out: output of "pbsnodes -aS"
    • pbsnodes_aFdsv.out: output of "pbsnodes -aFdsv"
    • pbsnodes_avFdsv.out: output of "pbsnodes -avFdsv"
    • qmgr_pn_default.out: output of "qmgr print node @default"
    • mom_priv/

      • Copies of the following files: 'config', 'prologue', 'epilogue', 'mom.lock'

      • config.d/: contains copy of all vnode def files from inside PBS_HOME/mom_priv/config.d/

    • mom_logs/: contains mom logs from the PBS_HOME/mom_logs directory for the number of days specified by the -L option
  • comm/
    • comm_logs/: contains comm logs from the PBS_HOME/comm_logs directory for the number of days specified by the -L option
  • queue/
    • qstat_Q.out: output of "qstat -Q"
    • qstat_Qf.out: output of "qstat -Qf"
  • hook/
    • qmgr_ph_default.out: output of "qmgr print hook @default"
    • qmgr_lpbshook.out: output of "qmgr list pbshook"
  • scheduler/
    • qmgr_lsched.out: output of "qmgr list sched"
    • sched_priv/: a copy of the 'sched_priv' directory inside PBS_HOME with all the files. Core files are not captured (see core_file_bt/).
    • sched_logs/: contains scheduler logs from the PBS_HOME/sched_logs directory for the number of days specified by the -L option
  • reservation/
    • pbs_rstat_f.out: output of "pbs_rstat -f"
    • pbs_rstat.out: output of "pbs_rstat"
  • resource/
    • qmgr_pr.out: output of "qmgr print resource"
    • rscs_all (derived from the resourcedef file): lists built-in as well as custom resources in the following format:

          Name: <resource name>
               type = <resource type attribute>
               flag = <resource flag attribute>

          Name: <resource name>
               type = <resource type attribute>
               flag = <resource flag attribute>

          ...
          ...

  • datastore/

    • pg_log/: a copy of the "PBS_HOME/datastore/pg_log" directory

  • pbs/

    • pbs.conf: a copy of the pbs.conf file for the PBS system

    • pbs_probe_v.out: output of "pbs_probe -v"

    • pbs_hostn_v.out: output of "pbs_hostn -v $(hostname)"
    • pbs_environment: copy of PBS_HOME/pbs_environment file
  • core_file_bt/ (stack backtrace from core files)

    • sched_priv/: files containing the output of "thread apply all backtrace full" on all core files captured from PBS_HOME/sched_priv

    • server_priv/: files containing the output of "thread apply all backtrace full" on all core files captured from PBS_HOME/server_priv
    • mom_priv/: files containing the output of "thread apply all backtrace full" on all core files captured from PBS_HOME/mom_priv
  • system/
    • os_info: information about the OS, such as its version and Linux distribution (output of "uname -a" and "cat /etc/*release*" on Linux)
    • process_info: list of processes matching "pbs" that were running on the system when the snapshot was taken (output of "ps -ef | grep pbs | grep -v grep" on Linux)
    • lsof_pbs.out: output of "lsof | grep pbs | grep -v grep", only on Linux systems
    • ps_aux_pbs.out: output of "ps -aux | grep pbs | grep -v grep", only on Linux systems
    • etc_hosts: copy of the "/etc/hosts" file, only on Linux systems
    • etc_nsswitch_conf: copy of the "/etc/nsswitch.conf" file, only on Linux systems
    • vmstat.out: output of "vmstat", only on Linux systems
    • df_h.out: output of "df -h", only on Linux systems
    • dmesg.out: output of "dmesg", only on Linux systems
  • ctime: records the time (since epoch) when the snapshot was taken.
  • pbs_snapshot.log: captures the logs generated by pbs_snapshot if the -l option is provided.
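
The rscs_all layout described above (a "Name:" line followed by indented "key = value" lines) is simple enough to read back with a few lines of Python. The following is only a sketch of how a consumer of the snapshot might parse it; the function name parse_rscs_all and the sample resource names and values are hypothetical and are not part of pbs_snapshot.

```python
def parse_rscs_all(text):
    """Parse 'Name:' blocks followed by 'key = value' lines into a dict.

    Assumes the layout shown in this document; any other line is ignored.
    """
    resources = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("Name:"):
            # Start of a new resource block
            current = line[len("Name:"):].strip()
            resources[current] = {}
        elif "=" in line and current is not None:
            # Attribute line belonging to the current resource
            key, _, value = line.partition("=")
            resources[current][key.strip()] = value.strip()
    return resources


# Made-up sample data, for illustration only
SAMPLE = """\
Name: res_a
     type = long
     flag = nh

Name: res_b
     type = string
     flag = h
"""

if __name__ == "__main__":
    for name, attrs in parse_rscs_all(SAMPLE).items():
        print(name, attrs)
```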

...

The interface for pbs_snapshot will be as follows:

sudo pbs_snapshot -o <dir> [OPTION]

-o <dir>: output directory

-d <pbs_diag>: diag directory to use as input

-H <hostname>: hostname to operate on. Defaults to the value of PBS_SERVER

-l <loglevel>: set log level to one of INFO, INFOCLI, INFOCLI2, DEBUG, DEBUG2, WARNING, ERROR, FATAL

-

...

--service-logs=<num days>: number of days of service logs to collect

--accounting-logs=<num days>: number of days of accounting logs to collect

--additional_hosts=<hostname>: capture additional logs from the hosts specified; 'hostname' is a comma-separated list of hosts to take logs from

--map=<file>: path to filename to store the mapping of obfuscated data

--obfuscate: obfuscates euser, egroup, project, account_name, hostnames, IP addresses and the PBS dataservice username; deletes mail endpoints, owner, managers, operators, variable_list, ACLs, group_list, job name and jobdir

--version: print version number and exit


sudo: pbs_snapshot currently needs to be run by a user with sudo privileges because it needs to access protected PBS information (e.g. information inside the PBS_HOME/*_priv directories).
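
To make the option list above concrete, here is a minimal sketch of how the listed options could be declared with Python's argparse. It is not the actual pbs_snapshot implementation; the function name build_parser, the dest names, the placeholder version string and the choice of argparse itself are assumptions made for illustration. Only the option names, the required -o input and the INFOCLI2 default come from this document.

```python
import argparse

# Log levels as listed in this document
LOG_LEVELS = ["INFO", "INFOCLI", "INFOCLI2", "DEBUG", "DEBUG2",
              "WARNING", "ERROR", "FATAL"]


def split_hosts(value):
    """Turn a comma-separated host list into a Python list."""
    return [host for host in value.split(",") if host]


def build_parser():
    # Hypothetical parser mirroring the documented options
    parser = argparse.ArgumentParser(prog="pbs_snapshot")
    parser.add_argument("-o", dest="outdir", required=True,
                        help="output directory")
    parser.add_argument("-d", dest="diag",
                        help="diag directory to use as input")
    parser.add_argument("-H", dest="hostname",
                        help="hostname to operate on")
    parser.add_argument("-l", dest="loglevel", choices=LOG_LEVELS,
                        default="INFOCLI2", help="log level")
    parser.add_argument("--service-logs", type=int,
                        help="number of days of service logs to collect")
    parser.add_argument("--accounting-logs", type=int,
                        help="number of days of accounting logs to collect")
    parser.add_argument("--additional_hosts", type=split_hosts, default=[],
                        help="comma-separated list of extra hosts")
    parser.add_argument("--map", dest="map_file",
                        help="file to store the mapping of obfuscated data")
    parser.add_argument("--obfuscate", action="store_true",
                        help="obfuscate or delete sensitive attributes")
    # Placeholder version string; the real value is not given here
    parser.add_argument("--version", action="version",
                        version="%(prog)s (version placeholder)")
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```

With this sketch, build_parser().parse_args(["-o", "/tmp/snap", "--additional_hosts=a,b"]) yields outdir "/tmp/snap", the default log level INFOCLI2, and additional_hosts as the list ["a", "b"].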


Interface: Input -o <dir>

  • Synopsis: Input to specify path to the output/snapshot directory
  • Details:
    • This is required input to pbs_snapshot.
    • If a directory <dir> already exists, pbs_snapshot will prompt the user to confirm whether to overwrite it, and will overwrite it only if instructed.
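
The confirm-before-overwrite behaviour described above could look roughly like the following sketch. The function name prepare_output_dir, the prompt wording, and the choice to overwrite by deleting and recreating the directory are assumptions for illustration; this document does not specify how pbs_snapshot implements the overwrite.

```python
import os
import shutil


def prepare_output_dir(path, ask=input):
    """Create 'path', asking for confirmation first if it already exists.

    Returns True when the directory is ready for use and False when the
    user declined to overwrite an existing directory. 'ask' is injectable
    so the prompt can be replaced in tests.
    """
    if os.path.isdir(path):
        answer = ask("Directory %s already exists, overwrite? [y/N] " % path)
        if answer.strip().lower() not in ("y", "yes"):
            return False
        # Assumption: "overwrite" means discarding the old contents
        shutil.rmtree(path)
    os.makedirs(path)
    return True
```

Passing the prompt function as a parameter keeps the interactive part separate from the file-system logic, which makes the behaviour easy to exercise without a terminal.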

Interface: Option -d <pbs_diag>

  • Synopsis: Option to provide the path to a pbs_diag directory to be used to generate the snapshot
  • Details:
    • This option is meant to be usable on diags generated by the pbs_diag script.
    • <pbs_diag> should be the path to a pbs_diag directory that is generated by unpacking the tarball that pbs_diag produces.
    • This option will instruct pbs_snapshot not to query a live PBS system and instead use the information captured inside the diag to create the snapshot.
    • No sudo privileges are needed when running pbs_snapshot using this option.
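
As a rough illustration of the workflow described above (unpack the pbs_diag tarball, then point pbs_snapshot at the resulting directory), the sketch below extracts a tarball and builds the corresponding command line. The function name diag_snapshot_command and the assumption that the tarball contains a single top-level directory are hypothetical; the real layout of a pbs_diag tarball is not described in this document.

```python
import os
import tarfile


def diag_snapshot_command(tarball, unpack_dir, outdir):
    """Unpack a diag tarball and return the pbs_snapshot command to run on it.

    Assumption: if the tarball holds exactly one top-level directory, that
    directory is the diag; otherwise 'unpack_dir' itself is used.
    Only unpack tarballs from a trusted source.
    """
    with tarfile.open(tarball) as tar:
        tar.extractall(unpack_dir)
    subdirs = [entry for entry in sorted(os.listdir(unpack_dir))
               if os.path.isdir(os.path.join(unpack_dir, entry))]
    diag_dir = unpack_dir
    if len(subdirs) == 1:
        diag_dir = os.path.join(unpack_dir, subdirs[0])
    # No sudo here: the -d option does not query a live PBS system
    return ["pbs_snapshot", "-d", diag_dir, "-o", outdir]
```

The returned list can be passed to subprocess.run() on a machine where pbs_snapshot is installed.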

Interface: Option -H <hostname>

  • Synopsis: Option to provide the hostname to PBS server
  • Details:
    • This option will make pbs_snapshot ignore the value of PBS_SERVER and instead use the one provided.

Interface: Option -l <loglevel>

  • Synopsis: Option to set the desired log level for debugging pbs_snapshot
  • Details:
    • The <loglevel> can be set to INFO, INFOCLI, INFOCLI2, DEBUG, DEBUG2, WARNING, ERROR or FATAL.
    • The logging becomes more comprehensive going from FATAL to INFO.
    • By default, the log level will be set to INFOCLI2.
    • The generated logs will also be written out in the file 'pbs_snapshot.log' inside the snapshot directory.

Interface: Option --service-logs=<num days>

...