...

  • Introducing a new class in pbs_snaputils called 'ObfuscateSnapshot' which will contain the following:
    • Information about attributes that need to be deleted or obfuscated
    • A routine called obfuscate_snapshot(<path to snapshot>, <path to map file>) which can obfuscate a snapshot completely
      • The routine currently deletes any sched, server, comm and database logs that were captured, because we cannot obfuscate them yet. A future enhancement will add support for obfuscating them. The idea is to not capture anything that we cannot obfuscate.
      • Algorithm:
        • The routine first obfuscates the long stat format outputs like qstat -f, pbs_rstat -f, pbsnodes -av, etc.
          • While doing this, it also creates an obfuscation map of (sensitive value: obfuscated value)
          • It also deletes the attributes that must be removed and stores their values in a separate list.
        • It then parses custom resources from the resourcedef file, generates obfuscated values for them, and adds them to the obfuscation map.
        • It then calls obfuscate_acct_logs() to obfuscate the accounting logs; this can add more entries to the obfuscation map.
        • It then deletes all daemon logs.
        • Binary job files:
          • We capture the printjob output of each .JB file and save it as <jobid>.JB_printjob. These text files then get obfuscated.
          • We delete all other files inside the jobs directory.
        • Finally, it goes through ALL files in the snapshot and does the following:
          • Applies re.sub(r'\b' + key + r'\b', val, <file content>) for each (key, val) pair in the obfuscation map created above, to replace any sensitive values in the file.
          • Goes through the list of attribute values to delete and deletes them from the file.
    • A routine called obfuscate_acct_logs(<path to snapshot>) which can obfuscate all accounting logs under the given path.
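
The final pass of the algorithm above (whole-word replacement from the obfuscation map, followed by deletion of the stored attribute values) could look roughly like the sketch below. This is only an illustration, not the actual pbs_snaputils code: the function name apply_obfuscation_map and its parameters are hypothetical, and the use of re.escape, the longest-key-first ordering, and plain substring removal for deleted values are assumptions that the design does not spell out.

```python
import re


def apply_obfuscation_map(text, obf_map, values_to_delete=()):
    """Return text with sensitive values replaced and deleted values removed.

    Hypothetical helper: obf_map maps sensitive value -> obfuscated value,
    values_to_delete is the list of attribute values collected for deletion.
    """
    # Assumption: handle longer keys first so that a key containing another
    # key (e.g. a full hostname vs. its short name) is replaced as a whole.
    for key in sorted(obf_map, key=len, reverse=True):
        val = obf_map[key]
        # Assumption: escape the key so characters such as '.' in hostnames
        # are matched literally rather than as regex metacharacters.
        pattern = r'\b' + re.escape(key) + r'\b'
        # A function replacement keeps any backslashes in val literal.
        text = re.sub(pattern, lambda match, v=val: v, text)

    # Assumption: deleted attribute values are removed as plain substrings.
    for value in values_to_delete:
        text = text.replace(value, "")
    return text


sample = "Job_Owner = alice@host1.example.com\nqueue = workq\nalice2 ran\n"
obf_map = {"alice": "user_1", "host1.example.com": "host_1"}
result = apply_obfuscation_map(sample, obf_map, ["workq"])
print(result)
```

Because of the \b word boundaries, "alice" is replaced in "alice@host1.example.com" but the unrelated token "alice2" is left alone. In a real run, a function like this would be applied to the contents of each file in the snapshot in turn.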

...