Motivation:

  • pbs_snapshot --obfuscate has many issues: it doesn't obfuscate everything; there are bugs with obfuscating special attributes (for example, it only obfuscates the first entry in managers, acl attributes, etc.); and the obfuscated string has the same length as the original string, which could help someone recover the original value.
  • pbs_snapshot uses a PTL utility called pbs_anonutils.py, which contains code that is somewhat complicated to maintain and extend. It is designed around separate routines that each obfuscate a specific kind of output (tabular, long format, resourcedef, accounting logs, etc.), and it was not written for obfuscating snapshots as a whole.
  • Using the existing architecture of pbs_anonutils to obfuscate new outputs like JSON would have required either writing specialized routines for those outputs or rewriting the existing routines to be more generic, which would have meant essentially rewriting pbs_anonutils.


Proposal:

Add a snapshot obfuscation utility inside pbs_snaputils.py itself which obfuscates an entire snapshot in one go.

Architecture:

  • Introduce a new class in pbs_snaputils.py called 'ObfuscateSnapshot', which will contain the following:
    • Information about attributes that need to be deleted or obfuscated
    • A routine called obfuscate_snapshot(<path to snapshot>, <path to map file>) which can obfuscate a snapshot completely
      • The routine currently deletes any sched, server, comm and database logs that were captured, because we cannot obfuscate them yet. Support for obfuscating them is planned as a future enhancement. The idea is to not capture anything that we cannot obfuscate.
      • Algorithm:
        • The routine first obfuscates the long stat format outputs like qstat -f, pbs_rstat -f, pbsnodes -av, etc.
          • While doing this, it also creates an obfuscation map of (sensitive value: obfuscated value)
          • It also deletes the attributes that must be removed and stores their values in a separate list.
        • It then parses custom resources from the resourcedef file, generates obfuscated values for them and adds them to the obfuscation map.
        • It then calls obfuscate_acct_logs() to obfuscate the accounting logs; this can add more entries to the obfuscation map.
        • It then deletes all daemon logs.
        • Finally, it goes through ALL files in the snapshot and does the following:
          • Runs sed -i 's/\b<sensitive value>\b/<obfuscated value>/g' to replace any sensitive values in the file, using the obfuscation map created above.
          • Goes through the list of attribute values to delete and deletes them from the file.
    • A routine called obfuscate_acct_logs(<path to snapshot>) which obfuscates all accounting logs under the given path.
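
The map-building and final replacement steps above can be sketched in Python as follows. This is only an illustrative sketch, not the code in pbs_snaputils.py: the helper names (make_obfuscated_value, add_to_map, replace_sensitive, obfuscate_tree) are hypothetical, the use of random hex tokens is an assumption, and a single regular-expression pass stands in for the per-value sed calls. It does not cover deleting attributes or daemon logs.

```python
import os
import re
import secrets


def make_obfuscated_value(nbytes=8):
    # Hypothetical helper: a random token whose length does not depend on
    # the original value, so the length of the original is not revealed.
    return secrets.token_hex(nbytes)


def add_to_map(obf_map, sensitive_value):
    # Reuse the existing mapping so the same sensitive value always maps
    # to the same obfuscated value across all files in the snapshot.
    if sensitive_value not in obf_map:
        obf_map[sensitive_value] = make_obfuscated_value()
    return obf_map[sensitive_value]


def replace_sensitive(text, obf_map):
    # Whole-word replacement, similar in spirit to
    # sed 's/\b<sensitive value>\b/<obfuscated value>/g'.
    if not obf_map:
        return text
    # Try longer values first so a long value wins over a shorter prefix.
    keys = sorted(obf_map, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(k) for k in keys) + r")\b")
    return pattern.sub(lambda m: obf_map[m.group(0)], text)


def obfuscate_tree(snapshot_path, obf_map):
    # Walk every file under the snapshot and rewrite it in place.
    # surrogateescape lets undecodable bytes round-trip unchanged.
    for root, _dirs, files in os.walk(snapshot_path):
        for name in files:
            path = os.path.join(root, name)
            with open(path, "r", encoding="utf-8",
                      errors="surrogateescape") as f:
                text = f.read()
            new_text = replace_sensitive(text, obf_map)
            if new_text != text:
                with open(path, "w", encoding="utf-8",
                          errors="surrogateescape") as f:
                    f.write(new_text)
```

Replacing all values in one pass means an obfuscated value that happens to match another sensitive value is never replaced a second time, which can happen when values are substituted one after another. As with the sed expression, \b only matches next to a word character, so values that begin or end with punctuation would need separate handling.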

