
Interface 1: New option to write a job's output (stdout/stderr) files directly to the final destination, instead of staging them, if the final destination is known to be writable from the job execution node.

  • Visibility: Public
  • Change Control: Stable
  • Details: A user can choose to have their job's output (.o and .e) files written directly to the final destination instead of being staged, provided the file system is available from the mother superior.
  • The "d" modifier can be used with the existing "qsub -k" option (e.g. qsub -k oed).
  • The phrase "known to be writable" means "the file's ultimate destination host:path is mapped from the primary execution node via the existing $usecp directive in the mom config".
  • The job's Output_path and Error_path are settable with the -o and -e options, and will be honored if the "d" modifier is used for the corresponding file.
  • The admin can make this behavior the default by setting "default_qsub_arguments = -koed".
  • If the d modifier for -k is used but the specified file's final destination path(s) are NOT usecp-able, the mom should log a warning and continue running the job with normal spooling and staging to the final destination.
  • This is reflected in qstat -f output as: Keep_Files = oed
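
The decision described in Interface 1 can be sketched as follows. This is a simplified model and not PBS code: the helper names (parse_keep_files, is_usecp_able, streams_written_directly) are hypothetical, and a $usecp mapping is reduced to a (host pattern, path prefix) pair with shell-style wildcard matching on the host, which is an assumption about how the mom matches destinations.

```python
from dataclasses import dataclass
from fnmatch import fnmatch


@dataclass(frozen=True)
class KeepFiles:
    keep_out: bool  # "o" present in the -k value
    keep_err: bool  # "e" present in the -k value
    direct: bool    # "d" modifier present in the -k value


def parse_keep_files(value: str) -> KeepFiles:
    """Parse a -k value such as "oed" (hypothetical helper, not PBS code)."""
    unknown = set(value) - set("oedn")
    if unknown:
        raise ValueError(f"unsupported -k characters: {sorted(unknown)}")
    return KeepFiles("o" in value, "e" in value, "d" in value)


def is_usecp_able(dest_host: str, dest_path: str, usecp_rules) -> bool:
    """True if host:path matches any (host_pattern, path_prefix) rule.

    Assumption: plain prefix matching on the path, wildcard matching on the host.
    """
    return any(
        fnmatch(dest_host, host_pattern) and dest_path.startswith(path_prefix)
        for host_pattern, path_prefix in usecp_rules
    )


def streams_written_directly(k_value: str, destinations, usecp_rules) -> set:
    """Return the streams ("o"/"e") that would be written directly.

    destinations maps "o"/"e" to a (host, path) pair taken from
    Output_path / Error_path. Streams that were requested with "d" but are
    not usecp-able fall back to normal spooling and staging.
    """
    keep = parse_keep_files(k_value)
    if not keep.direct:
        return set()
    requested = {s for s, wanted in (("o", keep.keep_out), ("e", keep.keep_err)) if wanted}
    return {s for s in requested if is_usecp_able(*destinations[s], usecp_rules)}
```

For example, with a single rule ("*.example.com", "/home") and an Error_path outside /home, a request of "oed" would result in only the output file being written directly; the error file would be spooled and staged as before.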

Interface 2: A user shall be able to provide an option at job submission time to have PBS remove the output files (.o and .e) for that job, if it completes successfully.

  • Visibility: Public
  • Change Control: Stable
  • Details: Introduce new "R" option for qsub which means "remove upon job completion".
  • "job completion" means terminated with no errors.
  • qsub -R oe job.sh
  • The admin can make this behavior the default by setting "default_qsub_arguments = -Roe".
  • The user can choose which files are deleted (.e, .o, or both).
  • This is reflected in qstat -f output as: Remove_Files = oe
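
A minimal sketch of the removal rule in Interface 2. The function name files_to_remove is hypothetical, and "terminated with no errors" is modeled here as an exit status of 0, which is an assumption; the actual success criterion is whatever PBS applies.

```python
def files_to_remove(remove_files: str, exit_status: int) -> set:
    """Return the streams ("o"/"e") that would be removed under -R.

    remove_files is the Remove_Files value, e.g. "oe", "o" or "e".
    Assumption: the job counts as successful only when exit_status == 0.
    """
    unknown = set(remove_files) - {"o", "e"}
    if unknown:
        raise ValueError(f"unsupported -R characters: {sorted(unknown)}")
    if exit_status != 0:
        # Unsuccessful job: nothing is removed, so the files stay available.
        return set()
    return set(remove_files)
```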

Interface 3: Warning messages will be generated in the following scenarios.

  • Visibility: Public
  • Change Control: Stable
  • Details:
  • The following warning message will be logged if direct write was requested but the path(s) are not usecp-able from the primary execution host.
  • "Direct write is requested for job:$job_id but the destination: $final_destination_directory is not usecp-able from $mom_hostname" (DEBUG)
  • The same message will also be written to the job's stderr file.
  • The following message will be logged when a job with direct_write enabled is rerun (qrerun): the copy of the job's standard output/error files back to the Server, where they are normally held until the job is rescheduled, is skipped for directly written files.

  • "Skipping copy of directly written $which file on rerun of job $jobid" (DEBUG3)

  • The following warning will be logged if the mom concludes that the stdout/stderr files may have been written directly and are therefore not available in the spool area.
  • "Skipping directly written/absent spool file $file_path" (DEBUG3)
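
The placeholders in the messages above use $name syntax, so filling them in can be illustrated with Python's string.Template. This only demonstrates the message format; the function names are hypothetical and the sketch does not reproduce how the mom actually builds or logs these strings.

```python
from string import Template

# Message texts copied from Interface 3; the placeholders are filled at log time.
DIRECT_WRITE_WARNING = Template(
    "Direct write is requested for job:$job_id but the destination: "
    "$final_destination_directory is not usecp-able from $mom_hostname"
)
RERUN_SKIP_MESSAGE = Template(
    "Skipping copy of directly written $which file on rerun of job $jobid"
)


def direct_write_warning(job_id: str, final_destination_directory: str, mom_hostname: str) -> str:
    """Format the 'not usecp-able' warning (hypothetical helper)."""
    return DIRECT_WRITE_WARNING.substitute(
        job_id=job_id,
        final_destination_directory=final_destination_directory,
        mom_hostname=mom_hostname,
    )


def rerun_skip_message(which: str, jobid: str) -> str:
    """Format the qrerun message for a directly written file (hypothetical helper)."""
    return RERUN_SKIP_MESSAGE.substitute(which=which, jobid=jobid)
```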

Interface 4: direct_write and remove_files options can be used with qalter.

  • Visibility: Public
  • Change Control: Stable
  • Details: A user can change the provided options for a particular job if the job has not started yet.
  • Usage of direct_write with qalter:
  • qalter -koed $jobid
  • Usage of remove_files with qalter:
  • qalter -Roe $jobid
  • If the job has already started running, qalter returns the following (already existing) error:
  • qalter: Cannot modify attribute while job running  Remove_Files $jobid


Examples:

  • qsub -koed 

Means direct write both the job's output and error files to the Output_path and Error_path if host:path is usecp-able from the primary exec host.  If they are not, issue a warning in mom log and stderr file then do normal spooling and staging.

  • qsub -kod 

Means direct write the job's output file to the Output_path if host:path is usecp-able from the primary exec host.  If it is not, issue a warning in mom log and stderr file then do normal spooling and staging of the output file.  The job's error file will be spooled in $PBS_HOME/spool and staged to Error_path per existing behavior since nothing concerning it was specified.

  • qsub -ked 

Means direct write the job's error file to the Error_path if host:path is usecp-able from the primary exec host.  If it is not, issue a warning in mom log and stderr file then do normal spooling and staging of the error file.  The job's output file will be spooled and staged per existing behavior since nothing concerning it was specified.

  • qsub -Roe -koe

Means direct write both files to user's local home directory (does not matter if it is usecp-able in this case, this is existing -koe functionality), then remove both files upon successful job completion.

  • qsub -koed -Roe

Means direct write both the job's output and error files to the Output_path and Error_path if host:path is usecp-able from the primary exec host.  If they are not, issue a warning in mom log and stderr file then do normal spooling. When the job completes successfully, remove the output and error files either from their directly written location or from $PBS_HOME/spool.  If the job is unsuccessful, leave the files in place or stage them to Output_path and Error_path if they were spooled.

  • qsub -keo -Re

Means write both files to user's local home directory (does not matter if it is usecp-able in this case, this is existing -koe functionality), then remove only the error file upon successful job completion.

  • qsub -koed -Re

Means both output and error files are directly written to the Output_path and Error_path if host:path is usecp-able from the primary exec host.  The error file will be removed upon successful job completion; the output file is retained.

  • qsub -Wsandbox=PRIVATE -Roe

Means the output files will be written to the sandbox and will be deleted upon successful job completion.

  • When used with the -j option: if the user specifies -joe, both stdout and stderr are streamed to the .o file; if -ke is also specified, the -ke request is silently ignored rather than reported as an error.
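
The combinations in the examples above can be summarized per stream with a small decision function. This is a sketch under stated assumptions: plan_stream and its return keys are hypothetical names, usecp-ability and job success are passed in as booleans, and the -Wsandbox=PRIVATE and -j cases are not modeled.

```python
def plan_stream(stream: str, k_value: str = "", r_value: str = "",
                usecp_able: bool = False, job_succeeded: bool = True) -> dict:
    """Describe what happens to one stream ("o" or "e") for given -k / -R values.

    Returns a dict with:
      written_to: "final" (direct write), "home" (existing -k behavior,
                  user's home directory on the execution host) or "spool"
      removed:    True if the file is removed after the job ends
      staged:     True if the file is staged from spool to its final path
    """
    if stream not in ("o", "e"):
        raise ValueError("stream must be 'o' or 'e'")

    if stream in k_value and "d" in k_value:
        # Direct write requested: falls back to spooling when not usecp-able.
        written_to = "final" if usecp_able else "spool"
    elif stream in k_value:
        written_to = "home"
    else:
        written_to = "spool"

    removed = stream in r_value and job_succeeded
    # Spooled files are staged unless they were removed on success.
    staged = written_to == "spool" and not removed
    return {"written_to": written_to, "removed": removed, "staged": staged}
```

For instance, plan_stream("e", "od") models the error file under "qsub -kod": nothing was requested for it, so it is spooled and staged per existing behavior.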


