Support Cray DataWarp - Job Instances

Description

Cray DataWarp

Cray DataWarp provides an intermediate layer of high-bandwidth, file-based storage to applications running on compute nodes. It comprises commercial SSD hardware and software, Linux community software, and Cray system hardware and software. DataWarp storage is located on server nodes connected to the Cray system's high-speed network (HSN). I/O operations to this storage complete faster than I/O to the attached parallel file system (PFS), allowing the application to resume computation more quickly and resulting in improved application performance. DataWarp storage is transparently available to applications via standard POSIX I/O operations and can be configured in multiple ways for different purposes. DataWarp capacity and bandwidth are dynamically allocated to jobs on request and can be scaled up by adding DataWarp server nodes to the system. [Source: XC™ Series DataWarp™ User Guide (CLE 6.0.UP01)]

Cray DataWarp Integration with PBS Professional

The integration between Cray DataWarp and PBS Professional is expected to do the following:

1. Schedule jobs based on the availability of DataWarp storage capacity.
2. Set up the DataWarp job instance before the job begins execution, such that the following DataWarp functions are executed:
a. paths
b. setup
c. data_in
d. pre_run
3. Tear down the DataWarp job instance after the job terminates (whether normally, with an error, or by abort), such that the following DataWarp functions are executed:
a. post_run
b. data_out
c. teardown
4. When a non-successful DataWarp exit code (1) is detected, Altair will attempt to re-queue the job if the job has not started execution, or leave the data and job instance allocation intact if the job had executed, allowing the user/admin to manually resolve any issues.
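
The ordering and failure handling described in items 2 through 4 can be sketched as follows. This is only an illustrative model, not the actual integration code: the names `run_function`, `stage_in`, `stage_out`, and `Outcome` are hypothetical, and `run_function` stands in for whatever mechanism invokes a DataWarp function and returns its exit code (0 for success, 1 for failure).

```python
from enum import Enum
from typing import Callable, List, Optional

# DataWarp functions in the order listed in this story.
SETUP_FUNCTIONS: List[str] = ["paths", "setup", "data_in", "pre_run"]
TEARDOWN_FUNCTIONS: List[str] = ["post_run", "data_out", "teardown"]


class Outcome(Enum):
    OK = "ok"
    REQUEUE = "requeue"                # job had not started: try again later
    HOLD_FOR_ADMIN = "hold_for_admin"  # job had run: leave data/allocation intact


def run_stage(functions: List[str],
              run_function: Callable[[str], int]) -> Optional[str]:
    """Run each function in order; return the first one that fails, else None."""
    for name in functions:
        if run_function(name) != 0:
            return name  # stop here so later functions are not executed
    return None


def stage_in(run_function: Callable[[str], int]) -> Outcome:
    """Set up the job instance before execution (hypothetical helper)."""
    failed = run_stage(SETUP_FUNCTIONS, run_function)
    # A setup failure happens before the job starts, so the job is re-queued.
    return Outcome.OK if failed is None else Outcome.REQUEUE


def stage_out(run_function: Callable[[str], int]) -> Outcome:
    """Tear down the job instance after the job terminates (hypothetical helper)."""
    failed = run_stage(TEARDOWN_FUNCTIONS, run_function)
    # A failure after execution leaves data and allocation for manual resolution.
    return Outcome.OK if failed is None else Outcome.HOLD_FOR_ADMIN
```

Stopping at the first failing function during teardown means `teardown` is never reached when `post_run` or `data_out` fails, which is one way to keep the data and job instance allocation intact for the user/admin.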

Acceptance Criteria

None

Status

Assignee

Scott Suchyta

Reporter

Scott Suchyta

Severity

None

OS

None

Start Date

None

Pull Request URL

None

Story Points

1

Components

Priority

Medium