Parallel computing techniques are used in many industries and applications for implementing computationally intensive models or simulations. In many parallel computing applications, a group of distributed processes generates bursty data, such as checkpoint data that protects the distributed processes in the event of a failure. Checkpointing is a difficult workload for the storage system since all of the processes write data to the storage system simultaneously. Checkpoints thus create a bursty input/output (IO) pattern in which the storage system is mostly idle except for infrequent periods of IO, during which the bandwidth of the entire storage system is saturated and the expensive distributed processes in the compute nodes are idle. Checkpoints often result in wasted resources, since the storage system must be extremely powerful while remaining substantially idle between checkpoint phases.
It is desirable for storage systems to provide at least the minimum capacity needed to store required data, such as checkpoint data, while also providing at least the minimum bandwidth needed to perform each storage operation quickly enough that the expensive processors in the compute nodes are not idle for excessive periods of time. A need therefore exists for improved storage techniques in parallel computing environments.