Parallel storage systems are used in many computing environments. They provide high degrees of concurrency, in which many distributed processes within a parallel application simultaneously access a shared file namespace. Parallel computing techniques are used in many industries and applications for implementing computationally intensive models or simulations. For example, the Department of Energy uses a large number of distributed compute nodes tightly coupled into a supercomputer to model physics experiments. In the oil and gas industry, parallel computing techniques are often used for computing geological models that help predict the location of natural resources. One particular parallel computing application models the flow of electrons within a cube of virtual space by dividing the cube into smaller sub-cubes and then assigning each sub-cube to a corresponding process executing on a compute node.
In many parallel computing applications, a group of distributed processes must globally append data to a shared file. When multiple processes attempt to append data to a shared file concurrently, however, the performance of the parallel storage system is impaired, because the parallel file system locks the shared file in order to maintain its consistency, and this locking serializes the appends. Serialization can cause significant performance degradation, as the parallel processes must remain idle while they wait for one another.
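The serialization described above can be illustrated with a minimal sketch. It rests on several assumptions that are not part of the original text: threads stand in for distributed processes, a `threading.Lock` stands in for the lock taken by the parallel file system, and the function names `append_serialized` and `run_appenders` are hypothetical. It is not how any particular parallel file system is implemented.

```python
import os
import threading


def append_serialized(path, lock, record):
    """Append one record while holding the lock (hypothetical helper)."""
    # Only one writer can hold the lock at a time, so every other writer
    # sits idle here until the current append has finished.
    with lock:
        with open(path, "ab") as f:
            f.write(record)


def run_appenders(path, num_writers, record_size):
    """Start num_writers threads that each append one record; return file size."""
    lock = threading.Lock()  # stand-in for the file system's lock on the shared file
    threads = [
        threading.Thread(
            target=append_serialized,
            # Each writer uses a distinct fill byte so records can be told apart.
            args=(path, lock, bytes([ord("A") + i]) * record_size),
        )
        for i in range(num_writers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return os.path.getsize(path)
```

In this sketch the lock keeps each record contiguous in the file, but the appends proceed strictly one after another, which is the cost noted above.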
A number of techniques have been proposed or suggested to organize the data streams when multiple processes simultaneously save data to a shared file. For example, each process can create a single file across a set of different directories and then sequentially write a large amount of data to the single file. In a further implementation, a single process (often referred to as a “leader”) can create a shared file, and then all the processes write to the shared file in segments that are aligned with block boundaries within the parallel file system.
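The block-aligned variant can be sketched as follows. This is an illustration under stated assumptions rather than a definitive implementation: the 4096-byte `BLOCK_SIZE`, the use of a process rank to select a segment, and all function names are hypothetical, and the writers are invoked sequentially on a local file rather than on a parallel file system.

```python
import os
import tempfile

BLOCK_SIZE = 4096  # assumed block size in bytes; real file systems vary


def aligned_segment_size(max_payload, block_size=BLOCK_SIZE):
    """Round max_payload up to the next multiple of block_size."""
    return -(-max_payload // block_size) * block_size


def segment_offset(rank, segment_size):
    """Byte offset at which the segment owned by process `rank` begins."""
    return rank * segment_size


def create_shared_file(path):
    """Performed once, by the leader: create an empty shared file."""
    with open(path, "wb"):
        pass


def write_segment(path, rank, payload, segment_size):
    """Write one process's payload at the start of its own aligned segment."""
    if len(payload) > segment_size:
        raise ValueError("payload does not fit in its segment")
    with open(path, "r+b") as f:
        f.seek(segment_offset(rank, segment_size))
        f.write(payload)


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "shared.dat")
    create_shared_file(path)
    seg = aligned_segment_size(5000)
    for rank in range(4):  # each iteration stands in for one process
        write_segment(path, rank, bytes([ord("A") + rank]) * 5000, seg)
    print(seg, os.path.getsize(path))
```

Because every segment starts on a block boundary, no two writers touch the same block. The cost is that the unused tail of each segment is left as padding in the shared file.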
A need therefore exists for improved techniques for globally appending data from a group of distributed processes to a shared file.