Parallel storage systems are widely used in many computing environments. Parallel storage systems provide high degrees of concurrency in which many distributed processes within a parallel application simultaneously access a shared file namespace. Parallel computing techniques are used in many industries and applications for implementing computationally intensive models or simulations.
In many parallel computing applications, a group of distributed processes must write data to a shared file. When multiple processes attempt to write data to a shared file concurrently, however, the performance of the parallel storage system will be impaired. Serialization is incurred when the parallel file system locks a shared file in order to maintain the consistency of the shared file. Serialization can cause significant performance degradation, as the parallel processes must remain idle while they wait for one another.
A number of techniques have been proposed to organize the data streams when multiple processes simultaneously save data to a shared file. For example, each process can create its own file, with the files distributed across a set of different directories, and then sequentially write a large amount of data to that file. In a further implementation, a single process (often referred to as a “leader”) can create a shared file, and then all the processes write to the shared file in segments that are aligned with block boundaries within the parallel file system.
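By way of illustration only, the block-aligned variant described above can be sketched as follows. The function name `aligned_segment`, the block size of 1 MiB, and the per-process data size are hypothetical values chosen for the example; an actual parallel file system would report its own block or stripe size.

```python
from __future__ import annotations


def aligned_segment(rank: int, bytes_per_rank: int, block_size: int) -> tuple[int, int]:
    """Return (offset, length) of the segment reserved for process `rank`.

    Hypothetical helper: each segment is rounded up to a whole number of
    file system blocks so that no two processes ever share a block.
    """
    blocks = -(-bytes_per_rank // block_size)  # ceiling division
    segment_length = blocks * block_size
    return rank * segment_length, segment_length


BLOCK_SIZE = 1 << 20        # assumed 1 MiB block size
BYTES_PER_RANK = 1_500_000  # assumed payload written by each process

# One (offset, length) pair per process, for four processes.
segments = [aligned_segment(r, BYTES_PER_RANK, BLOCK_SIZE) for r in range(4)]
```

Because every segment starts on a block boundary and spans whole blocks, writes from different processes do not contend for the same block, at the cost of some padding at the end of each segment.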
Parallel Log Structured File System (PLFS) is a virtual log-structured file system that allows data to be written quickly into parallel file systems. PLFS is particularly useful when multiple applications write concurrently to a shared file in a parallel file system. Generally, PLFS improves write performance in this context by converting write operations to a single file into write operations to a set of sub-files. Metadata is created for each sub-file to indicate where the data is stored. The metadata is resolved when the shared file is read. One challenge, however, is that the amount of metadata that must be read in order to read the data back can be extremely large. Each reading process must read all of the metadata that was created by all of the writing processes. Thus, all of the reading processes are required to redundantly store the same large amount of metadata in a memory cache.
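The sub-file and metadata arrangement described above can be illustrated with a minimal in-memory sketch. The sketch is not the PLFS implementation or its on-disk format; the names `IndexEntry`, `Writer`, and `read_shared` are hypothetical, each sub-file is modeled as a byte buffer, and overlapping writes (which an actual system would have to order) are not handled.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class IndexEntry:
    logical_offset: int   # position in the shared (logical) file
    length: int           # number of bytes written
    writer: int           # identifies the sub-file holding the data
    physical_offset: int  # position inside that sub-file


class Writer:
    """One writing process with its own sub-file and its own metadata."""

    def __init__(self, writer_id: int) -> None:
        self.writer_id = writer_id
        self.data_log = bytearray()          # stands in for the sub-file
        self.index: list[IndexEntry] = []    # stands in for the sub-file metadata

    def write(self, logical_offset: int, data: bytes) -> None:
        # Data is appended to the writer's own log, so no shared-file lock is needed.
        entry = IndexEntry(logical_offset, len(data), self.writer_id, len(self.data_log))
        self.index.append(entry)
        self.data_log += data


def read_shared(writers: list[Writer], size: int) -> bytes:
    # A reader must gather the metadata from every writer before it can
    # resolve any part of the logical file.
    merged = [entry for w in writers for entry in w.index]
    logs = {w.writer_id: w.data_log for w in writers}
    out = bytearray(size)
    for e in merged:
        chunk = logs[e.writer][e.physical_offset:e.physical_offset + e.length]
        out[e.logical_offset:e.logical_offset + e.length] = chunk
    return bytes(out)


w0, w1 = Writer(0), Writer(1)
w0.write(0, b"AAAA")
w1.write(4, b"BBBB")
w0.write(8, b"CCCC")
shared = read_shared([w0, w1], 12)
```

In this sketch, `merged` grows with the total number of writes issued by all writers, and every reader that calls `read_shared` builds its own full copy of it, which corresponds to the redundant metadata caching noted above.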
A need therefore exists for improved techniques for storing metadata associated with sub-files from a single shared file in a parallel file system.