Parallel storage systems are widely used in many computing environments. They provide a high degree of concurrency, allowing many distributed processes within a parallel application to access a shared file namespace simultaneously.
Parallel computing techniques are used in many industries and applications for implementing computationally intensive models or simulations. For example, the Department of Energy uses a large number of distributed compute nodes tightly coupled into a supercomputer to model physics experiments. In the oil and gas industry, parallel computing techniques are often used for computing geological models that help predict the location of natural resources. Generally, each parallel process generates a portion, referred to as a data chunk, of a shared data object.
Compression is a common technique for storing data using fewer bits than the original representation. For example, lossless compression reduces the number of bits by identifying and eliminating statistical redundancy, so that the original data can be reconstructed exactly. Among other benefits, compression reduces resource usage, such as data storage space or transmission capacity.
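As a minimal illustration of lossless compression, the following Python sketch compresses a deliberately redundant byte string with the standard `zlib` module and restores it exactly. The sample payload and the variable names are assumptions chosen for illustration; they do not come from any particular storage system.

```python
import zlib

# Illustrative input (an assumption): a 16-byte pattern repeated many times,
# so the data contains a great deal of statistical redundancy.
original = b"temperature=300K" * 1024

# Lossless compression: fewer bytes are stored or transmitted.
compressed = zlib.compress(original, level=6)

# Decompression reconstructs the original bytes exactly.
restored = zlib.decompress(compressed)

print("original bytes:  ", len(original))
print("compressed bytes:", len(compressed))
print("round trip exact:", restored == original)
```

The achievable size reduction depends on how much redundancy the data contains; data that is already random or already compressed may not shrink at all.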
Existing approaches compress the shared data object after it has been sent to the storage system. The compression is applied to offset ranges of the shared data object, in sizes that are pre-defined by the file system.
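The fixed-range behavior described above can be sketched as follows. This is only a hedged illustration, not the implementation of any specific file system: the function names `compress_fixed_ranges` and `decompress_ranges`, the use of `zlib`, and the 4096-byte range size are all assumptions made for the example.

```python
import zlib
from typing import List, Tuple

def compress_fixed_ranges(obj: bytes, range_size: int) -> List[Tuple[int, bytes]]:
    """Compress an already-assembled object in fixed-size offset ranges.

    Hypothetical helper: range_size stands in for a size that the file
    system, not the application, has pre-defined.
    """
    if range_size <= 0:
        raise ValueError("range_size must be positive")
    return [
        (offset, zlib.compress(obj[offset:offset + range_size]))
        for offset in range(0, len(obj), range_size)
    ]

def decompress_ranges(ranges: List[Tuple[int, bytes]]) -> bytes:
    """Rebuild the object by decompressing the ranges in offset order."""
    ordered = sorted(ranges, key=lambda r: r[0])
    return b"".join(zlib.decompress(blob) for _, blob in ordered)

# Illustrative shared object (an assumption): 10240 bytes of patterned data.
shared_object = bytes(range(256)) * 40
ranges = compress_fixed_ranges(shared_object, range_size=4096)
rebuilt = decompress_ranges(ranges)
```

Note that in this sketch the range boundaries are determined solely by the fixed size, so they need not line up with the data chunks that the individual processes produced.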
In parallel computing systems, such as those running High Performance Computing (HPC) applications, the inherently large and complex datasets increase the resources required for data storage and transmission. A need therefore exists for parallel techniques for compressing data chunks as they are written to a shared object.