Current techniques for capturing representations of scientific data as it is written to stable storage require scientific applications to use specific application programming interfaces (APIs) such as SQL, NetCDF, HDF5, and SciDB. Each of these APIs stores its data in its own format. Additionally, in a large-scale parallel setting, each of these techniques often requires a large percentage of the data to be shuffled before it is stored so that it can be written efficiently to the underlying storage.
The Parallel Log-structured File System (PLFS) was created so that parallel scientific applications no longer have to align their data in order to make efficient use of the underlying storage system; with PLFS, these applications need not be concerned with alignment or locking issues. However, this insulation from the particulars of the underlying storage system comes at the cost of storing substantial mapping metadata along with the data.
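The trade-off described above can be illustrated with a minimal sketch of log-structured writing. It is not PLFS code: the names `IndexEntry`, `WriterLog`, and `read_logical` are hypothetical, and the sketch assumes that each writer appends to its own in-memory log and that overlapping writes are resolved simply by iteration order. It shows only why one mapping record per write must be stored alongside the data.

```python
from dataclasses import dataclass, field


@dataclass
class IndexEntry:
    """One mapping record; one is stored for every write (hypothetical layout)."""
    logical_offset: int  # where the application believes the bytes live
    length: int          # number of bytes written
    log_offset: int      # where the bytes actually sit in the writer's log


@dataclass
class WriterLog:
    """Append-only data log plus its index, owned by a single writer."""
    data: bytearray = field(default_factory=bytearray)
    index: list = field(default_factory=list)

    def write(self, logical_offset: int, payload: bytes) -> None:
        # Appending needs no alignment and no locking against other writers,
        # but every call adds a mapping record to the index.
        self.index.append(IndexEntry(logical_offset, len(payload), len(self.data)))
        self.data.extend(payload)


def read_logical(logs, offset: int, length: int) -> bytes:
    """Rebuild a logical byte range by consulting every writer's index."""
    out = bytearray(length)  # bytes never written read back as zeros
    for log in logs:
        for entry in log.index:
            start = max(offset, entry.logical_offset)
            end = min(offset + length, entry.logical_offset + entry.length)
            if start < end:  # this record overlaps the requested range
                src = entry.log_offset + (start - entry.logical_offset)
                out[start - offset:end - offset] = log.data[src:src + (end - start)]
    return bytes(out)


# Two writers interleave 4-byte records into one logical file.
rank0, rank1 = WriterLog(), WriterLog()
rank0.write(0, b"AAAA")
rank1.write(4, b"BBBB")
rank0.write(8, b"CCCC")
rank1.write(12, b"DDDD")
whole_file = read_logical([rank0, rank1], 0, 16)
```

In this sketch the sixteen data bytes are accompanied by four mapping records, and a read must consult every writer's index. That is the kind of metadata cost referred to above, and it grows with the number of writes rather than with the volume of data.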
Therefore, there exists ample opportunity for improvement in technologies related to data storage in a collective parallel processing environment.