Parallel storage systems are widely used in many computing environments. They provide a high degree of concurrency, allowing many distributed processes within a parallel application to access a shared file namespace simultaneously.
Parallel computing techniques are used in many industries and applications for implementing computationally intensive models or simulations. For example, the Department of Energy uses a large number of distributed compute nodes tightly coupled into a supercomputer to model physics experiments. In the oil and gas industry, parallel computing techniques are often used for computing geological models that help predict the location of natural resources. Generally, each parallel process generates a portion, referred to as a data chunk, of a shared data object.
Data migration is a common technique for transferring data between storage types, formats, and/or computer systems. It is usually performed programmatically so that the migration is automated. Data migration occurs for a variety of reasons, such as equipment replacement or the need for cost-effective long-term storage of data. It is often desired, however, to migrate data to a system that cannot accommodate the migration due to, for example, performance and/or capacity constraints of the desired archival storage system. In parallel computing systems, such as those running High Performance Computing (HPC) applications, the inherently large and complex datasets increase the resources required for data storage and transmission. A need therefore exists for improved techniques for migrating data to an archival shared system.