Parallel storage systems are widely used in many computing environments. They provide a high degree of concurrency, allowing many distributed processes within a parallel application to access a shared file namespace simultaneously.
Parallel computing techniques are used in many industries and applications to implement computationally intensive models or simulations. For example, the Department of Energy uses a large number of distributed compute nodes, tightly coupled into a supercomputer, to model physics experiments. In the oil and gas industry, parallel computing techniques are often used to compute geological models that help predict the location of natural resources. One particular parallel computing application models the flow of electrons within a cube of virtual space by dividing the cube into smaller sub-cubes and assigning each sub-cube to a corresponding process executing on a compute node.
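The domain decomposition in the electron-flow example can be sketched as follows. This is a minimal illustration, assuming a cube whose side length divides evenly into equal sub-cubes and a one-to-one, row-major mapping from sub-cubes to process ranks; the function name `decompose_cube` and its parameters are hypothetical and not taken from any particular application.

```python
from itertools import product


def decompose_cube(side: int, parts_per_axis: int) -> dict[int, tuple[range, range, range]]:
    """Split a side x side x side cube into parts_per_axis**3 equal sub-cubes.

    Hypothetical helper: returns a mapping from process rank to the (x, y, z)
    index ranges of the sub-cube that rank owns. Ranks are assigned in
    row-major (x, y, z) order, one sub-cube per rank.
    """
    if side % parts_per_axis != 0:
        raise ValueError("side must be divisible by parts_per_axis")
    step = side // parts_per_axis  # edge length of each sub-cube
    assignment = {}
    for rank, (i, j, k) in enumerate(product(range(parts_per_axis), repeat=3)):
        assignment[rank] = (
            range(i * step, (i + 1) * step),
            range(j * step, (j + 1) * step),
            range(k * step, (k + 1) * step),
        )
    return assignment


# An 8 x 8 x 8 cube split into 2 sub-cubes per axis yields 8 processes,
# each owning a 4 x 4 x 4 region.
layout = decompose_cube(8, 2)
```

In a real code each rank would typically compute only its own entry (for example from an MPI rank), but building the full map makes it easy to check that every cell is owned by exactly one process.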
Storage tiering techniques are increasingly used in parallel computing environments to store vast amounts of information more efficiently. For example, the Symmetrix® system from EMC Corporation is an enterprise storage array that optionally includes Fully Automated Storage Tiering (FAST). Storage tiering techniques typically combine Non-Volatile Random Access Memory (NVRAM), such as flash memory, with more traditional hard disk drives (HDDs). Flash memory is used to satisfy the bandwidth requirements of a given system, while the hard disk drives are used to satisfy its capacity requirements.
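The division of labor between the tiers can be illustrated with simple sizing arithmetic: the flash tier is sized by the bandwidth target and the HDD tier by the capacity target. This is a deliberately simplified sketch; the function name and all numbers are hypothetical, and it ignores the capacity contributed by flash and the bandwidth contributed by HDDs.

```python
import math


def size_tiers(required_bw_gbps: float, required_capacity_tb: float,
               flash_bw_gbps: float, hdd_capacity_tb: float) -> tuple[int, int]:
    """Hypothetical sizing rule: flash devices cover bandwidth, HDDs cover capacity."""
    flash_devices = math.ceil(required_bw_gbps / flash_bw_gbps)      # bandwidth-bound tier
    hdd_devices = math.ceil(required_capacity_tb / hdd_capacity_tb)  # capacity-bound tier
    return flash_devices, hdd_devices


# Illustrative numbers only: 100 GB/s and 1000 TB targets with 2 GB/s flash
# devices and 10 TB drives give 50 flash devices and 100 HDDs.
flash, hdd = size_tiers(100, 1000, 2, 10)
```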
Storage tiering systems typically use automated techniques to control the tiering of data, that is, to determine on which tier(s) the data is stored. These automated techniques are typically based on heuristic methods that place data blocks on different storage tiers according to the observed characteristics of the data. While existing tiering techniques demonstrate acceptable performance, in a number of cases an application behaves differently from what the auto-tiering software has observed, so the heuristics place its data on the wrong tier. As a result, a significant percentage of applications experience poor performance due to an inappropriate data placement policy.
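A minimal sketch of one such heuristic is shown below, assuming a generic access-frequency policy: blocks that were accessed most often in the past are placed on a capacity-limited flash tier and the rest on HDD. This is not the algorithm used by FAST or any other product; the function name `place_blocks` and its parameters are hypothetical.

```python
from collections import Counter


def place_blocks(access_log: list[int], flash_capacity_blocks: int) -> dict[int, str]:
    """Assign each observed block to 'flash' or 'hdd' by past access frequency.

    Hypothetical heuristic: the flash_capacity_blocks most frequently accessed
    blocks go to flash; every other observed block goes to HDD.
    """
    counts = Counter(access_log)
    # Most frequently accessed first; ties broken by block id for determinism.
    ranked = sorted(counts, key=lambda block: (-counts[block], block))
    hot = set(ranked[:flash_capacity_blocks])
    return {block: ("flash" if block in hot else "hdd") for block in counts}


history = [1, 1, 1, 2, 2, 3, 4, 4, 4, 4]
placement = place_blocks(history, flash_capacity_blocks=2)
# Blocks 4 and 1 are placed on flash; blocks 2 and 3 are placed on HDD.
```

Because the policy only looks backward, an application whose next phase reads block 3 heavily would still find that block on the HDD tier, which is the kind of mismatch between observed and actual behavior described above.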
A need therefore exists for improved techniques for determining how to store data in a hierarchical storage tiering system. Moreover, there is a need for data protection and snapshot techniques based on a programmatic approach that can be controlled, for example, by an information technology (IT) administrator.