Host processor systems may store and retrieve data using storage devices containing a plurality of host interface units (host adapters), disk drives, and disk interface units (disk adapters). Such storage devices are provided, for example, by EMC Corporation of Hopkinton, Mass. and disclosed in U.S. Pat. No. 5,206,939 to Yanai et al., U.S. Pat. No. 5,778,394 to Galtzur et al., U.S. Pat. No. 5,845,147 to Vishlitzky et al., and U.S. Pat. No. 5,857,208 to Ofek, which are incorporated herein by reference. The host systems access the storage device through a plurality of channels provided therewith. Host systems provide data and access control information through the channels of the storage device and the storage device provides data to the host systems also through the channels. The host systems do not address the disk drives of the storage device directly, but rather, access what appears to the host systems as a plurality of logical volumes. Different sections of the logical volumes may or may not correspond to the actual disk drives.
Information Lifecycle Management (ILM) concerns the management of data throughout the data's lifecycle. The value of data may change over time and, accordingly, the needs for the storage and accessibility of the data may change during the lifecycle of the data. For example, data that is initially accessed often may, over time, become less valuable, and the need to access that data may become less frequent. It may not be efficient for such infrequently accessed data to be stored on a fast and expensive storage device. On the other hand, older data may suddenly become more valuable and, where once accessed infrequently, become more frequently accessed. In this case, it may not be efficient for such data to be stored on a slower storage system when data access frequency increases.
Logical devices containing the data that has been stored across multiple disk drives of a storage system may be accessed at different frequencies. Data dependency mining techniques, which are based on determining access correlations and patterns among blocks of stored data of the storage system, are known for improving the effectiveness of storage caching, prefetching, data layout and disk scheduling. For example, data prefetching relates to obtaining data from a device prior to receiving an actual request for the data, such as a request from a host. Data prefetching techniques try to identify or recognize a pattern of I/O requests in order to predict what data will be requested next and to prefetch data based on such prediction. For a detailed discussion of information access and management in a storage system using prefetch techniques, reference is made to U.S. Pat. No. 7,822,731 to Yu et al. entitled “Techniques for Management of Information Regarding a Sequential Stream,” which is incorporated herein by reference. Another known technique is the C-Miner algorithm for mining block correlations in a storage system (see, e.g., Zhenmin Li et al., “C-Miner: Mining Block Correlations in Storage Systems,” In Proceedings of the 3rd USENIX Symposium on File and Storage Technologies (FAST), 2004, 14 pp., which is incorporated herein by reference).
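By way of illustration only, the general idea of recognizing a pattern of I/O requests and prefetching based on a prediction may be sketched as follows. The sketch is a deliberately simple sequential-run detector and is not the technique of any of the references cited above. The class name `SequentialPrefetcher`, the parameters `trigger` and `depth`, and the use of integer block numbers are all assumptions introduced for this example.

```python
from typing import List, Optional


class SequentialPrefetcher:
    """Hypothetical sketch: detect a run of consecutive block reads
    and suggest which blocks to prefetch next."""

    def __init__(self, trigger: int = 3, depth: int = 2) -> None:
        self.trigger = trigger  # consecutive reads required before prefetching (assumed)
        self.depth = depth      # number of blocks to prefetch once triggered (assumed)
        self.last_block: Optional[int] = None
        self.run_length = 0

    def on_read(self, block: int) -> List[int]:
        """Record a host read; return block numbers to prefetch (possibly empty)."""
        if self.last_block is not None and block == self.last_block + 1:
            self.run_length += 1   # the read extends the current sequential run
        else:
            self.run_length = 1    # pattern broken: start a new run at this block
        self.last_block = block

        if self.run_length >= self.trigger:
            # Predict that the host will continue reading sequentially.
            return [block + i for i in range(1, self.depth + 1)]
        return []


if __name__ == "__main__":
    prefetcher = SequentialPrefetcher(trigger=3, depth=2)
    for requested in (10, 11, 12, 50):
        print(requested, prefetcher.on_read(requested))
```

In this sketch, a prediction is made only after `trigger` consecutive reads, which trades a short warm-up delay for fewer wasted prefetches on random access. Dependency mining techniques of the kind discussed above seek correlations that are more general than a purely sequential run.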
Existing techniques for performing data prefetching and/or other data layout or caching operations may include inefficiencies and/or may involve complex data mining algorithms. Accordingly, it would be desirable to provide an efficient and fast dependency mining technique for a storage system.