Host processor systems may store and retrieve data using storage devices, or storage arrays, containing a plurality of host interface units (host adapters), disk drives, and disk interface units (disk adapters). Such storage devices are provided, for example, by EMC Corporation of Hopkinton, Mass., such as in connection with one or more of EMC's Symmetrix products. The host systems access the storage device through a plurality of channels provided therewith. Host systems provide data and access control information through the channels of the storage device and the storage device provides data to the host systems also through the channels. The host systems do not address the disk drives of the storage device directly, but rather, access what appears to the host systems as a plurality of logical volumes. Different sections of the logical volumes may or may not correspond to the actual disk drives.
Information Lifecycle Management (ILM) concerns the management of data throughout the data's lifecycle. The value of data may change over time and, accordingly, the needs for the storage and accessibility of the data may change during the lifecycle of the data. For example, data that is initially accessed often may, over time, become less valuable, and the need to access that data may become more infrequent. It may not be efficient for such infrequently accessed data to be stored on a fast and expensive storage device. On the other hand, older data may suddenly become more valuable and, where once accessed infrequently, become more frequently accessed. In this case, it may not be efficient for such data to be stored on a slower storage device when data access frequency increases.
In some instances, it may be desirable to copy data from one storage device to another. For example, if a host writes data to a first storage device, it may be desirable to copy that data to a second storage device provided in a different location so that if a disaster occurs that renders the first storage device inoperable, the host (or another host) may resume operation using the data of the second storage device. Such a capability is provided, for example, by a Remote Data Facility (RDF) product provided by EMC Corporation of Hopkinton, Mass., e.g., Symmetrix Remote Data Facility (SRDF). With RDF, a first storage device, denoted the “primary storage device” (or “R1”) is coupled to the host. One or more other storage devices, called “secondary storage devices” (or “R2”) receive copies of the data that is written to the primary storage device by the host. The host interacts directly with the primary storage device, but any data changes made to the primary storage device are automatically provided to the one or more secondary storage devices using RDF. The primary and secondary storage devices may be connected by a data link, such as an ESCON link, a Fibre Channel link, and/or a Gigabit Ethernet link. The RDF functionality may be facilitated with an RDF adapter (RA) provided at each of the storage devices.
Data transfer among storage devices, including transfers for data replication or mirroring functions, may involve various data synchronization operation modes and techniques to provide reliable protection copies of data among a source or local site and a destination or remote site. In synchronous transfers, data may be transmitted to a remote site and an acknowledgement of a successful write is transmitted synchronously with the completion thereof. In asynchronous transfers, a data transfer process may be initiated and a data write may be acknowledged before the data is actually transferred to directors (i.e. controllers and/or access nodes) at the remote site. Asynchronous transfers may occur in connection with sites located geographically distant from each other. Asynchronous distances may be distances in which asynchronous transfers are used because synchronous transfers would take more time than is preferable or desired.
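The difference between the two acknowledgement modes may be sketched as follows. This is a minimal illustration only and is not the RDF implementation; the class `ReplicatedVolume`, its methods, and the in-memory dictionaries standing in for the local and remote sites are all hypothetical names introduced for this example.

```python
from collections import deque


class ReplicatedVolume:
    """Toy model of a local volume mirrored to a remote site (hypothetical)."""

    def __init__(self, synchronous):
        self.synchronous = synchronous
        self.local = {}          # block address -> data at the local site
        self.remote = {}         # block address -> data at the remote site
        self.pending = deque()   # writes acknowledged but not yet transferred

    def write(self, address, data):
        """Apply a write locally and return once it is acknowledged to the host."""
        self.local[address] = data
        if self.synchronous:
            # Synchronous mode: the remote copy is updated before the ack.
            self.remote[address] = data
        else:
            # Asynchronous mode: queue the write in order and ack immediately.
            self.pending.append((address, data))
        return "ack"

    def drain(self):
        """Transfer queued writes to the remote site in their original order."""
        while self.pending:
            address, data = self.pending.popleft()
            self.remote[address] = data


# Synchronous: the remote site already holds the data when the ack returns.
sync_vol = ReplicatedVolume(synchronous=True)
sync_vol.write(0, b"A")
sync_remote_after_ack = dict(sync_vol.remote)

# Asynchronous: the ack returns first; the remote site catches up on drain().
async_vol = ReplicatedVolume(synchronous=False)
async_vol.write(0, b"A")
async_remote_after_ack = dict(async_vol.remote)
async_vol.drain()
async_remote_after_drain = dict(async_vol.remote)
```

In this sketch the asynchronous remote copy lags the local copy until the queue is drained, and the first-in, first-out queue is one simple way to keep the transferred writes in the order in which the host issued them.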
For both synchronous and asynchronous transfers, it may be desirable to maintain a proper ordering of writes such that any errors or failures that occur during data transfer may be properly identified and addressed such that, for example, incomplete data writes may be reversed or rolled back to a consistent data state as necessary. Reference is made, for example, to U.S. Pat. No. 7,475,207 to Bromling et al. entitled “Maintaining Write Order Fidelity on a Multi-Writer System,” which is incorporated herein by reference and which discusses features for maintaining write order fidelity (WOF) in an active/active system in which a plurality of directors (i.e. controllers and/or access nodes) at geographically separate sites can concurrently read and/or write data in a distributed data system.
For further discussions of data ordering and other techniques used for synchronous and asynchronous data replication processing in various types of systems, including types of RDF systems and products produced by EMC Corporation of Hopkinton, Mass., reference is made to, for example, U.S. Pat. No. 8,335,899 to Meiri et al., entitled “Active/Active Remote Synchronous Mirroring,” U.S. Pat. No. 8,185,708 to LeCrone et al., entitled “Host Implementation of Triangular Asynchronous Replication,” U.S. Pat. No. 7,779,291 to Yoder et al., entitled “Four Site Triangular Asynchronous Replication,” U.S. Pat. No. 7,613,890 to Meiri, entitled “Consistent Replication Across Multiple Storage Devices,” and U.S. Pat. No. 7,054,883 to Meiri et al., entitled “Virtual Ordered Writes for Multiple Storage Devices,” which are all incorporated herein by reference.
In connection with data replication using RDF systems, one issue that may occur is discrepancies in data storage management between R1 and R2 devices when ILM techniques are used. For example, data that is accessed frequently on an R1 device may be stored and managed at a location on the R1 device that is suitable for the need for frequent access of that data. However, when replicated to the R2 device, that same data, existing as a data backup copy, may not be accessed as frequently. Accordingly, the data on the R2 device, although being a copy of the R1 data, may be stored and managed differently on the R2 device than on the R1 device. In situations of failover to the R2 device, or other uses for the R2 device, the R2 device may not immediately be able to support the workload as the new primary device because the data copy stored thereon may not be stored as efficiently or effectively as on the R1 device. Transferring all information between the R1 and R2 devices during normal operation to maintain the same ILM storage management on each of the devices may not be a practical solution due to the amount of information transfer that this would require, among other reasons.
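The discrepancy may be illustrated with a minimal sketch in which each device independently places data on a storage tier according to the accesses that the device itself observes. The function `choose_tier`, the tier labels used here (EFD, FC and SATA, fastest to slowest), the thresholds, and the access counts are all hypothetical values chosen for illustration and do not describe any particular product.

```python
def choose_tier(accesses_per_day):
    """Pick a storage tier from locally observed accesses (hypothetical thresholds)."""
    if accesses_per_day >= 1000:
        return "EFD"    # fastest, most expensive tier
    if accesses_per_day >= 100:
        return "FC"     # middle tier
    return "SATA"       # slowest, least expensive tier


# The same data is busy on R1 (host workload) but nearly idle on R2 (backup copy).
r1_accesses_per_day = 5000   # assumed value
r2_accesses_per_day = 3      # assumed value

r1_tier = choose_tier(r1_accesses_per_day)
r2_tier = choose_tier(r2_accesses_per_day)
```

Under these assumed values, the R1 device places the data on its fastest tier while the R2 device places its copy on the slowest tier, so that after a failover the R2 device would initially serve the host workload from a tier that does not match that workload.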
It is also noted that, in a storage device, front end (FE) accesses may be distinguished from back end (BE) accesses of the storage device. A front end access is an access operation as seen by a requesting host/application requesting access to data of the storage device, whereas a back end access is the actual access of data on the actual disk drive storing the data. Storage tiering operations for purposes of ILM management, in which data is stored among different storage tiers of a storage device (e.g., a Serial ATA (SATA) tier, a Fibre Channel (FC) tier and/or an Enterprise Flash Drive (EFD) tier) based on access levels, may be based principally on the actual disk drive accesses at the storage device back end rather than accesses at the front end as seen by the requesting host/application. Data initially accessed at the back end, i.e. from the disk drives, may then be stored in a cache that has a fast access speed, in connection with servicing a host's request, such as a read request, at the front end of the storage device. The cache may not be emptied immediately, such that recently accessed data may stay in the cache and serve future front end access (e.g. read) operations by the host without causing subsequent access operations on the actual disk drives at the back end of the storage device.
Use of the cache in this manner affects determinations of the number of input/output (I/O) operations, since, for example, data that is accessed frequently from the cache for front end read requests might appear, as seen by the back end of the system, as if it is not accessed frequently, e.g., the data was accessed once at the beginning of the day from the disk drives and thereafter accessed by the host from the cache. Further, it is noted that monitoring complete access statistics of front end access operations may, in many cases, be impractical, since many such front end access operations may occur, and metric or statistic collection processes may thereby negatively impact access speed of the front end access operations.
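The effect of the cache on access counts may be sketched as follows. This is a simplified model under stated assumptions: the class `CachedArray` and its counters are hypothetical, and the cache here never evicts an entry, which a real cache would do.

```python
class CachedArray:
    """Toy storage device with a read cache and separate FE/BE counters (hypothetical)."""

    def __init__(self, blocks):
        self.disk = dict(blocks)   # back end: block address -> data on disk
        self.cache = {}            # cache: never evicted in this sketch
        self.front_end_reads = 0   # reads as seen by the host
        self.back_end_reads = 0    # reads that actually reach the disk drives

    def read(self, address):
        """Serve a host read, going to disk only on a cache miss."""
        self.front_end_reads += 1
        if address not in self.cache:
            self.back_end_reads += 1
            self.cache[address] = self.disk[address]
        return self.cache[address]


array = CachedArray({7: b"hot block"})
for _ in range(1000):
    array.read(7)
```

After the loop, the front end counter records 1000 reads while the back end counter records a single read, so a tiering decision based only on back end accesses would treat this heavily read block as rarely accessed.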
Accordingly, it would be desirable to provide a system that allows for the efficient management of data among multiple storage devices, particularly involving considerations of how the data is accessed in connection with its storage on the multiple storage devices.