Replication is a term of art referring to the transparent copying of data from one storage system to another storage system, which is usually located in another geographic location. In general, “transparency” refers to the fact that the computer systems relying upon the replicated data have no knowledge or awareness that the replication of data is taking place. This relieves the software and hardware of the computer systems and other devices relying upon the replicated data from the burden of undertaking the steps involved with replicating the data. The distance between these storage systems can span from micrometers to kilometers. Typically, replication is carried out for disaster recovery purposes.
In a typical replicated system, data are initially copied without regard to temporal sequencing. (Temporal sequencing refers to time sequencing. The initial copy is typically made as a whole without reference to time; therefore, the initial copy is not time sequenced.) Once the initial copying of data has completed, the storage system typically switches modes and begins to temporally sequence any subsequent changes made to the storage system. Such transactions provided in temporal sequence are typically said to be made in “journal mode.”
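By way of illustration only, the two phases described above can be sketched in a few lines of Python. The class `ReplicatedVolume`, its methods, and the in-memory "target" are hypothetical names chosen for this example and do not describe any particular storage system. The sketch shows a bulk initial copy that records no ordering, followed by journaled writes that carry a sequence number and are applied to the target in that order.

```python
from dataclasses import dataclass, field


@dataclass
class ReplicatedVolume:
    """Toy source volume replicating to an in-memory target (hypothetical)."""
    source: bytearray
    target: bytearray = None
    journal: list = field(default_factory=list)  # entries: (sequence, offset, data)
    next_seq: int = 0

    def initial_copy(self):
        # Bulk copy of the whole image; no ordering information is recorded.
        self.target = bytearray(self.source)

    def write(self, offset, data):
        # After the initial copy, each change is journaled with a sequence number.
        self.source[offset:offset + len(data)] = data
        self.journal.append((self.next_seq, offset, bytes(data)))
        self.next_seq += 1

    def replay(self):
        # Apply journal entries to the target strictly in sequence order.
        for _, offset, data in sorted(self.journal, key=lambda entry: entry[0]):
            self.target[offset:offset + len(data)] = data
        self.journal.clear()


vol = ReplicatedVolume(source=bytearray(8))
vol.initial_copy()
vol.write(0, b"AB")
vol.write(1, b"CD")  # overlaps the first write, so replay order matters
vol.replay()
```

Because the two writes overlap, applying them out of order would leave the target with a different byte at offset 1 than the source; replaying in sequence order keeps the two images identical.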
Many systems relying upon storage systems have volume management operations that can tolerate failure so long as the data on any specific storage system are temporally sequenced. Thus, journaling is an important element of modern replicated storage systems.
It is important to remember that data cannot be physically added to or removed from a storage system. Instead, the bit patterns residing on the storage device are changed so that its bit pattern matches the pattern desired by the computer systems and other devices relying upon the data. For example, a newly initialized storage system may contain all zeros representing a blank area on the storage system. The computer or other device relying on this data may change these zeros to a different pattern representing the data to be stored for persistent use. In turn, the source storage system will take the steps to copy this pattern to the replicated target.
The concept of data storage virtualization increases the complexity of the operations described above. The storage system typically presents its available storage in a conventional form familiar to the computer or device. However, the storage system is under no obligation to interact with the data in the same manner as a conventional device. For example, a typical device may be presented to the computer as a LUN (Logical Unit Number). In conventional terms, a LUN may consist of an entire disk mechanism (i.e., a disk drive). However, the storage system is free to spread this data across countless disks or other storage devices or media. The computer relying on this data would “understand” that this virtual LUN consists of only a single disk mechanism when, in fact, the single LUN is actually a collection of multiple disk devices. This is only one of many examples of virtualization.
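The following Python sketch is offered as one possible illustration of such a virtual LUN. It assumes a simple striping scheme, which is only one of many layouts a storage system might use, and the class `VirtualLUN` and its parameters are hypothetical. The computer sees a single linear byte range, while an internal mapping function decides which backing device actually holds each byte.

```python
class VirtualLUN:
    """Presents several backing devices as one linear byte array (hypothetical)."""

    def __init__(self, device_count, device_size, stripe_size):
        # device_size is assumed to be a multiple of stripe_size.
        self.devices = [bytearray(device_size) for _ in range(device_count)]
        self.stripe_size = stripe_size
        self.size = device_count * device_size

    def _locate(self, address):
        # Map a linear address to (device index, offset on that device).
        stripe, within = divmod(address, self.stripe_size)
        row, device = divmod(stripe, len(self.devices))
        return device, row * self.stripe_size + within

    def write(self, address, data):
        for i, byte in enumerate(data):
            device, offset = self._locate(address + i)
            self.devices[device][offset] = byte

    def read(self, address, length):
        out = bytearray()
        for i in range(length):
            device, offset = self._locate(address + i)
            out.append(self.devices[device][offset])
        return bytes(out)


lun = VirtualLUN(device_count=3, device_size=8, stripe_size=4)
lun.write(2, b"HELLOWORLD")
```

In this example, `lun.read(2, 10)` returns the ten bytes exactly as written, although they are physically split across all three backing devices. A caller using only `read` and `write` has no way to observe that split.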
Replicated storage systems can be manufactured by a single vendor. Alternatively, storage systems from multiple vendors can be combined in designing a replicated storage system. The concept of virtualization permits the use of different storage systems manufactured by one or more vendors. The primary constraint in using dissimilar storage systems for replication is that the total amount of replicated storage space is limited to the capacity of the smaller of the source and target storage systems.
The software, hardware, and firmware of the replicated storage system generally function to ensure that the data contained in the source system matches the data contained in the target system. The storage systems typically use their own storage to create the structures that support the underlying data used or accessed by computers or devices. A storage system uses “structural data” to provide a map of where data is stored in its system. This structural data may include, for example, information on logical volume managers or file systems. This structural data is referred to herein as “super-meta-data.” Super-meta-data used by a given storage system is typically not visible (i.e., accessible) to the computers and devices accessing the storage system, as it is particular to the local storage system. Super-meta-data would generally not be replicated between storage systems.
To be clear, one would find no difference between a virtual LUN presented by a storage system, examined as a linear array of bytes, and a replicated copy of the same image maintained on a single conventional LUN (i.e., a disk drive). The super-meta-data maintained on the storage system would be used to produce the data in proper sequence from the various underlying storage devices used by the storage system. (The conventional LUN used in this example does, in fact, also rely upon super-meta-data, for example as a means to spare bad blocks. However, the storage system super-meta-data may involve multiple disk caches and disk mechanisms, whereas the single conventional LUN would involve only a single disk mechanism.)
Most modern storage systems have ways to ensure that the data on the source matches the data on the target. However, even replicated storage systems with identical hardware and firmware configurations will not share the same super-meta-data. As a result, replicated storage systems are not exact copies of one another. Thus, a direct comparison of the storage systems is not meaningful for validating the integrity of the replicated data. In addition, the performance of the storage system would suffer if one were to attempt to compare the entire replicated LUN on a byte-by-byte basis.
To avoid these difficulties, many replicated storage systems employ an “end to end” checksum as a means to ensure data integrity. However, this test only compares a small portion of the data and does not examine the data in its entirety. End-to-end checksumming may ensure that the data traveling across the network remains intact; however, it does not ensure that the data will land in the proper place within the storage system. Such algorithms tend to be very specific to a class of storage systems. In addition, the end-user of the data has to rely on the accuracy of the algorithms and the end-user has to forgo any verification that the data has landed in the proper place within the storage system.
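The limitation described above can be illustrated with the following minimal Python sketch. The message layout, the function names, and the `placement_bug` parameter are assumptions made solely for this example, and the final comparison is not the approach of the present invention. The sketch shows a payload that passes its transmission checksum and is nevertheless written to the wrong location on the target.

```python
import zlib


def send_block(offset, payload):
    # Sender attaches a CRC32 of the payload (hypothetical wire format).
    return {"offset": offset, "payload": payload, "crc": zlib.crc32(payload)}


def receive_block(target, message, placement_bug=0):
    # Receiver verifies the payload checksum, then writes it to storage.
    if zlib.crc32(message["payload"]) != message["crc"]:
        raise ValueError("payload corrupted in transit")
    # placement_bug simulates a fault that lands data at the wrong offset.
    offset = message["offset"] + placement_bug
    target[offset:offset + len(message["payload"])] = message["payload"]
    return True


source = bytearray(16)
target = bytearray(16)
source[4:8] = b"DATA"

# The checksum check passes even though the block lands 4 bytes too far along.
checksum_ok = receive_block(target, send_block(4, b"DATA"), placement_bug=4)

# Comparing the two images as linear byte arrays exposes the misplacement.
volumes_match = bytes(source) == bytes(target)
```

Here `checksum_ok` is true while `volumes_match` is false: the checksum vouches only for the payload in transit and says nothing about where that payload was stored.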
Therefore, applicant has identified a need for a data validation approach between replicated systems that verifies the integrity of the entire data volume and not just a transmission. The present invention fulfills this need among others.