The continued increase in data storage has been accompanied by an increasing need to create more than one accurate copy of particular data. Such copies are created by data mirroring, in which changes to a local copy of the data are mirrored on a remote copy.
A conventional data storage device, which is typically connected to a host processing system, contains an array of disk drives for data storage, a controller for controlling access to the disk array, and a cache memory. The cache memory stores recently accessed data so that data likely to be accessed in the near term can be retrieved quickly without accessing the disk on every occasion, thus reducing access latency and increasing throughput for applications running on the host processing system. When a data access request is received, the storage device first attempts to satisfy the request using the cache before using the disk array. For example, when a READ operation references data that is already in the cache, the data is returned directly from the cache. For WRITE operations, the data is written into the cache, replacing any previous version of the same data within the cache. Since a particular file or block of data may be located on the disk or in the cache, the storage device typically includes metadata (MD) that registers all data blocks currently in the cache and, therefore, indicates whether a data block is on the disk or stored in the cache. If the data block is in the cache, the MD indicates where the data block is stored in the cache. The MD also indicates the current state of the data block (i.e., whether or not it has been “flushed” to disk).
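The cache-first access path and the role of the MD described above can be illustrated with a minimal sketch. The names `StorageDevice`, `MDEntry`, `slot`, and `flushed` are hypothetical and chosen only for illustration; the sketch assumes an unbounded cache with no eviction and models the disk array as a simple dictionary.

```python
from dataclasses import dataclass


@dataclass
class MDEntry:
    """Hypothetical MD record for one cached block."""
    slot: int       # where the block is stored in the cache
    flushed: bool   # True once the cached version has been written to disk


class StorageDevice:
    """Illustrative model only: no eviction, disk modeled as a dict."""

    def __init__(self):
        self.disk = {}    # block id -> data held on the disk array
        self.cache = []   # cache slots holding block data
        self.md = {}      # block id -> MDEntry for every block in the cache

    def write(self, block_id, data):
        entry = self.md.get(block_id)
        if entry is None:
            # Block is not cached yet: store it and register it in the MD.
            self.cache.append(data)
            self.md[block_id] = MDEntry(slot=len(self.cache) - 1, flushed=False)
        else:
            # Replace the previous cached version of the same block.
            self.cache[entry.slot] = data
            entry.flushed = False

    def read(self, block_id):
        entry = self.md.get(block_id)
        if entry is not None:
            # Cache hit: the MD says where the block is in the cache.
            return self.cache[entry.slot]
        # Cache miss: fall back to the disk array, then cache the block.
        data = self.disk[block_id]
        self.cache.append(data)
        self.md[block_id] = MDEntry(slot=len(self.cache) - 1, flushed=True)
        return data

    def flush(self):
        # Write every unflushed cached block to disk and update its MD state.
        for block_id, entry in self.md.items():
            if not entry.flushed:
                self.disk[block_id] = self.cache[entry.slot]
                entry.flushed = True
```

In this sketch a WRITE touches only the cache and the MD; the block reaches the disk only when `flush()` is called, at which point its MD state changes to flushed.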
MD can take many forms and typically consists of complex data structures that describe the data stored in the cache. Therefore, any update to the MD may involve a series of operations that should be performed atomically to maintain the integrity of the MD structure in the event of a failure in the local or remote cache. That is, if one of the caches fails during a synchronous update of the MD, the integrity of the MD cannot be guaranteed. The data itself, on the other hand, can be synchronously mirrored as it arrives from the hosts, because it is essentially treated as a stateless stream of bytes (blocks) until its presence is properly registered in the MD.
FIG. 1 illustrates the multiple operations required to update a link-list data structure that may be used to implement the storage of MD as known in the art. As shown in FIG. 1, link-list 100 has data 101–103 and data pointers 105 and 110. Adding new data 104 to the link-list requires that pointer 105 be redirected to new data 104 and that pointer 106 be implemented to point from new data 104 to data 102. In the case of a double-pointer link-list, four independent operations may be required for an update. Complex MD structures (e.g., MD trees) may require even more operations to effect an update. A failure that occurs before these operations are complete will lead to an inconsistent version of the data on the remote data storage system. That is, once the series of update operations has begun, the data is not consistent until the update is complete. The more operations an MD update involves, the longer the window during which a data storage system failure could result in inconsistent data. Moreover, any concurrent attempt to update the same data structure may cause the operations of the update attempts to become interleaved, thereby exacerbating the problem of inconsistent data.
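The inconsistency window can be made concrete with a small sketch of a single-pointer link-list insertion. The `Node` class, the `insert_after` function, and the `fail_between_steps` flag are hypothetical and exist only to simulate a failure; the order of the two pointer operations is likewise an assumption, since the description above does not fix it.

```python
class Node:
    """One element of a single-pointer link-list."""

    def __init__(self, value, next=None):
        self.value = value
        self.next = next


def traverse(head):
    """Return the values reachable from head, in order."""
    values = []
    node = head
    while node is not None:
        values.append(node.value)
        node = node.next
    return values


def insert_after(prev, new_node, fail_between_steps=False):
    """Insert new_node after prev using two separate pointer operations."""
    old_next = prev.next
    prev.next = new_node          # operation 1: redirect the existing pointer
    if fail_between_steps:
        # Simulated failure: the second pointer is never written.
        raise RuntimeError("simulated failure mid-update")
    new_node.next = old_next      # operation 2: point the new data at the old successor


# Build the list 101 -> 102 -> 103, mirroring data 101-103.
n103 = Node(103)
n102 = Node(102, n103)
n101 = Node(101, n102)

# Attempt to add data 104 after data 101, failing between the two operations.
try:
    insert_after(n101, Node(104), fail_between_steps=True)
except RuntimeError:
    pass

after_failure = traverse(n101)
print(after_failure)  # [101, 104]
```

After the simulated failure, data 102 and 103 are no longer reachable from the head of the list, even though neither was modified. A completed insertion would instead yield 101, 104, 102, 103.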