In related U.S. patent application Ser. No. 10/924,652 and U.S. patent application Ser. No. 10/668,833, a time-dependent data storage and recovery technique is disclosed. Embodiments of such a technique provide a solution for continuous data protection (CDP) wherein write commands directed to a storage system are intercepted by a storage management system having a current store and a time store. The current store may maintain or have access to a current (or mirror) copy of the storage system's digital content. The time store may record information associated with each intercepted write command, such as new data in the write command's payload or old data to be overwritten in the current store in response to the write command. Recordation of the new or old data in response to a write command may be referred to as a copy-on-write (COW) operation, and the new or old data recorded may be referred to as COW data. The time store may also record other information (i.e., metadata) associated with an intercepted write command and/or the corresponding COW operation, such as a timestamp, the original location in the current store where the old data are overwritten, and the destination location in the time store to which the COW data are copied. Each COW operation typically backs up one or more blocks of COW data, thereby creating one set of COW data and corresponding metadata. Over a period of time, multiple sets of COW data and corresponding metadata (including timestamps) may accumulate as a collection of historical records of what has been written or overwritten in the current store or the storage system. The content of the time store may be indexed based on the metadata to facilitate efficient access to the COW data.
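The copy-on-write flow described above can be illustrated with a minimal Python sketch. It assumes the old-data variant of COW (the data about to be overwritten is saved), models the current store as a dictionary of block addresses, and treats never-written blocks as zero-filled. The names `StorageManager`, `CowRecord`, `intercept_write`, and `BLOCK_SIZE` are hypothetical and are not taken from the referenced applications.

```python
from dataclasses import dataclass, field
from typing import Dict, List

BLOCK_SIZE = 4  # hypothetical, tiny block size chosen only for illustration


@dataclass
class CowRecord:
    """Metadata for one COW operation (hypothetical layout)."""
    timestamp: int
    original_location: int  # block address in the current store
    destination: int        # position of the COW data in the time store


@dataclass
class StorageManager:
    current_store: Dict[int, bytes] = field(default_factory=dict)
    time_store_data: List[bytes] = field(default_factory=list)
    metadata: List[CowRecord] = field(default_factory=list)
    # Index built from the metadata: block address -> positions in `metadata`.
    index: Dict[int, List[int]] = field(default_factory=dict)

    def intercept_write(self, timestamp: int, block: int, new_data: bytes) -> None:
        # Copy-on-write: save the old data before it is overwritten.
        # Assumption: a block that was never written reads as zeros.
        old_data = self.current_store.get(block, bytes(BLOCK_SIZE))
        destination = len(self.time_store_data)
        self.time_store_data.append(old_data)
        # Record the metadata and index it by the original block address.
        self.metadata.append(CowRecord(timestamp, block, destination))
        self.index.setdefault(block, []).append(len(self.metadata) - 1)
        # Only then apply the intercepted write to the current store.
        self.current_store[block] = new_data
```

After two writes to block 7 at times 1 and 2, the time store holds the zero-filled initial contents and then `b"AAAA"`, and `index[7]` lists both metadata entries. The per-block index is one simple way to realize "indexed based on the metadata"; a real system might index by time as well.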
With a current copy of the storage system's digital content in the current store and the historical records in the time store, the storage management system adds a new dimension, i.e., time, to the storage system. Assuming the storage management system has been operatively coupled to the storage system since a past time, the storage management system may quickly and accurately restore any addressable content in the storage system to any point in time between the past time and a present time.
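Restoring content to an earlier point in time can be sketched as undoing, newest first, every recorded write whose timestamp is later than the target time. The sketch below assumes old-data COW records stored as `(timestamp, block, old_data)` tuples appended in chronological order; the function name `restore_to` and this record layout are hypothetical, and the current store is left unmodified.

```python
from typing import Dict, List, Tuple

# Hypothetical COW record layout: (timestamp, block address, old data).
CowLog = List[Tuple[int, int, bytes]]


def restore_to(current: Dict[int, bytes], cow_log: CowLog, t: int) -> Dict[int, bytes]:
    """Return an image of the store as it existed at time t.

    Assumes cow_log is in chronological (append) order. Each record
    after t holds the data that the corresponding write overwrote, so
    putting that data back, newest record first, rolls the image back.
    """
    image = dict(current)  # copy so the current store stays intact
    for ts, block, old_data in reversed(cow_log):
        if ts <= t:
            break  # every remaining record is at or before t
        image[block] = old_data
    return image
```

For example, if block 0 held `b"A"`, was overwritten with `b"B"` at time 1 and with `b"C"` at time 2, then `restore_to({0: b"C"}, [(1, 0, b"A"), (2, 0, b"B")], 1)` yields `{0: b"B"}`. Walking the log backward means the oldest post-`t` record for a block is applied last, which leaves exactly the value that block held at time `t`.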
Ideally, such a data recovery capability would be maintained over as long a timeline as possible. However, an extended timeline requires a significant amount of storage space to hold the COW data and corresponding metadata for every write command within that timeline. Still more storage space is needed if the storage system experiences a relatively high write rate (i.e., number of write operations per unit time). One temporary solution may be to simply increase the storage capacity of the time store. However, apart from its higher cost, a simple storage increase may not scale well with the rest of the system and tends to create a host of other problems, such as performance degradation caused by having to parse through the additional data. Without infinite storage capacity, most storage systems must accept that only a timeline of finite length (e.g., ten days or two weeks) can be maintained. In conventional data protection systems, it is typical to keep a few days' worth of backup data and to discard completely any backup data more than a few days old. In these systems, data recovery capabilities are limited to the past few days for which backup data are available. Alternatively, backup data more than a few days old may be moved off site on a regular basis. Such a brute-force solution can be costly and disruptive, and it responds slowly to data recovery requests that require off-site data.
In view of the foregoing, it would be desirable to provide a solution for data storage management which overcomes the above-described inadequacies and shortcomings.