1. Field of the Invention
The present invention relates generally to storage systems, and, more particularly, to data protection and recovery in storage systems.
2. Description of Related Art
In storage systems, there is always a threat that hackers, disgruntled employees, or other unforeseen circumstances might destroy data. To guard against such threats, some storage systems keep a log of every important operation using, for example, a mechanism such as syslog or an intrusion detection system (IDS) for the purpose of identifying malicious attacks. The syslog protocol has been used for many years to transmit event notification messages across networks. Under the syslog protocol, a sending device may transmit event messages and alerts across a network to a collecting device, for example at the start or end of a process, or to report the current status of a process. A syslog server is a daemon that is set up on a computer to receive syslog messages from hosts and other syslog-enabled devices. (See, e.g., Lonvick, C., “RFC 3164: The BSD Syslog Protocol”, The Internet Society, August 2001.) However, while logging with a syslog server can enable a user to identify attacks, the downside of such methods is that an attack is not discovered until after it has been made.
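As a rough illustration of the kind of event message discussed above, the following Python sketch builds a single line in the BSD syslog layout described in RFC 3164. The function name `format_bsd_syslog` and its parameters are hypothetical and chosen only for this example; the priority calculation (facility multiplied by 8, plus severity) and the `<PRI>TIMESTAMP HOSTNAME TAG: MESSAGE` layout follow the RFC.

```python
from datetime import datetime

# English month abbreviations, listed explicitly so the output does not
# depend on the locale of the machine running the sketch.
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def format_bsd_syslog(facility, severity, when, hostname, tag, message):
    """Build one BSD-syslog-style line (hypothetical helper for illustration)."""
    if not (0 <= facility <= 23 and 0 <= severity <= 7):
        raise ValueError("facility must be 0..23 and severity 0..7")
    # Priority value: facility * 8 + severity.
    pri = facility * 8 + severity
    # Timestamp is "Mmm dd hh:mm:ss"; single-digit days are space-padded.
    timestamp = f"{MONTHS[when.month - 1]} {when.day:2d} {when:%H:%M:%S}"
    return f"<{pri}>{timestamp} {hostname} {tag}: {message}"


if __name__ == "__main__":
    # Facility 4 (security/authorization), severity 2 (critical).
    print(format_bsd_syslog(4, 2, datetime(2005, 10, 11, 22, 14, 15),
                            "mymachine", "su", "'su root' failed"))
```

A collecting device receiving such lines can record them, but, as noted above, the record only helps after the event has already happened.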
There is also a technology known as “continuous data protection” (CDP), in which a storage system continuously captures or tracks every data modification. Under CDP technology, the data is backed up whenever any change is made to the data. Continuous data protection differs from traditional backup in that a user does not need to specify the point in time to which data is to be recovered until the user is actually ready to perform a restore operation. Traditional data backup systems are only able to restore data to the discrete points in time at which backups were made, for example at intervals of one hour, one day, or one week. With continuous data protection, by contrast, there are no backup schedules. If the storage system becomes contaminated with a virus, or if a file in the system is corrupted or accidentally deleted, and the problem is not discovered until some time later, a user is still able to recover the most recent uncorrupted version of the file. Further, a CDP system set up on a disk array storage system enables data recovery in a matter of seconds, which is considerably less time than is possible with tape backups or archives.
Thus, the basic purpose of CDP is to enable recovery of data at any desired or essential point in time when it becomes necessary for data to be recovered. In effect, CDP creates a continuous journal of complete storage snapshots, i.e., one storage snapshot for every instant in time at which a data modification occurs. In the CDP method, storage systems, backup software in host computers, or other hardware or software capture write I/O operations from host computer file systems and record all of the write I/Os as a database journal. In addition, when CDP is started, the system initially preserves a snapshot copy of the production data volumes (i.e., the volumes for which the users want to have the data backed up), which is the initial image of the volumes at the time CDP is started. When recovering data, by applying the journal against the initial image of the volumes, CDP enables recovery of the data as of any point at which write I/Os were made to the primary volumes.
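The recovery procedure described above, in which the journal is applied against the initial image of the volumes, can be sketched as follows. This is a minimal illustration under stated assumptions, not an implementation taken from any particular CDP product: the names `JournalEntry` and `recover_to`, the use of a dictionary keyed by block number to stand in for a volume, and the numeric timestamps are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class JournalEntry:
    """One captured write I/O (hypothetical structure for illustration)."""
    timestamp: float  # when the write was made
    block: int        # target block address in the volume
    data: bytes       # the data that was written


def recover_to(initial_image: Dict[int, bytes],
               journal: List[JournalEntry],
               point_in_time: float) -> Dict[int, bytes]:
    """Rebuild the volume as it stood at point_in_time.

    Starts from a copy of the initial snapshot and replays, in time order,
    every journaled write made at or before the requested point.
    """
    volume = dict(initial_image)  # leave the preserved snapshot untouched
    for entry in sorted(journal, key=lambda e: e.timestamp):
        if entry.timestamp > point_in_time:
            break  # later writes are not part of the recovered image
        volume[entry.block] = entry.data
    return volume


if __name__ == "__main__":
    snapshot = {0: b"A", 1: b"B"}
    log = [JournalEntry(1.0, 0, b"A1"),
           JournalEntry(2.0, 1, b"B1"),
           JournalEntry(3.0, 0, b"A2")]
    print(recover_to(snapshot, log, 2.0))
```

Because every write is journaled, any timestamp in the journal is a valid recovery point; the caller only has to choose one when a restore is actually needed.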
U.S. Patent Application Publication Nos. 2005/0015416 to Kenji Yamagami, 2005/0028022 to Takashi Amano, and 2005/022213 to Kenji Yamagami, the disclosures of which are incorporated herein by reference in their entireties, disclose CDP journaling techniques for fast data recovery. These references include a discussion of two journal entry types, an “AFTER” journal entry and a “BEFORE” journal entry, that are maintained for enabling recovery of data in a production data volume, should recovery be necessary. When a write request from a host computer arrives at a storage system, a journal entry is generated in response. The journal entry comprises a journal header and journal data. The journal header contains information about its corresponding journal data. The journal data comprises the data (write data) that is the subject of the write operation. This kind of journal is referred to as an “AFTER journal”, since it is what the data will look like “after” the write operation is implemented. A BEFORE journal contains the original data of the area in storage that is the target of a write operation. A BEFORE journal entry therefore represents the contents “before” the write operation is performed. A BEFORE journal is created by copying original data of the area of storage that is the target of a write operation before the write operation is performed. This enables data recovery by applying data back from the BEFORE journal to a production data volume.
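To make the distinction between the two journal entry types concrete, the following Python sketch records both an AFTER entry and a BEFORE entry for each write and then rolls the volume back by applying BEFORE data in reverse order. It is an illustrative sketch only and is not taken from the cited publications: the class and field names (`JournalHeader`, `JournalRecord`, `JournaledVolume`, `roll_back_to`) are hypothetical, the volume is assumed to have a fixed set of blocks, and discarding the matching AFTER entries on rollback is a simplification made for this example.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class JournalHeader:
    """Information about the corresponding journal data (hypothetical fields)."""
    sequence: int
    timestamp: float
    block: int
    length: int


@dataclass(frozen=True)
class JournalRecord:
    header: JournalHeader
    data: bytes


class JournaledVolume:
    """Toy production volume keeping AFTER and BEFORE journals per write."""

    def __init__(self, blocks: Dict[int, bytes]):
        self.blocks = dict(blocks)  # assumption: fixed set of block addresses
        self.after_journal: List[JournalRecord] = []
        self.before_journal: List[JournalRecord] = []
        self._sequence = 0

    def write(self, timestamp: float, block: int, data: bytes) -> None:
        self._sequence += 1
        # BEFORE entry: copy the original data of the target area first.
        original = self.blocks[block]
        self.before_journal.append(JournalRecord(
            JournalHeader(self._sequence, timestamp, block, len(original)),
            original))
        # AFTER entry: the write data itself.
        self.after_journal.append(JournalRecord(
            JournalHeader(self._sequence, timestamp, block, len(data)),
            data))
        self.blocks[block] = data

    def roll_back_to(self, timestamp: float) -> None:
        """Undo every write made after timestamp, newest first."""
        while (self.before_journal
               and self.before_journal[-1].header.timestamp > timestamp):
            record = self.before_journal.pop()
            self.blocks[record.header.block] = record.data
            self.after_journal.pop()  # simplification: drop the undone write
```

For example, after writes at times 1.0, 2.0, and 3.0, calling `roll_back_to(2.0)` restores only the block touched by the last write, leaving the earlier two writes in place.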
However, when data in a storage system is destroyed or corrupted, and the user desires to roll back the data to a point before the attack or other incident, it is sometimes difficult for the user to identify the particular time to which the data should be rolled back. For example, a log analysis module is typically unable to determine the exact time at which an event necessitating data rollback occurred. In such a case, the system may end up deleting I/O operations that would not otherwise need to be deleted. Thus, under conventional means, a human operator has to go through and check each I/O operation individually to identify whether the I/O operation was received before or after the incident, which is a very laborious task.