Data storage and integrity are important components of information-age business operations. Enterprises are increasingly adopting data protection and disaster recovery strategies to prepare for, and recover from, data loss disasters. While some risks to stored data are physical and tangible (for example, failure of a disk drive, fire, or floods), other dangers are intangible or logical (for example, accidental deletion of files, or an attack by a computer virus). Data must be protected from the first category of dangers through physical means, such as remote replication, Redundant Arrays of Inexpensive Disks (RAID), highly available systems, tape backups, and similar measures.
Dangers in the second category, such as inadvertent erasure or modification of data, are traditionally mitigated through various approaches. For example, solutions may employ file versioning, tape backups, or periodic backup to a remote server. Many of these solutions are periodic, meaning that they may be executed once a day or even less frequently. As a result, when data must be recovered, everything created or modified since the most recent backup may be lost; in the worst case, this amounts to all of the data produced during the full interval between two backups.
Requirements to protect against loss of data, along with various regulatory compliance requirements, are driving the move towards solutions involving Continuous Data Protection (CDP). According to the Storage Networking Industry Association's (SNIA) CDP Special Interest Group, CDP is a “methodology that continuously captures or tracks data modifications and stores changes independently of the primary data, enabling recovery points from any point in the past. CDP systems may be block, file or application based and can provide fine granularities of restorable objects to infinitely variable recovery points.” This definition implies three primary aspects of a CDP implementation. The first is the ability to track and capture data. The second is the ability to roll back to any point in the history of the volume. The third is the ability to store captured data in a location external to the primary data.
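The three aspects above can be illustrated with a minimal sketch, assuming a block-level model in which every write is appended, with a timestamp, to a journal held separately from the primary volume. The classes `JournalEntry` and `CdpJournal`, their methods, and the replay-from-baseline strategy are hypothetical choices made for illustration, not a description of any particular CDP product.

```python
from dataclasses import dataclass, field


@dataclass
class JournalEntry:
    timestamp: float  # logical time at which the write occurred
    block: int        # address of the block that was written
    data: bytes       # new contents of the block


@dataclass
class CdpJournal:
    """Hypothetical change journal stored apart from the primary volume."""
    baseline: dict[int, bytes]  # volume contents when tracking began
    entries: list[JournalEntry] = field(default_factory=list)

    def record_write(self, timestamp: float, block: int, data: bytes) -> None:
        # Aspect 1: track and capture every data modification.
        # Aspect 3: the change is kept in this journal, not in the primary data.
        self.entries.append(JournalEntry(timestamp, block, data))

    def restore(self, point_in_time: float) -> dict[int, bytes]:
        # Aspect 2: rebuild the volume as it existed at an arbitrary past point
        # by replaying, in time order, every write at or before that point.
        # sorted() is stable, so writes with equal timestamps keep their order.
        volume = dict(self.baseline)
        for entry in sorted(self.entries, key=lambda e: e.timestamp):
            if entry.timestamp > point_in_time:
                break
            volume[entry.block] = entry.data
        return volume


journal = CdpJournal(baseline={0: b"aaaa", 1: b"bbbb"})
journal.record_write(1.0, 0, b"AAAA")
journal.record_write(2.0, 1, b"BBBB")
journal.record_write(3.0, 0, b"XXXX")
print(journal.restore(2.5))  # {0: b'AAAA', 1: b'BBBB'}
```

Replaying from a single baseline keeps the sketch short, but restore time grows with the length of the journal; a practical system would likely combine the journal with periodic checkpoints.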
Generally, CDP implementations make use of special dedicated external devices that track and capture history information for every input/output (I/O) operation on a storage node. Data is generally replicated over the wire, as network traffic, to the dedicated CDP devices. These separate devices and the related additional network traffic increase the total cost of ownership (TCO) of the data protection system. Additionally, supporting CDP imposes a performance penalty on the storage system.
In typical CDP systems, the data within a storage block must be backed up before an I/O write can be performed to that block. As such, each write I/O operation can generate three operations within the storage system: one to read the original data, a second to store a backup of the original data, and a third to write the new data. Traditionally, CDP systems use a dedicated data tap on each host. The data tap duplicates any I/O from the host to provide one copy to the storage volume and another copy to the CDP device. Again, the extra system components and the additional, sequentially performed storage steps add to the overhead of implementing CDP.
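The write path described above can be sketched as follows, assuming a simple in-memory model of a block volume. The class `CopyOnWriteVolume` and its attributes are hypothetical; the operation log exists only to show why a single host write expands into three sequential storage operations.

```python
class CopyOnWriteVolume:
    """Hypothetical volume illustrating the three-step CDP write path."""

    def __init__(self, blocks: dict[int, bytes]) -> None:
        self.blocks = dict(blocks)                  # primary data
        self.history: list[tuple[int, bytes]] = []  # preserved original contents
        self.ops: list[str] = []                    # storage operations performed

    def write(self, block: int, data: bytes) -> None:
        # Step 1: read the original contents of the target block
        # (an unwritten block is treated as empty, an assumption of this sketch).
        original = self.blocks.get(block, b"")
        self.ops.append(f"read {block}")
        # Step 2: back up the original contents before they are overwritten.
        self.history.append((block, original))
        self.ops.append(f"backup {block}")
        # Step 3: only now perform the write requested by the host.
        self.blocks[block] = data
        self.ops.append(f"write {block}")


volume = CopyOnWriteVolume({7: b"old"})
volume.write(7, b"new")
print(volume.ops)  # ['read 7', 'backup 7', 'write 7']
```

Because each step must complete before the next begins, the host's single write incurs the latency of all three operations.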
It is with respect to these considerations and others that the disclosure made herein is presented.