The invention generally relates to data recovery systems, and relates in particular to systems that provide data recovery in computer systems that fail in such a way that backup systems and snapshot systems may not provide sufficient protection of data.
With the explosive growth of networked information services and e-commerce, data protection and recovery have become a top priority of business organizations and government institutions. Since data is typically an extremely valuable asset of an organization, any loss or unavailability of data can cause millions of dollars of damage. Unfortunately, failures do occur, such as hardware failures, human errors, software defects, virus attacks, power failures, and site failures. In order to protect data from possible failures and to be able to recover data in case of such a failure, data protection technology is necessary.
Traditionally, data protection has been done using periodic backups. At the end of a business day or the end of a week, data is backed up to tapes. Depending on the importance of data, the frequency of backups varies. The higher the backup frequency, the more backup storage is required. In order to reduce the backup volume size, technologies such as incremental backups and copy-on-write (COW) snapshots have been commonly used. Instead of making full backups every time, incremental backups and COW snapshots store only the changed data, and this is done more frequently, between full backups. For example, one may perform daily incremental backups and weekly full backups that are stored at both the production site (that includes a server host and production storage) and a backup site (that includes a backup server and backup storage). The production site and the backup site are connected to one another by a communication system such as a network. In this way, great storage savings are possible while keeping data protected.
Incremental backup works as follows. Starting from the previous backup point, the storage keeps track of all changed blocks. At the backup time point, a backup volume is formed consisting of all of the latest changed data blocks. As a result, the incremental backup contains the newest data that has changed since the last backup. COW snapshots work differently from the incremental backup. At the time when a snapshot is created, a small volume is allocated as a snapshot volume with respect to the source volume. Upon the first write to a data block after the snapshot was started, the original data of the block is copied from the source volume to the snapshot volume. After copying, the write operation is performed on the block in the source volume. As a result, the data image at the time of the snapshot is preserved. Write I/Os after the first change to a block are performed as usual, i.e., only the first write to a block copies the original data to the snapshot volume. There have been many variations of COW snapshots in terms of implementation details for performance and efficiency purposes, such as pointer remapping and redirect-on-write. The main advantage of both incremental backups and COW snapshots is storage savings because only changed data is backed up.
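By way of illustration only, the two mechanisms described above may be sketched as follows. This is a minimal in-memory model under stated assumptions, not an implementation of any particular storage product: the class name `Volume`, its methods, and the use of a dictionary as the snapshot volume are all hypothetical, and block contents are represented as short byte strings.

```python
class Volume:
    """Toy source volume made of fixed-position blocks (hypothetical model)."""

    def __init__(self, num_blocks):
        self.blocks = [b""] * num_blocks
        # Indices of blocks written since the previous backup point.
        self.changed = set()
        # Snapshot volume: block index -> original data. None means no snapshot.
        self.snapshot = None

    def create_snapshot(self):
        # Allocate an (initially empty) snapshot volume for the source volume.
        self.snapshot = {}

    def write(self, index, data):
        # COW: only the first write to a block after the snapshot copies the
        # original data to the snapshot volume; later writes skip the copy.
        if self.snapshot is not None and index not in self.snapshot:
            self.snapshot[index] = self.blocks[index]
        # After copying, the write is performed on the source volume.
        self.blocks[index] = data
        # Track the block for the next incremental backup.
        self.changed.add(index)

    def incremental_backup(self):
        # The backup volume holds the latest data of every changed block.
        backup = {i: self.blocks[i] for i in sorted(self.changed)}
        # Start tracking afresh from this new backup point.
        self.changed.clear()
        return backup

    def read_snapshot(self, index):
        # Data image at snapshot time: the preserved original if the block
        # was overwritten, otherwise the unchanged block in the source volume.
        if self.snapshot is None:
            raise RuntimeError("no snapshot has been created")
        if index in self.snapshot:
            return self.snapshot[index]
        return self.blocks[index]
```

In this sketch, the incremental backup retains the newest contents of the changed blocks, whereas the snapshot volume retains the oldest contents (those present when the snapshot was taken). In both cases only the changed blocks occupy backup storage, which is the storage saving noted above.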
Despite the rapid advances in computer technology over the past three decades, data backup is performed in fundamentally the same way as it was 30 years ago. It is well known that backup remains a costly and highly intrusive batch operation that is prone to error and consumes an exorbitant amount of time and resources. Research has recently been reported in the literature on improving data availability and recoverability, such as continuous data protection (CDP), synchronous/asynchronous data replication, and data de-duplication. While these technologies aim at increasing the backup frequency and reducing storage sizes for backup volumes, the fundamental techniques used are still based on incremental backups or COW snapshots. It appears to be generally accepted within the information technology (IT) community that these techniques will usually work and that data can usually be recovered.
The reality, however, is that in a substantial number of cases, backup data is not sufficiently recovered, and even when data is recovered, doing so takes hours or even days.
There remains a need to provide a data recovery system and method that may function in further conditions of failure and that provides improved protection of data in storage subsystems.