The invention generally relates to the field of redundant data storage systems, and relates in particular to data storage system architectures that include disk arrays.
A conventional disk array system architecture is known as the RAID system architecture that includes a redundant array of independent/inexpensive disks (RAID). The RAID system architecture provides a large amount of data storage in a reasonably reliable manner. Several popular RAID system architectures are set forth in the paper entitled A Case for Redundant Arrays of Inexpensive Disks (RAID), Patterson et al., Proc. ACM SIGMOD International Conference on Management of Data, pp. 109-116, June 1988. These architectures include RAID-1, RAID-2, RAID-3, RAID-4 and RAID-5 system architectures. U.S. Pat. No. 5,526,482 briefly discusses each of these architectures.
In general, each of the RAID system architectures includes a plurality of disks that are controlled by a disk controller. When a central processing unit (CPU) sends information to the disk controller for storage on disk, the controller directs how the information shall be stored on the plurality of disks to ensure that a failure of any one of the disks will not cause the system to lose information. RAID-1 provides 2N data redundancy to protect data while RAID-3 through RAID-5 store data in parity stripes across multiple disks to improve space efficiency and performance over RAID-1. The parity of a stripe is the Exclusive-OR (XOR) of all data elements in the stripe. If a disk failed at time t0, and the system found such a failure at time t1, the data in the failed disk can be recovered by doing the XOR among the good disks, which may finish at t2. The recovered data is exactly the same image of the data as it was at time t0. Further conventional RAID architecture systems seek to recover data from failures of more than one disk.
With the rapid advances in networked information services coupled with the maturity of disk technology, data damage and data loss caused by human errors, software defects, virus attacks, power failures, or site failures have become more dominant, accounting for an estimated 60% to 80% of data losses. Current RAID architecture systems cannot protect data from these kinds of failures because damaged data are not confined to one or two disks. Traditional techniques protecting data from the above failures are mainly periodical (daily or weekly) backups and snapshots. These techniques usually take a significant amount of time to recover data. In addition, data between backups is vulnerable to data loss. Recent research has shown that data loss or data unavailability may cost up to millions of dollars per hour in many businesses. Solely depending on the traditional time-consuming backups is no longer adequate for today's information age.
Recovery of data is generally measured by two key parameters: recovery point objective (RPO) and recovery time objective (RTO). RPO measures the maximum acceptable age of data at the time of outage. For example, if an outage occurs at time t0, and the system found the outage at time t1, the ideal case is to recover data as it was right before t0, or as close to t0 as possible. A daily backup would represent RPO of approximately 24 hours because the worst-case scenario would be an outage during the backup, i.e., t0 is the time point when a backup is just started. RTO is the maximum acceptable length of time to resume normal data processing operations after an outage. RTO represents how long it takes to recover data. For the above example, if data is successfully recovered at time t2 after starting the recovery process at t1, then the RTO is t2−t1. Depending on the different values of RPO and RTO, there exist different storage architectures capable of recovering data upon an outage.
Data protection and recovery have traditionally been done using periodic backups and snapshots. Backups are typically done nightly when data storage is not being used since the process is time consuming and degrades application performance. During the backup process, user data is transferred to a tape, a virtual tape, or a disk for disk-to-disk backup. Full backups may be performed weekly or monthly with daily incremental backups occurring between the full backups.
Data compression is often used to reduce backup storage space. A snapshot is a point-in-time image of a collection of data allowing on-line backup. A full-copy snapshot creates a copy of the entire data as a read only snapshot storage clone. To save space, copy-on-write snapshot copies a data block from the primary storage to the snapshot storage upon the first write to the block after the snapshot was created. A snapshot may also redirect all writes to the snapshot storage after the snapshot was created. Such data back-up systems, however, remain costly and highly intrusive batch operations that are prone to error and consume an exorbitant amount of time and resources. As a result, the recovery-time-objective of backups is generally very long. Furthermore, data between two subsequent backups is vulnerable giving rise to high recovery-point-objective.
Besides periodic data backups, data may also be protected at file system level using file versioning that records a history of changes to files. Typically, users need to create versions manually in these systems. There are also copy-on-write versioning systems that have automatic versions for some file operations. File versioning provides a time-shifting file system that allows a system to recover to a previous version of files. These versioning file systems have controllable RTO and RPO, but they are generally file system dependent and may not be directly applicable to enterprise data centers that use different file systems and databases. File versioning differs from periodic backups and snapshots in that file versioning works mainly at file system level not at block device level. Block level storages usually provide high performance and efficiency especially for applications such as databases that access raw devices.
To provide timely recovery to any point-in-time at block device level, a log of changed data for each data block may be maintained in a time sequence. In the storage industry, this type of storage is usually referred to as CDP (Continuous Data Protection) storage. In such systems, a write operation will replace the old data in the same logic block address (LBA) to another disk storage instead of overwriting it. As a result, successive writes to the same LBA will generate a sequence of different versions of the block with associated timestamps indicating the time of the corresponding write operations. These replaced data blocks are stored in a log structure, maintaining a history of the data blocks that have been modified. Since every change on a block is kept, it is possible to view a storage volume as it existed at any point in time, dramatically reducing RPO. The RTO depends on the size of the storage for the logs, indexing structure, and consistency checks. The data image at the time of an outage is considered to be crash consistent at block level because the orders of all write operations are strictly preserved.
A significant drawback of the CDP storage is the large amount of storage space required, which has thus far prevented it from being widely adopted. Typically, about 20% of active storage volumes change per day, with an average of 5 to 10 overwrites to a block. Considering one terabyte data storage, a CDP storage will require one to two terabytes of space to store the logs reflecting data changes in one day. A week of such operations will require 5 to 10 terabytes of storage space.
There is a need, therefore, for an improved redundant data storage system, and in particular, for a system architecture for recovering data at time t2 to the data image of t0 after it is discovered at time t1 that data was damaged by human errors, software defects, virus attacks, power failures, or site failures.