Data is the primary asset of most corporations in the information age, and businesses must be able to access that data to continue operation. In a 2001 survey, a quarter of the respondents estimated their outage costs as more than $250,000 per hour, and 8% estimated them as more than $1M per hour. The price of data loss is even higher. It has been estimated that two out of five enterprises that experience a site disaster go out of business within five years. Dependable data storage systems are needed to avoid such problems.
Fortunately, many techniques exist for protecting data, including tape backup, mirroring and parity-based RAID schemes for disk arrays, wide area inter-array mirroring, snapshots, and wide area erasure-coding schemes. New techniques continue to be invented, and older techniques become more widely used as the cost of storage capacity drops. Each technique protects against only a subset of the possible failure scenarios, so techniques are often used in combination to provide greater coverage.
Disk arrays are typically used to store a primary copy of data. Disk arrays often employ internal protection against hardware failure through RAID techniques and redundant hardware paths to the data. Other failures, such as user errors, software errors, or hardware failures, are addressed by techniques that periodically make secondary copies of the data. The secondary copies preferably reflect a consistent version of the primary copy at some instant in time. The main classes of such techniques are mirroring, point-in-time copies, and backup.
Inter-array mirroring keeps a separate, isolated copy of the current data on another disk array, which may be co-located with the primary array or remote. Inter-array mirrors may be synchronous, where each update to the primary is also applied to the secondary before write completion, or asynchronous, where updates are propagated in the background. Batched asynchronous mirrors coalesce overwrites and send batches to the secondary to be applied atomically (i.e., once a write of a batch begins, it completes without interruption). Batched asynchronous mirrors lower the peak bandwidth needed between the copies by reducing the number of updates propagated and smoothing out update bursts.
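The coalescing behavior of a batched asynchronous mirror can be illustrated with a minimal sketch. The class and method names below (BatchedAsyncMirror, write, ship_batch) are hypothetical and do not correspond to any particular array's interface; the model assumes whole-block writes and approximates atomic application by building the new secondary image off to the side and swapping it in with a single assignment.

```python
class BatchedAsyncMirror:
    """Toy model of a batched asynchronous inter-array mirror (illustrative only)."""

    def __init__(self):
        self.primary = {}      # block number -> data on the primary array
        self.secondary = {}    # block number -> data on the secondary array
        self.pending = {}      # current batch: latest data per modified block
        self.writes_seen = 0   # writes accepted by the primary
        self.updates_sent = 0  # block updates actually propagated

    def write(self, block, data):
        # The write completes against the primary without waiting for the secondary.
        self.primary[block] = data
        # Coalescing: a later write to the same block replaces the earlier
        # one in the batch, so only the final value is propagated.
        self.pending[block] = data
        self.writes_seen += 1

    def ship_batch(self):
        # Build the updated image separately, then swap it in with one
        # assignment, so the secondary reflects all of the batch or none of it.
        batch, self.pending = self.pending, {}
        new_image = dict(self.secondary)
        new_image.update(batch)
        self.secondary = new_image
        self.updates_sent += len(batch)
        return len(batch)


mirror = BatchedAsyncMirror()
mirror.write(7, "a")
mirror.write(7, "b")   # overwrites block 7 within the same batch
mirror.write(9, "c")
sent = mirror.ship_batch()
print(mirror.writes_seen, sent)  # three writes accepted, two updates propagated
```

In this sketch the two writes to block 7 cost only one propagated update, which is the source of the reduced peak bandwidth described above; the trade-off is that the secondary lags the primary by up to one batch.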
A point-in-time (PiT) image is a consistent version of the data at a single point in time, typically on the same array. The PiT image may be formed as a split mirror, where a normal mirror is maintained until the “split” operation, which stops further updates to the mirror, or as a virtual snapshot, where a virtual copy is maintained using copy-on-write techniques, with unmodified data sharing the same physical storage as the primary copy. Most enterprise-class disk arrays provide support for one or more of these techniques.
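A virtual snapshot of the kind described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the mechanism of any specific array: the CowVolume class and its methods are hypothetical, the volume is a fixed-size list of blocks, and each snapshot stores only the old contents of blocks overwritten after it was taken.

```python
class CowVolume:
    """Toy volume with copy-on-write virtual snapshots (illustrative only)."""

    def __init__(self, blocks):
        self.blocks = list(blocks)  # primary copy
        self.snapshots = []         # per snapshot: block index -> preserved old data

    def snapshot(self):
        # Taking a snapshot copies nothing; it starts with an empty map,
        # so every block is initially shared with the primary copy.
        self.snapshots.append({})
        return len(self.snapshots) - 1

    def write(self, index, data):
        # Copy-on-write: before the first overwrite of a block, save its
        # old contents for every snapshot that still shares it.
        for saved in self.snapshots:
            if index not in saved:
                saved[index] = self.blocks[index]
        self.blocks[index] = data

    def read_snapshot(self, snap_id, index):
        # Preserved blocks come from the snapshot's map; unmodified blocks
        # are read from the storage shared with the primary copy.
        saved = self.snapshots[snap_id]
        return saved[index] if index in saved else self.blocks[index]


vol = CowVolume(["a", "b", "c"])
snap = vol.snapshot()
vol.write(1, "B")
vol.write(1, "BB")  # second overwrite needs no further copy
print(vol.blocks)
print([vol.read_snapshot(snap, i) for i in range(3)])
```

The snapshot consumes extra space only for blocks that change after it is taken, whereas a split mirror holds a full physical copy from the start.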
Backup is the process of making secondary copies on separate hardware, which could be another disk array, a tape library, or an optical storage device. Backups may be full backups; cumulative incremental backups, where all changes since the last full backup are copied; or differential incremental backups, where only the portions changed since the last full or cumulative incremental backup are copied. Tape backup is typically done using some combination of these alternatives (e.g., weekend full backups, followed by a cumulative incremental every weekday). Backups made to physically removable media, such as tape or optical disks, may also be periodically moved to an off-site vault for archival storage.
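The difference between the three backup types can be made concrete with a small sketch that computes which blocks each backup in a schedule copies. The function name plan_backups and its block-set model are assumptions introduced for illustration, and the differential baseline follows the definition given above (changes since the last full or cumulative incremental backup).

```python
FULL, CUMULATIVE, DIFFERENTIAL = "full", "cumulative", "differential"


def plan_backups(schedule, changed, all_blocks):
    """Return, for each backup in `schedule`, the set of blocks it copies.

    schedule:   list of backup kinds, in the order the backups run
    changed:    list of sets; changed[i] holds the blocks modified between
                backup i-1 and backup i
    all_blocks: every block on the volume
    """
    since_full = set()         # changes since the last full backup
    since_full_or_cum = set()  # changes since the last full or cumulative backup
    copied = []
    for kind, delta in zip(schedule, changed):
        since_full |= delta
        since_full_or_cum |= delta
        if kind == FULL:
            copied.append(set(all_blocks))
            since_full.clear()
            since_full_or_cum.clear()
        elif kind == CUMULATIVE:
            copied.append(set(since_full))
            since_full_or_cum.clear()
        elif kind == DIFFERENTIAL:
            # Baseline is the last full or cumulative incremental backup,
            # per the definition in the text, so no baseline is reset here.
            copied.append(set(since_full_or_cum))
        else:
            raise ValueError(f"unknown backup kind: {kind}")
    return copied


# Hypothetical four-backup schedule on a four-block volume.
plan = plan_backups(
    [FULL, CUMULATIVE, CUMULATIVE, DIFFERENTIAL],
    [set(), {1}, {2}, {3}],
    {1, 2, 3, 4},
)
for kind, blocks in zip([FULL, CUMULATIVE, CUMULATIVE, DIFFERENTIAL], plan):
    print(kind, sorted(blocks))
```

Cumulative incrementals grow until the next full backup but allow a restore from the full backup plus one incremental, while differential incrementals stay small at the cost of needing more pieces to restore.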
Backup techniques and tools have been studied from an operational perspective. There are also a number of studies describing alternative mechanisms for archival and backup and file systems that incorporate snapshots. Evaluations of the dependability of storage systems have focused mainly on disk arrays.
Unfortunately, the multitude of data protection techniques, combined with their configuration parameters, often makes it difficult to employ each technique appropriately. System administrators often design their data storage systems in an ad hoc manner, focusing more on setting configuration parameters (e.g., backup windows) than on achieving a particular level of dependability. As a result, it is often unclear what dependability a given storage system design provides, whether the business's dependability goals have been met, or whether the system costs too much.