Various forms of network storage systems are known today. These forms include network attached storage (NAS), storage area networks (SANs), and others. Network storage systems are commonly used for a variety of purposes, such as providing multiple users with access to shared data and backing up critical data (e.g., by data mirroring).
A network storage system can include at least one storage system, which is a processing system configured to store and retrieve data on behalf of one or more storage client processing systems (“clients”). In the context of NAS, a storage system operates on behalf of one or more clients to store and manage shared data containers in a set of mass storage devices, such as magnetic or optical disks or tapes, or flash drives. The data containers may include files, LUNs, or other units of storage. The mass storage devices may be organized into one or more volumes of a Redundant Array of Inexpensive Disks (RAID). In a SAN context, the storage system provides clients with block-level access to stored data, rather than file-level access. Some storage systems are capable of providing clients with both file-level access and block-level access.
RAID configurations are typically used to organize an array of mass storage devices, such as hard disk drives (HDDs), which serve as the primary data storage for a storage system. A RAID group may be configured using various fault-tolerance levels, such as RAID-0, RAID-1, RAID-4, RAID-5 or RAID-DP™, depending on the performance and reliability characteristics required of the system. Each of these RAID levels has a set fault-tolerance level (i.e., a number of failures that the RAID group can successfully recover from). As a result, the availability and resiliency of the storage system are very closely related to the RAID protection level utilized. For example, in RAID-1, the contents of a storage device are mirrored at another storage device. Since only half of the available space can be used for data, a RAID-1 protection configuration is typically very expensive to employ.
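The trade-off between fault tolerance and usable capacity can be made concrete with a small sketch. The figures below reflect the commonly described behavior of each level and are not taken from any particular product; RAID-1 is assumed to be a two-way mirror, and the names `FAULT_TOLERANCE` and `usable_devices` are hypothetical.

```python
# Number of device failures each level is commonly described as tolerating
# per RAID group. RAID-1 is modeled as a two-way mirror (an assumption).
FAULT_TOLERANCE = {
    "RAID-0": 0,   # striping only, no redundancy
    "RAID-1": 1,   # mirrored pair
    "RAID-4": 1,   # dedicated parity device
    "RAID-5": 1,   # distributed parity
    "RAID-DP": 2,  # double parity
}


def usable_devices(level: str, total_devices: int) -> int:
    """Return how many devices' worth of capacity remain for data."""
    if level == "RAID-0":
        return total_devices
    if level == "RAID-1":
        return total_devices // 2  # half the space holds the mirror copy
    if level in ("RAID-4", "RAID-5"):
        return total_devices - 1   # one device's worth of parity
    if level == "RAID-DP":
        return total_devices - 2   # two devices' worth of parity
    raise ValueError(f"unknown RAID level: {level}")
```

With eight devices, for instance, this model leaves four devices of data capacity under RAID-1 but six under RAID-DP, which illustrates why mirroring is the costlier form of protection.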
The integrity of the data in the primary data storage must be maintained. Thus, in the event of one or more errors, such as the failure of a physical disk, the failure of an individual data block, a checksum error, or other error, a recovery process enabled by the RAID level may be performed. The recovery process consumes significant amounts of time and system resources and prevents input/output operations from being performed on the primary data storage until the recovery process is complete. In addition, the recovery process is only possible if the number of failed disks or disk errors does not exceed the fault-tolerance level of the RAID group. If the number of failed disks or disk errors exceeds the fault-tolerance level of the RAID group, the RAID group may stop operation and a system panic may be initiated.
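The decision described in this paragraph reduces to comparing the failure count against the group's fault-tolerance level. The following is a minimal sketch under that reading; the `Action` enum and `primary_storage_action` function are hypothetical names introduced only for illustration.

```python
from enum import Enum


class Action(Enum):
    NONE = "none"                # no errors, nothing to do
    RECONSTRUCT = "reconstruct"  # run the RAID recovery process
    PANIC = "panic"              # group stops; system panic may follow


def primary_storage_action(failures: int, fault_tolerance: int) -> Action:
    """Pick the response of a primary-storage RAID group to device errors."""
    if failures < 0 or fault_tolerance < 0:
        raise ValueError("counts must be non-negative")
    if failures == 0:
        return Action.NONE
    if failures <= fault_tolerance:
        # Recovery is possible, but I/O is blocked until it completes.
        return Action.RECONSTRUCT
    # More failures than the RAID level can recover from.
    return Action.PANIC
```

For example, a double-parity group (tolerance of two) with two failed disks would reconstruct, while the same failure count in a single-parity group would exceed the tolerance.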
The properties of RAID technology may be advantageous when used for secondary data storage, such as a cache. However, certain characteristics of RAID may be overly restrictive for that use. For example, if an unrecoverable error condition exists, the storage system may take drastic recovery actions, such as a file system consistency check, to attempt to recover the data. A RAID array used as a cache, by contrast, could survive the failure of any number of storage devices, since a copy of the cache contents already exists in the primary data storage. In addition, it may not be necessary to actively recover or reconstruct the contents of the cache, which avoids downtime.
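The reason a cache can tolerate device loss may be easier to see in code. The sketch below assumes a read cache whose entries are always copies of data held in primary storage; the `ReadCache` class and its methods are hypothetical and model the primary storage as a plain dictionary.

```python
class ReadCache:
    """Read cache in front of primary storage (modeled here as a dict)."""

    def __init__(self, primary: dict):
        self.primary = primary
        self.entries = {}
        self.healthy = True

    def read(self, key):
        # Serve from the cache when it is usable and holds the key.
        if self.healthy and key in self.entries:
            return self.entries[key]
        # Otherwise fall back to primary storage, which always has a copy.
        value = self.primary[key]
        if self.healthy:
            self.entries[key] = value
        return value

    def fail(self):
        # Any number of cache devices lost: discard the contents
        # instead of reconstructing them. No data is lost.
        self.healthy = False
        self.entries.clear()
```

After `fail()` is called, reads continue to succeed from primary storage, only more slowly, so no reconstruction or consistency check is needed in this model.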