The invention relates generally to redundant storage techniques for use with data storage systems that include an array of digital data storage disks.
RAID is an acronym for Redundant Array of Independent Disks (originally, Redundant Array of Inexpensive Disks). It is a technology that was developed to provide data redundancy and thereby protect against disk failures in a storage system. RAID 1, which provides the simplest form of redundancy, involves simply mirroring the normal data to another disk, so that the same data is stored on two different disks. Thus, if the main storage disk fails, the controller in the storage system need only direct its accesses to the second disk, where all of the same data is stored. In other words, a failed disk will not result in the loss of data to the system. Of course, this form of redundancy is expensive because it requires twice as much storage space: every disk has to have its mirror.
To reduce the amount of storage required to support the data redundancy, other forms of RAID technology have been developed. The price typically paid by these other techniques for lower storage requirements is less redundancy. For example, there is RAID 4, according to which the system generates parity information by XOR'ing the data on two or more disks and then storing the result on another drive. So, for example, assume that the parity is generated from data stored on three disks. In that case, the corresponding sets of data on the three disks are XOR'ed together to produce a parity block. The parity information generated in that way is then stored on a fourth disk. Thus, if one of the first three disks fails, the data that was stored on that disk can be regenerated from the data that is stored on the parity disk and the two other disks.
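The XOR parity scheme just described can be illustrated with a minimal sketch. The helper name `xor_blocks` and the four-byte block contents are assumptions chosen only for illustration; a real array operates on much larger blocks.

```python
from functools import reduce

def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks (hypothetical helper)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three data disks, each holding one block of the same length (illustrative values).
d0 = b"\x0f\xa0\x55\x01"
d1 = b"\xf0\x0a\xaa\x02"
d2 = b"\x33\xcc\x00\x04"

# The parity block is what would be stored on the fourth disk.
parity = xor_blocks([d0, d1, d2])

# Suppose the disk holding d1 fails: XOR the parity with the two surviving blocks.
rebuilt_d1 = xor_blocks([parity, d0, d2])
assert rebuilt_d1 == d1
```

Note that the rebuild needs the parity disk and both surviving data disks; losing two disks at once leaves the data unrecoverable under this scheme.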
There are modifications of the last technique described above. For example, the parity can be bit-interleaved parity which is stored on a dedicated disk, as in the case of RAID 3, or it can be block-interleaved parity stored on a dedicated disk, as in the case of RAID 4. Alternatively, it can be block-interleaved parity that is spread (striped) across multiple disks, as in the case of RAID 5, so that each disk in the system contains both normal data and parity data. In that case, however, the parity data for a given set of data is still stored on a different disk from the disks which contain the data from which that parity information was generated.
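As a rough illustration of block-interleaved parity spread across the disks, the sketch below rotates the parity location from stripe to stripe. The rotation rule in `parity_disk` is an assumption made for illustration (it is one simple placement, not one prescribed by the text), and the function names are hypothetical.

```python
def parity_disk(stripe, num_disks):
    """Index of the disk holding parity for a stripe (assumed simple rotation)."""
    return (num_disks - 1 - stripe) % num_disks

def layout(num_stripes, num_disks):
    """One row per stripe: 'P' marks the parity block, 'D' marks a data block."""
    rows = []
    for stripe in range(num_stripes):
        p = parity_disk(stripe, num_disks)
        rows.append(["P" if disk == p else "D" for disk in range(num_disks)])
    return rows

for row in layout(4, 4):
    print(" ".join(row))
# D D D P
# D D P D
# D P D D
# P D D D
```

Every disk ends up holding both data and parity, yet within any one stripe the parity block sits on a disk that holds none of that stripe's data.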
The present invention relates to a new RAID technique which is referred to herein as RAID C or RAID Compressed. This new type of RAID generates a compressed image of a data set and uses that as the parity information. In other words, instead of performing an XOR operation on the image set of data, as is done for RAID 3 or RAID 4, a compression algorithm is applied to the image set of data to produce the parity information that is stored on a separate disk from where the image set of data is stored.
In general, in one aspect, the invention is a method of storing data in a digital data storage system that includes a plurality of disk drives. The method includes the steps of receiving data at the data storage system; storing at least a portion of the received data on a first set of disk drives among the plurality of disk drives; compressing the portion of received data; and storing the compressed data on a parity disk drive so that the parity drive stores data that is redundant of data stored in the first set of drives.
In preferred embodiments, the step of storing the portion of received data involves storing that data without first compressing it. The parity drive is different from the first set of drives. The method also includes the step of assigning the parity drive to be a dedicated drive for storing parity information. The step of receiving data involves storing the data in a cache memory, and the method further includes the step of destaging that data from the cache memory to the plurality of drives, wherein the step of destaging involves the first and second mentioned storing steps and the compressing step. Alternatively, the step of storing at least a portion of the received data on the first set of drives involves first compressing that data and then storing it on the first set of drives.
In general, in another aspect, the invention is a method of storing data in a digital data storage system that includes a plurality of disk drives. The method includes the steps of receiving N blocks of data; storing the N blocks of data on a first set of disk drives among the plurality of disk drives, wherein each block of the N blocks is stored on a different disk drive; compressing the N blocks of data; and storing the compressed data on a parity disk drive so that the data storage system simultaneously stores the N data blocks in uncompressed form and the compressed data in different places.
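The method of this aspect can be sketched as follows. This is only an illustration under stated assumptions: the text does not name a compression algorithm, so Python's `zlib` stands in for the compression engine; the function `raid_c_write`, the 512-byte block size, and the in-memory lists standing in for drives are all hypothetical.

```python
import zlib

def raid_c_write(blocks):
    """Return (data_drives, parity_drive) for N blocks (hypothetical sketch).

    data_drives[i] holds block i in uncompressed form, one block per drive.
    parity_drive holds a compressed image of all N blocks; zlib is an
    assumed stand-in for the compression engine.
    """
    data_drives = [bytes(b) for b in blocks]
    parity_drive = zlib.compress(b"".join(blocks))
    return data_drives, parity_drive

# N = 3 illustrative blocks of 512 bytes each (highly compressible on purpose).
blocks = [bytes([i]) * 512 for i in range(3)]
data_drives, parity_drive = raid_c_write(blocks)

# The system now holds the uncompressed blocks and the compressed image at once.
assert zlib.decompress(parity_drive) == b"".join(data_drives)
```

How much smaller the parity image is than the original data depends entirely on how compressible the data is; the repetitive blocks used here are a best case.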
In general, in still another aspect, the invention is an apparatus for storing data including a plurality of disk drives; a cache memory; a compression engine; and a controller which destages data stored in the cache memory to the plurality of disk drives. The controller is programmed to perform the functions of: collecting a plurality of data blocks stored in the cache memory; causing the compression engine to compress the data in the collected data blocks; storing the compressed data in a parity drive, said parity drive being one of the plurality of disk drives; and storing each of the plurality of data blocks in a different one of the plurality of disk drives, none of which is the parity drive.
Among other advantages, the invention provides an alternative type of RAID that enables one to reconstruct lost data without involving any drives other than the parity drive. An implication of this is that even if all normal drives fail, the data can be recovered from the parity drive. Also, the invention provides a way of substantially reducing the time required to perform backup and/or to transmit stored data to another system since the smaller amount of compressed data on the parity drive can be sent rather than the corresponding larger amount of uncompressed data on the normal drives.
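The recovery property described above can be illustrated with a short sketch in which every normal drive is lost and the data is rebuilt from the parity drive alone. As assumptions for illustration only, `zlib` stands in for the compression algorithm, blocks have a fixed size of 512 bytes, and `recover_from_parity` is a hypothetical helper.

```python
import zlib

BLOCK_SIZE = 512  # assumed fixed block size

def recover_from_parity(parity_image, block_size=BLOCK_SIZE):
    """Rebuild every data block using only the compressed parity image."""
    raw = zlib.decompress(parity_image)
    return [raw[i:i + block_size] for i in range(0, len(raw), block_size)]

# What was originally written to three normal drives (illustrative contents).
original = [bytes([65 + i]) * BLOCK_SIZE for i in range(3)]
parity_image = zlib.compress(b"".join(original))

# All three normal drives fail; only the parity drive survives.
recovered = recover_from_parity(parity_image)
assert recovered == original

# For backup or transfer, the compressed image can be sent in place of the raw data.
print(len(parity_image), "bytes instead of", BLOCK_SIZE * len(original))
```

By contrast, XOR parity can regenerate only one missing block per stripe and needs the surviving data drives to do so.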
Other advantages and features will become apparent from the following description of the preferred embodiment and from the claims.