1. Field of the Invention
The invention generally relates to non-volatile memory systems, and more particularly to computer disk arrays and object storage devices that achieve space efficiency through data migration and data storage redundancy management.
2. Description of the Related Art
Mass storage systems generally organize their data either as block storage or object storage. Block storage systems store data as a fixed sequence of blocks, each block consisting of some fixed number of bytes of data. Each block can be addressed by its number in the sequence of blocks. Object storage systems store data as a variable number of objects, each object consisting of a variable number of bytes of data. Each object is addressed by an arbitrary object identifier.
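The two addressing models can be contrasted in a short Python sketch. The class names, method names, and the 512-byte block size are hypothetical choices made for illustration only. A block store is a fixed-length list indexed by block number, and every write is exactly one block. An object store is a mapping from arbitrary identifiers to byte strings of any length.

```python
# Hypothetical illustration: class names, methods, and BLOCK_SIZE are assumptions.
BLOCK_SIZE = 512  # assumed fixed number of bytes per block


class BlockStore:
    """Fixed sequence of fixed-size blocks, addressed by block number."""

    def __init__(self, num_blocks: int) -> None:
        self.blocks = [bytes(BLOCK_SIZE) for _ in range(num_blocks)]

    def write(self, block_no: int, data: bytes) -> None:
        if len(data) != BLOCK_SIZE:
            raise ValueError("block writes must be exactly BLOCK_SIZE bytes")
        self.blocks[block_no] = data

    def read(self, block_no: int) -> bytes:
        return self.blocks[block_no]


class ObjectStore:
    """Variable number of variable-length objects, addressed by an arbitrary identifier."""

    def __init__(self) -> None:
        self.objects: dict[str, bytes] = {}

    def put(self, object_id: str, data: bytes) -> None:
        self.objects[object_id] = data

    def get(self, object_id: str) -> bytes:
        return self.objects[object_id]


disk = BlockStore(num_blocks=8)
disk.write(3, b"x" * BLOCK_SIZE)        # addressed by position in the sequence
store = ObjectStore()
store.put("report-q3", b"any length")   # addressed by an arbitrary identifier
```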
The three primary design criteria for mass storage computer systems are cost, performance, and availability. It is most desirable to produce memory devices that have a low cost per megabit, high input/output performance, and high data availability. “Data availability” is the ability to recover data stored in the storage system even though some of the data has become inaccessible due to failure or some other reason (e.g., deletion of data), and the ability to ensure continued operation in the event of such a failure. Usually, data availability is provided through redundancy management, wherein data, or relationships among data, are stored in multiple locations. In its simplest form, data redundancy involves duplicating data onto multiple storage devices.
Redundant storage systems consist of two or more storage devices, such as disk drives, and one or more controllers that manage the redundant data. Redundant block stores provide a virtual, reliable disk built from physical block disks. Redundant object stores provide a set of redundant “virtual objects.” Each redundant virtual object is stored using one object on each of two or more object storage devices.
Traditionally, there have been two common methods of storing redundant data. According to the first method, or “mirror” method, data is duplicated and stored in two or more separate areas of the storage system. For example, in a disk array, identical data is provided on two separate disks in the array. This method is also referred to as “RAID level 1,” where RAID stands for Redundant Array of Independent Disks. The mirror method has the advantages of high performance and high data reliability due to the duplex storing technique. However, the mirror method is also relatively expensive, as the redundant copy effectively doubles the cost of storing the data. In other words, the overhead of mirrored storage is 50% when the system stores two identical copies of the data, or more generally, (n-1)/n when the system stores n copies.
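The mirror overhead can be checked with a small Python sketch. Here overhead is taken to mean the fraction of raw capacity that holds redundant data, and the function name is hypothetical. With n copies, one copy is the data itself and the other n-1 are redundant, so the overhead is (n-1)/n, which reduces to the 50% figure for two copies.

```python
from fractions import Fraction


def mirror_overhead(n_copies: int) -> Fraction:
    """Fraction of raw capacity holding redundant data when data is stored n_copies times."""
    if n_copies < 1:
        raise ValueError("need at least one copy")
    # One copy is the data itself; the remaining n_copies - 1 are redundant.
    return Fraction(n_copies - 1, n_copies)


print(mirror_overhead(2))  # 1/2, the 50% of RAID level 1
print(mirror_overhead(3))  # 2/3
```

Exact fractions are used so that the results can be compared directly with the ratios quoted in the text.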
In the second method, or “parity” method, a portion of the storage area is used to store redundant data, but the redundant storage area is smaller than the storage space used to store the original data. For example, in a disk array having five disks, four disks might be used to store data, with the fifth disk dedicated to storing redundant data. Parity-based redundancy management includes RAID levels 2, 3, 4, 5, 53, and others. The parity method is advantageous because it is less costly than the mirror method: its overhead is 1/(n+1) when the system stripes data over n storage devices (20% in the five-disk example), which translates into a lower-cost system. However, the parity method has lower performance and availability characteristics than the mirror method. Related methods, such as RAID level 6, improve availability by storing additional redundant data so that the system can withstand the failure of up to two disk drives. This additional redundant data results in greater overhead and greater cost than schemes that store only one set of redundant data.
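The five-disk example can be sketched in Python under the assumption that the redundant data is bytewise XOR parity. This is the usual choice for RAID levels 4 and 5, but the text above does not specify it. Parity sits on a dedicated fifth disk, as in RAID level 4. RAID level 5 rotates the parity block among the disks, but the arithmetic is the same. All function names are hypothetical.

```python
from fractions import Fraction
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Bytewise XOR of equal-length blocks (single-parity, RAID-4/5 style)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def parity_overhead(n_data: int) -> Fraction:
    """n data devices plus one parity device: 1/(n+1) of raw capacity is redundant."""
    return Fraction(1, n_data + 1)


def rebuild(surviving: list[bytes]) -> bytes:
    """Recover the single missing block (data or parity) by XOR-ing all survivors."""
    return xor_blocks(surviving)


disks = [b"\x01\x02", b"\x10\x20", b"\x03\x00", b"\xff\x0f"]  # four data disks
parity = xor_blocks(disks)                                     # fifth (parity) disk
recovered = rebuild([disks[0], disks[1], disks[3], parity])    # disk 2 has failed
print(recovered == disks[2], parity_overhead(4))               # True 1/5
```

Because XOR is its own inverse, the same routine both computes the parity and rebuilds any single lost disk.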
Redundant object storage systems use variations on both the mirror and parity methods. In the mirror method for object storage, the system stores a virtual object by creating one physical object on each of two or more object storage devices and storing identical copies of the virtual object data in each physical object. In the parity method for object storage, the system stores a virtual object by striping the virtual object's data across physical objects on multiple object storage devices, and storing redundant (parity) data for each stripe in one physical object on a different object storage device. For large virtual objects, the parity method is less costly than the mirror method. For small virtual objects, however, there may not be enough data to stripe efficiently across multiple physical objects, so the parity method costs no less than the mirror method.
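The small-object effect can be illustrated with a Python sketch. It makes several assumptions that the text does not: a stripe width of k = 4 data objects, a 64-byte stripe unit, and per-stripe parity as long as the longest unit in that stripe. The helper names are hypothetical. An object smaller than one unit fills a single unit and needs equally long parity, so its overhead equals that of a two-way mirror. A large object that fills many full stripes reaches 1/(k+1).

```python
from fractions import Fraction


def stripe(data: bytes, k: int, unit: int) -> list[list[bytes]]:
    """Cut data into units of `unit` bytes and group them into stripes of up to k units.

    Unit i of a stripe would go to physical object i on its own storage device.
    """
    units = [data[i:i + unit] for i in range(0, len(data), unit)]
    return [units[i:i + k] for i in range(0, len(units), k)]


def parity_method_overhead(data: bytes, k: int, unit: int) -> Fraction:
    """Redundant fraction when each stripe stores parity as long as its longest unit."""
    if not data:
        raise ValueError("object must be non-empty")
    parity_bytes = sum(max(len(u) for u in s) for s in stripe(data, k, unit))
    return Fraction(parity_bytes, len(data) + parity_bytes)


def mirror_method_overhead(copies: int = 2) -> Fraction:
    """Each copy beyond the first is redundant, independent of object size."""
    return Fraction(copies - 1, copies)


small = bytes(40)            # smaller than one stripe unit
large = bytes(64 * 4 * 10)   # ten full stripes
print(parity_method_overhead(small, k=4, unit=64))  # 1/2, same as a two-way mirror
print(parity_method_overhead(large, k=4, unit=64))  # 1/5
print(mirror_method_overhead())                     # 1/2
```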
Redundant object storage systems can also use a third storage method, the “grouped RAID” method, as shown in FIG. 1. In this method, one or more virtual objects are grouped together. Each virtual object is stored in one physical object, each on a different object storage device. In addition, a parity physical object stores redundant data for all the objects in the group. The parity object is stored on an object storage device separate from the object storage devices used for the other physical objects in the group. This method yields lower cost than the parity or mirror method when many small virtual objects can be combined into one group. The grouped RAID method is the subject of a separate, co-pending patent application.
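A minimal Python sketch of grouped RAID follows. It assumes the parity object is the bytewise XOR of the group's members, with each shorter member zero-padded to the length of the longest. The text says only that the parity object "stores redundant data for all the objects in the group," so the XOR and padding scheme is an assumption, as are the helper names. The sketch also assumes each member's true length is kept as metadata, because padding hides it.

```python
from functools import reduce


def group_parity(objects: list[bytes]) -> bytes:
    """XOR of all member objects, each zero-padded to the longest member's length."""
    width = max(len(o) for o in objects)
    padded = [o.ljust(width, b"\x00") for o in objects]
    return bytes(reduce(lambda x, y: x ^ y, col) for col in zip(*padded))


def recover_member(survivors: list[bytes], parity: bytes, lost_length: int) -> bytes:
    """Rebuild one lost member from the other members and the parity object.

    lost_length is assumed to be stored as metadata, since padding hides the true length.
    """
    rebuilt = group_parity(survivors + [parity])
    return rebuilt[:lost_length]


obj_a, obj_b, obj_c = b"AAAA", b"BB", b"CCCCCCCC"
parity = group_parity([obj_a, obj_b, obj_c])
print(len(parity))                                          # 8, as long as the longest member
print(recover_member([obj_a, obj_c], parity, len(obj_b)))   # b'BB'
```

The padding step is why the parity object must be as long as the longest member, which drives the overhead behavior discussed next.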
In grouped object RAID, the overhead depends on how much the sizes of the objects in the group differ. When all the virtual objects in the group are the same size, the overhead is 1/(n+1) for a group of n objects. However, when the virtual object lengths differ greatly, the storage overhead increases and can approach the 50% overhead of mirroring. FIG. 1 illustrates two one-block objects (A and B) grouped with one long object (C). Objects A and B each have a single block allocated, while object C is ten blocks long (C1 . . . C10). Because the parity object must be as long as the longest object (C), it is also ten blocks long (P1 . . . P10). The system thus stores 10 parity blocks for 12 blocks of virtual object data, giving an overhead of 10/(10+12), or about 45%, which is only slightly better than the mirror method.
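The arithmetic for FIG. 1 can be reproduced in a few lines of Python. The sketch assumes, as the text states, that the parity object is as long as the longest member. Overhead is parity blocks divided by total blocks stored, and the function name is hypothetical. Equal-size members give 1/(n+1), while the FIG. 1 sizes give 10/22.

```python
from fractions import Fraction


def grouped_overhead(lengths_in_blocks: list[int]) -> Fraction:
    """Parity object is as long as the longest member; overhead = parity / (data + parity)."""
    parity = max(lengths_in_blocks)
    data = sum(lengths_in_blocks)
    return Fraction(parity, data + parity)


print(grouped_overhead([1, 1, 10]))     # 5/11 (= 10/22), objects A, B, C from FIG. 1
print(grouped_overhead([4, 4, 4, 4]))   # 1/5, four equal-size objects: 1/(n+1)
```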
However, because the overhead using the grouped object RAID method can vary widely, there remains a need for a data migration method that will ensure low overhead even as virtual objects change size.