1. Field of the Invention
The disclosed invention generally relates to data storage methodologies, and, more particularly, to an object-based methodology in which a data file is migrated from RAID-1 to a non-mirrored RAID scheme employing an XOR-based error correcting code without rewriting the data contained in the data file.
2. Description of Related Art
With increasing reliance on electronic means of data communication, different models to efficiently and economically store a large amount of data have been proposed. A data storage mechanism requires not only a sufficient amount of physical disk space to store data, but various levels of fault tolerance or redundancy (depending on how critical the data is) to preserve data integrity in the event of one or more disk failures. One group of schemes for fault tolerant data storage includes the well-known RAID (Redundant Array of Independent Disks) levels or configurations. A number of RAID levels (e.g., RAID-0, RAID-1, RAID-3, RAID-4, RAID-5, etc.) are designed to provide fault tolerance and redundancy for different data storage applications. A data file in a RAID environment may be stored in any one of the RAID configurations depending on how critical the content of the data file is vis-à-vis how much physical disk space is affordable to provide redundancy or backup in the event of a disk failure.
FIGS. 1 and 2 illustrate traditional RAID-1 and RAID-5 storage arrangements respectively. As is known in the art, RAID-1 employs “mirroring” of data to provide fault tolerance and redundancy. FIG. 1 shows an exemplary mirroring arrangement wherein four disks 10, 18, 26 and 34 are used to provide mirroring. The contents of disk 10 are mirrored onto disk 18, and the contents of disk 26 are mirrored onto disk 34. The data on each physical disk is typically stored in “blocks”, which contain a number of disk sectors to store the incoming data. In other words, the total physical disk space is divided into “blocks” and “sectors” to store data. FIG. 1 shows the contents of blocks 12, 14 and 16 of the primary disk 10 mirrored onto blocks 20, 22 and 24 respectively of the secondary or mirror disk 18. Similarly, the contents of blocks 28, 30 and 32 of the primary disk 26 are shown mirrored onto blocks 36, 38 and 40 respectively of the mirror disk 34. Each block may be of the same, predetermined size (e.g., 8 KB).
As is known in the art, the storage mechanism provided by RAID-1 is not the most economical or most efficient way of providing fault tolerance. Although RAID-1 storage systems are simple to design and provide 100% redundancy (and, hence, increased reliability) during disk failures, RAID-1 systems substantially increase the storage overhead because of the necessity to mirror everything. The redundancy under RAID-1 typically exists at every level of the system—from power supplies to disk drives to cables and storage controllers—to achieve full mirroring and steady availability of data during disk failures.
On the other hand, RAID-5 allows for reduced overhead and higher efficiency, albeit at the expense of increased complexity in the storage controller design and time-consuming data rebuilds when a disk failure occurs. FIG. 2 illustrates an exemplary data storage arrangement showing data stored among five disks 50, 54, 58, 62 and 66 in RAID-5 configuration. RAID-5 uses the concepts of “parity” and “striping” to provide redundancy and fault tolerance. Simply speaking, “parity” can be thought of as a binary checksum or a single bit of information that tells the operator if all the other corresponding data bits are correct. RAID-5 creates blocks of parity, where each bit in a parity block corresponds to the parity of the corresponding data bits in other associated blocks. The parity data is used to reconstruct blocks of data read from a failed disk drive. Furthermore, RAID-5 uses the concept of “striping”, which means that two or more disks store and retrieve data in parallel, thereby accelerating performance of data read and write operations. To achieve striping, the data is stored in different blocks on different drives. A single group of blocks and their corresponding parity block may constitute a single “stripe” within the RAID set. In RAID-5 configuration, the parity blocks are distributed throughout all the disk drives, instead of storing all the parity blocks on a single disk. Algorithms for deciding where a particular stripe's parity block resides within the array of disk drives are known in the art.
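The parity-and-rebuild idea described above can be illustrated with a minimal sketch. It assumes byte-wise XOR parity over equally sized blocks; the helper name `xor_blocks` and the block contents are hypothetical and are not taken from the figures.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the byte-wise XOR of a list of equally sized blocks."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("all blocks must have the same length")
    # zip(*blocks) yields, for each byte position, the bytes at that position.
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


# Four hypothetical data blocks forming one stripe (illustrative contents).
data = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]
parity = xor_blocks(data)

# Simulate the loss of the third block and rebuild it from the
# surviving data blocks plus the parity block.
survivors = data[:2] + data[3:]
rebuilt = xor_blocks(survivors + [parity])
assert rebuilt == data[2]
```

Because XOR is its own inverse, the same routine serves both to generate parity and to reconstruct a single missing block; it cannot recover from the loss of two blocks in the same stripe.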
FIG. 2 illustrates RAID-5 data storage through striping. For the sake of simplicity and ease of illustration, all blocks in a single disk are referred to by the same numeral. Thus, each block on disk 50 is designated by the same reference numeral “52”, each block on disk 54 is designated by the same reference numeral “56”, and so on. As shown in FIG. 2, the “0 stripe” includes all “zero” data blocks (i.e., data blocks A0, B0, C0 and D0) and the corresponding “zero” parity block 68 (on disk 66). Similarly, the data blocks A1, B1, C1 and E1 and their corresponding parity block 64 (i.e., the “1 parity” block on disk 62) constitute the “1 stripe”. The data blocks and corresponding parity blocks for the “2 stripe”, “3 stripe” and “4 stripe” are also shown. As can be seen from FIG. 2, each parity block is stored on a different disk, thereby distributing or “staggering” the parity storage throughout all the disks. The staggering of parity blocks ensures that the I/O (input/output) operations needed to read or write parity blocks are distributed throughout the RAID disk set. The parity generation function is symbolically indicated by block 70 in FIG. 2. However, the design of RAID-5 disk controllers implementing parity and striping is considerably more complex than that of the relatively simple disk controller needed to achieve mirroring under RAID-1.
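The staggered parity placement can be sketched as follows. The text fixes only the placement for the “0 stripe” (parity on the fifth disk) and the “1 stripe” (parity on the fourth disk); the rule used here, rotating parity one disk to the left per stripe, is an assumption that is consistent with those two stripes. The function names and the `P<n>` parity label are hypothetical.

```python
import string


def parity_disk(stripe, num_disks):
    """Index of the disk assumed to hold the parity block of a stripe.

    Assumption: parity starts on the last disk and moves one disk to
    the left for each subsequent stripe, wrapping around.
    """
    return (num_disks - 1 - stripe) % num_disks


def stripe_layout(stripe, num_disks):
    """Block labels of one stripe, in disk order (disk 0 is labelled A)."""
    p = parity_disk(stripe, num_disks)
    return [
        f"P{stripe}" if disk == p else f"{string.ascii_uppercase[disk]}{stripe}"
        for disk in range(num_disks)
    ]


# Print the layout of five stripes over five disks.
for s in range(5):
    print(s, stripe_layout(s, 5))
```

Under this assumed rule, each of the five disks holds exactly one parity block per five stripes, which is what spreads parity I/O evenly across the set.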
As noted earlier, RAID-1 implements fault tolerance at the expense of increased overhead (i.e., doubling of storage space). On the other hand, RAID-5 reduces storage space overhead by using the concepts of parity and striping as discussed hereinabove. Furthermore, RAID-1 is more “write-efficient” (i.e., has lower write latency) than RAID-5 in the sense that a data write operation involves fewer I/O operations under RAID-1 than under RAID-5. For example, when the existing data in a sector on a disk block is to be replaced with new data, a RAID-1 controller may need to perform two I/O operations to write the new data on the disk sector as opposed to the four I/O operations needed by a RAID-5 controller. To explain further, the RAID-1 configuration will require the following two I/O operations: (1) write the new data in the appropriate sector on the block on the primary disk, and (2) write the same new data in the appropriate sector on the corresponding block on the mirror disk. On the other hand, the RAID-5 configuration will require the following four operations: (1) read the data from the appropriate sector on each disk associated with the stripe for the data to be replaced, (2) compute the new parity using the new data and the data from each disk in the stripe obtained in step (1), (3) write the new data in place of the old data on the appropriate disk sector, and (4) write the newly computed parity in the appropriate sector on the corresponding disk storing parity information.
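The two-versus-four I/O comparison can be sketched with in-memory stand-ins for disks. Note the assumption: the RAID-5 path below uses the common read-modify-write variant (read old data, read old parity, write new data, write new parity), which differs from the step list above in that it reads only two blocks rather than every block of the stripe. The class `CountingDisk` and both function names are hypothetical.

```python
class CountingDisk:
    """In-memory stand-in for a disk that counts block reads and writes."""

    def __init__(self, num_blocks, block_size=4):
        self.blocks = [bytes(block_size)] * num_blocks
        self.io_count = 0

    def read(self, index):
        self.io_count += 1
        return self.blocks[index]

    def write(self, index, data):
        self.io_count += 1
        self.blocks[index] = data


def raid1_write(primary, mirror, block, new_data):
    """Mirrored write: one write to the primary, one to the mirror."""
    primary.write(block, new_data)
    mirror.write(block, new_data)


def raid5_small_write(disks, data_idx, parity_idx, block, new_data):
    """Read-modify-write update of one data block and its parity block."""
    old_data = disks[data_idx].read(block)
    old_parity = disks[parity_idx].read(block)
    # Removing the old data from the parity and adding the new data
    # are both XOR operations.
    new_parity = bytes(
        p ^ o ^ n for p, o, n in zip(old_parity, old_data, new_data)
    )
    disks[data_idx].write(block, new_data)
    disks[parity_idx].write(block, new_parity)


mirror_pair = [CountingDisk(1), CountingDisk(1)]
raid1_write(mirror_pair[0], mirror_pair[1], 0, b"new!")
raid1_ios = sum(d.io_count for d in mirror_pair)

array = [CountingDisk(1) for _ in range(5)]
raid5_small_write(array, data_idx=0, parity_idx=4, block=0, new_data=b"new!")
raid5_ios = sum(d.io_count for d in array)

print(raid1_ios, raid5_ios)  # prints: 2 4
```

The sketch counts device accesses only; the parity computation itself is not an I/O operation but sits on the critical path between the reads and the writes, which is one source of the added RAID-5 write latency.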
Thus, as can be seen from the foregoing, when storage space overhead is not too critical (e.g., when storing a small data file of about 32 KB), it is preferable to store the data file as a RAID-1 file to avoid the write latency inherent in RAID-5 storage. On the other hand, when the RAID-1 file grows to a larger size or when another large data file is to be stored (e.g., a file of more than 32 KB), it becomes desirable and more economical to store the grown data file or the new data file in a RAID-5 configuration to substantially reduce the storage overhead inherent in a RAID-1 storage configuration. Thus, a combination of RAID-1 storage for smaller data files and RAID-5 storage for larger data files retains the better write performance of RAID-1 for the small files, while still keeping the total fraction of all capacity consumed by redundancy at a low level.
To illustrate the foregoing concept of selecting different RAID configurations for different file sizes, it is noted that trace studies have shown that in a typical file system a large majority of files are small in size (i.e., in the range of 10 KB in size), whereas the large majority of total storage capacity is typically consumed by a few large files (of 10-100 MB or more). For example, in a file system containing 100 files with 95 files of 1 KB size and 5 remaining files of 50 MB each, the following storage capacity may be required when RAID-1 configuration is used to store 95 small files and 10+1 RAID-5 configuration (i.e., 10 disks for data and 1 disk for parity) is used to store 5 large files.
Bytes storing user data = (95 × 1 KB) + (5 × 50 MB) = 250.095 MB
Bytes storing redundant data = (95 × 1 KB) + (5 × (1/10) × 50 MB) = 25.095 MB
Total bytes stored in the file system = 250.095 MB + 25.095 MB = 275.19 MB
Fraction of storage consumed by redundant information = 25.095 / 275.19 = 9.12%
Thus, assuming that all files in those 100 files are written equally often, the storage layout scheme with RAID-1 for small files and RAID-5 for large files allows efficient RAID-1 writes for around 95% of all write accesses, but still keeps total capacity overhead for redundancy at under 10%.
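The overhead arithmetic of this example can be reproduced with a short sketch. It assumes decimal units (1 MB = 1000 KB), which is what the conversion of 95 KB to 0.095 MB in the example implies; the function name and parameter names are hypothetical.

```python
KB_PER_MB = 1000  # assumption: decimal units, so that 95 KB = 0.095 MB


def redundancy_overhead(small_files, small_kb, large_files, large_mb, data_disks):
    """Return (user MB, redundant MB, redundant fraction of total storage).

    Small files are assumed mirrored (RAID-1: one redundant byte per user
    byte); large files are assumed striped with one parity disk per
    `data_disks` data disks (RAID-5).
    """
    small_total_mb = small_files * small_kb / KB_PER_MB
    large_total_mb = large_files * large_mb
    user_mb = small_total_mb + large_total_mb
    redundant_mb = small_total_mb + large_total_mb / data_disks
    return user_mb, redundant_mb, redundant_mb / (user_mb + redundant_mb)


user_mb, redundant_mb, fraction = redundancy_overhead(95, 1, 5, 50, 10)
print(f"user data:      {user_mb:.3f} MB")
print(f"redundant data: {redundant_mb:.3f} MB")
print(f"overhead:       {fraction:.2%}")
```

For comparison, passing `data_disks=1` for the large files models mirroring everything, in which case the redundant fraction rises to one half of all storage.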
Although the RAID-1/RAID-5 combination helps in achieving write-efficient storage with a reasonable storage overhead, there is a performance penalty to be paid in the prior art storage methodology when a file that is initially small (i.e., stored as a RAID-1 file) grows into a large file necessitating a migration from RAID-1 storage to RAID-5 storage. In that event, the most recent version of the entire file has to be copied from one or more RAID-1 disk blocks/drives to a group of RAID-5 disk blocks/drives. Additionally, the necessary parity block(s) are also required to be computed and stored in appropriate block(s). Such data copying and parity generation for each file growing beyond a certain size may not prove efficient when a large number of files are to be migrated from RAID-1 to RAID-5 configuration. Therefore, it is desirable to devise a storage methodology where RAID-1 to RAID-5 migration takes place without additional file copying operations.
Furthermore, existing data storage systems do not concurrently or adequately address the issues of dynamic load balancing, hierarchical storage management, data backup, fault tolerance, and performance optimization. Managing all of those functions separately imposes a substantial administrative burden, and the danger of one function conflicting with another puts the integrity of the stored data at risk. For example, in the RAID-5 data storage configuration shown in FIG. 2, a first process running on the computer network attached to the RAID-5 disks 50, 54, 58, 62, 66 may access a sector on block A1 on disk 50 and write data into it. The “1 parity” block will then be updated in view of this newly written data. The first process may not write anything thereafter to the data file in “1 stripe.” However, a second process (which is unrelated to the first process) may then write different data on a sector on block C1, thereby changing the file content as well as the parity information. This may produce an error message when the first process next accesses the data file in “1 stripe” and notices the different parity information. The second process itself may have accessed the data file in error. However, because traditional RAID-5 storage systems allow sector- and block-based accesses, such overwriting or corruption of data and parity may not be easily avoidable: each network process accessing a data file may be capable of accessing only a part of that data file, and that part may bear no relation to another part accessed by another process. Therefore, it is desirable to devise a data storage mechanism that preserves data integrity when a file is being shared among a number of processes.
Thus, there is a need for a storage system that concurrently and adequately provides for dynamic load balancing, hierarchical storage management, data backup, fault tolerance, and performance optimization.