The present invention relates generally to storage systems and, more particularly, to a data distribution design for a fast RAID rebuild architecture.
Currently, some storage systems employ a data distribution architecture. See, e.g., U.S. Pat. No. 7,904,749 (based on RAID5 and RAID6) (hereinafter U.S. Pat. No. 7,904,749) and U.S. Patent Application Publication No. 2007/0174671 (based on RAID1) (hereinafter US 2007/0174671). These systems distribute data across a pool consisting of multiple disks. When a disk fails, the rebuild process is likewise distributed across the entire pool, so that it can run in parallel on many disks, thereby shortening the rebuild time. For a disclosure on rebuilding a storage system, see, e.g., U.S. Patent Application Publication No. 2008/0091741.
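To make the parallel-rebuild argument concrete, the following sketch compares idealized rebuild times under loudly stated assumptions: a traditional rebuild is assumed to be limited by the write rate of a single spare disk, a distributed rebuild is assumed to share the work evenly among the n − 1 surviving disks of the pool, and processor and network bandwidth are assumed to impose no limit. The function `rebuild_hours` and all capacities and rates are hypothetical and are not taken from the cited references.

```python
# Idealized rebuild-time sketch (hypothetical parameters, no processor or
# network bottleneck). Traditional RAID: the failed disk is rebuilt onto one
# spare, so one disk's bandwidth bounds the rebuild. Distributed: the work is
# spread evenly over the n - 1 surviving disks of the pool.

def rebuild_hours(capacity_gb: float, disk_mb_s: float, parallel_disks: int) -> float:
    """Hours to rebuild one failed disk when `parallel_disks` disks share the work."""
    if parallel_disks < 1:
        raise ValueError("at least one disk must take part in the rebuild")
    aggregate_mb_s = disk_mb_s * parallel_disks
    seconds = capacity_gb * 1000 / aggregate_mb_s  # GB -> MB (decimal units)
    return seconds / 3600


n = 24  # hypothetical pool size
traditional = rebuild_hours(4000, 100, 1)      # single spare disk
distributed = rebuild_hours(4000, 100, n - 1)  # whole pool in parallel
print(f"traditional: {traditional:.1f} h, distributed: {distributed:.2f} h")
# prints: traditional: 11.1 h, distributed: 0.48 h
```

In this idealized picture the rebuild time shrinks in proportion to the number of participating disks; the processor and network limits that break this proportionality are the subject of the model of FIG. 14.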
The reliability of a storage system can be calculated using a Markov model. For a traditional system based on RAID1/RAID5, the formula to calculate availability/reliability is described in the paper entitled “Mean Time to Meaningless: MTTDL, Markov Models, and Storage System Reliability,” by Kevin M. Greenan, James S. Plank & Jay J. Wylie, The 2nd Workshop on Hot Topics in Storage and File Systems (HotStorage2010), Jun. 22, 2010, Boston, Mass., USA. Those formulas can be extended to distribution architectures such as those disclosed in U.S. Pat. No. 7,904,749 and US 2007/0174671.
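As an illustration of how such a Markov model can be evaluated numerically, the sketch below computes the expected time for a continuous-time Markov chain to reach an absorbing (data-loss) state by solving the standard first-step equations with exact rational arithmetic. This is a generic technique, not a reproduction of the formulas in the cited paper; the function name `mttdl`, the state labels, and all rate values are hypothetical. The example applies it to a single-fault-tolerant group (e.g., RAID5) of n disks with an assumed fixed rebuild rate μ.

```python
from fractions import Fraction


def mttdl(rates, start, absorbing):
    """Expected time for a continuous-time Markov chain to reach an absorbing state.

    rates maps (from_state, to_state) -> transition rate. For every transient
    state i, the expected time T_i satisfies
        q_i * T_i - sum(r_ij * T_j for transient j) = 1,
    where q_i is the total exit rate of i and T_j = 0 for absorbing j.
    """
    states = sorted({s for pair in rates for s in pair} - set(absorbing))
    idx = {s: k for k, s in enumerate(states)}
    size = len(states)
    # Augmented matrix [A | b] of the linear system A T = b, with b = 1.
    a = [[Fraction(0)] * size + [Fraction(1)] for _ in range(size)]
    for (i, j), rate in rates.items():
        if i in absorbing:
            continue
        a[idx[i]][idx[i]] += Fraction(rate)      # contributes to q_i
        if j not in absorbing:
            a[idx[i]][idx[j]] -= Fraction(rate)  # coupling to transient T_j
    # Gauss-Jordan elimination; exact fractions avoid rounding error.
    for col in range(size):
        pivot = next(row for row in range(col, size) if a[row][col] != 0)
        a[col], a[pivot] = a[pivot], a[col]
        for row in range(size):
            if row != col and a[row][col] != 0:
                factor = a[row][col] / a[col][col]
                a[row] = [x - factor * y for x, y in zip(a[row], a[col])]
    k = idx[start]
    return a[k][size] / a[k][k]


# Hypothetical single-fault-tolerant group with a fixed rebuild rate mu.
n = 8
lam = Fraction(1, 100_000)  # per-disk failure rate, 1/hour (hypothetical)
mu = Fraction(1, 10)        # rebuild rate, 1/hour (hypothetical)
chain = {("F0", "F1"): n * lam, ("F1", "F2"): (n - 1) * lam, ("F1", "F0"): mu}
print(float(mttdl(chain, "F0", {"F2"})), "hours")
```

Solving the linear system rather than hand-deriving a formula lets the same function handle chains with more states, for example codes that tolerate two or more concurrent failures.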
FIG. 14 illustrates a method to solve for the MTTDL (Mean Time to Data Loss) of the distribution architecture. The method uses Model A, Formula B, and Definition C. Details of the method are found in the paper entitled “Notes on Reliability Models for Non-MDS Erasure Codes,” by James Lee Hafner & K. K. Rao, IBM Tech. Rep. RJ-10391, IBM, Oct. 24, 2006. Formula B is derived from Model A. Model A and Definition C mean the following. First, the state F0 changes to F1 at rate $n\lambda$. At F0, there is no disk failure; there are $n$ healthy disks, each of which fails at rate $\lambda$. Second, the state F1 changes to F2 at rate $(n-1)\lambda$. At F1, there is one disk failure; there are $n-1$ healthy disks, each of which fails at rate $\lambda$ (the already failed disk cannot fail again). Third, the state F1 changes back to F0 at rate $p$. The rate $p$ depends on the pace of the rebuild process, which is limited by the slowest component taking part in the rebuild. In general, processor(s), network, and disks all take part in the rebuild process, so $p$ is given by $p = \min\left((n-1)\mu_{\mathrm{Disk}}, \mu_{\mathrm{Processor}}, \mu_{\mathrm{Network}}\right)$, where $\mu_j$ depends on the throughput performance of component $j$. Because the distribution architecture runs multiple rebuild processes in parallel, the disk-side rebuild rate is proportional to the number of healthy disks, $n-1$. Fourth, the state F2 cannot change to any other state. Because F2 indicates two disk failures, it represents data loss (the data cannot be rebuilt).
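A first-step analysis of the chain, using only the transition rates listed above, gives the MTTDL in closed form. Writing $T_0$ and $T_1$ for the expected times to reach F2 starting from F0 and F1 (symbols introduced here for illustration), the derivation runs as follows; it is offered as a worked sketch, and FIG. 14 may present Formula B in a different form.

$$
\begin{aligned}
T_0 &= \frac{1}{n\lambda} + T_1,\\
T_1 &= \frac{1}{(n-1)\lambda + p} + \frac{p}{(n-1)\lambda + p}\,T_0
\;\Longrightarrow\; (n-1)\lambda\,T_1 = 1 + \frac{p}{n\lambda},\\
T_1 &= \frac{n\lambda + p}{n(n-1)\lambda^2},\\
\mathrm{MTTDL} = T_0 &= \frac{(2n-1)\lambda + p}{n(n-1)\lambda^2}.
\end{aligned}
$$

When $p \gg n\lambda$, this is approximately $p / \left(n(n-1)\lambda^2\right)$, so the MTTDL grows linearly with the rebuild rate $p$ and falls roughly as $1/n^2$ when $p$ is held fixed.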
FIG. 15 is a plot of storage availability showing the MTTDL as a function of the number of disks. Result Y is calculated from Formula B and Condition D. Result X is calculated from Formula B and Condition D, but with $\mu_{\mathrm{Processor}} = \mu_{\mathrm{Network}} = \infty$, that is, with the rebuild rate limited only by the disks. Under these conditions, in an environment with more than a few dozen disks, Result Y is lower than Result X. This means that distribution makes availability worse in a massive disk environment: because the aggregate rebuild throughput of all the disks is limited by network or processor bandwidth, the gain from multiplying rebuild throughput by distributing across more disks saturates, while the overall rate of disk failures, $n\lambda$, continues to grow with the number of disks.
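The saturation effect can be reproduced qualitatively with the closed-form MTTDL of the three-state chain described for FIG. 14, $((2n-1)\lambda + p)/(n(n-1)\lambda^2)$ (obtained by first-step analysis of the stated transition rates), together with $p = \min((n-1)\mu_{\mathrm{Disk}}, \mu_{\mathrm{Processor}}, \mu_{\mathrm{Network}})$. All parameter values below are hypothetical and are not Condition D, so the numbers do not reproduce FIG. 15; only the trend is meant to carry over. The helper names `rebuild_rate` and `mttdl` are likewise illustrative.

```python
import math


def rebuild_rate(n, mu_disk, mu_processor=math.inf, mu_network=math.inf):
    """p = min((n - 1) * mu_disk, mu_processor, mu_network)."""
    return min((n - 1) * mu_disk, mu_processor, mu_network)


def mttdl(n, lam, p):
    """Closed-form MTTDL of the F0 -> F1 -> F2 chain with repair rate p."""
    return ((2 * n - 1) * lam + p) / (n * (n - 1) * lam ** 2)


LAM = 1e-6      # hypothetical per-disk failure rate (1/hour)
MU_DISK = 0.05  # hypothetical per-disk rebuild contribution (1/hour)
MU_CAP = 1.0    # hypothetical processor/network rebuild cap (1/hour)

for n in (4, 8, 16, 32, 64, 128, 256):
    x = mttdl(n, LAM, rebuild_rate(n, MU_DISK))                  # like Result X
    y = mttdl(n, LAM, rebuild_rate(n, MU_DISK, MU_CAP, MU_CAP))  # like Result Y
    print(f"n={n:4d}  X={x:.3e} h  Y={y:.3e} h  Y/X={y / x:.2f}")
```

With these assumed values, the processor/network cap binds once $(n-1)\mu_{\mathrm{Disk}}$ exceeds 1.0 per hour, that is, beyond 21 disks. Up to that point X and Y coincide; past it the ratio Y/X keeps falling, because X still decreases only roughly as $1/n$ while Y, with a fixed $p$, decreases roughly as $1/n^2$.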