Field of the Invention
The invention relates to a method for asymmetrically scheduling the buffer cache of a disk array.
Description of the Related Art
Reliability and availability are two important criteria for assessing the quality of online storage services. A redundant array of independent disks (RAID) is a popular option for providing highly reliable and available data access. Recently, RAID levels with higher fault tolerance have been employed, as they can recover all lost data even when two or more disks fail while continuing to provide uninterrupted online service.
From a user's perspective, the average response time of a storage system, especially one providing online service, is a key criterion of its service quality. However, when disks fail within the fault tolerance of the disk array, the array must reconstruct the lost data while continuing to provide online service, so the read/write workload generated by data reconstruction and the workload generated by user requests interfere with each other, degrading service quality. Many studies indicate that, compared with the fault-free mode, the average response time of user read/write requests in the degraded mode with reconstruction workload may be significantly affected, increasing by several times or even several tens of times. Meanwhile, online reconstruction may take much longer than offline reconstruction. Therefore, reducing both the average response time of user requests and the reconstruction time is the most effective way to improve the quality of service provided to frontend users, as well as data availability.
Buffer caches are now widely used, and many cache replacement algorithms have been proposed to mitigate disk latency. However, conventional cache replacement algorithms consider only a RAID system operating in the fault-free mode and are not suited to a RAID system operating in the failure mode (namely, when some disks of the disk array have failed). In the failure mode, the disk array must read data from some of the surviving disks (the number is determined by the configuration of the RAID system) to reconstruct the data of the failed disk. Taking RAID-6 as an example, if one of the n disks forming the disk array fails (n being an integer greater than or equal to 3), data from n−2 of the n−1 surviving disks must be read for reconstruction, and the remaining surviving disk does not participate in the reconstruction. In other words, a large amount of reconstruction I/O traffic may fall on those n−2 disks. Conventional cache replacement algorithms, however, do not take this situation into account and do not allocate a separate buffer area to each disk. Instead, they use a global cache, so that a disk participating in reconstruction receives the same buffer capacity as the disk not involved in reconstruction. This leads to an imbalance in the I/O requests reaching the disks, which slows reconstruction and degrades the I/O performance of frontend applications.
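To make the RAID-6 case above concrete, the following Python sketch computes, for a single stripe, which surviving disks are read when one disk fails, and rebuilds a lost data or P block by XOR. It assumes a hypothetical fixed layout (disks 0 to n−3 hold data, disk n−2 holds the XOR parity P, disk n−1 holds the Reed-Solomon syndrome Q, with no parity rotation across stripes). The function names are illustrative only, and rebuilding Q, which requires GF(2^8) arithmetic, is deliberately left out.

```python
from functools import reduce


def reconstruction_read_set(n: int, failed: int) -> list[int]:
    """Return the surviving disks read to rebuild disk `failed` in one stripe.

    Hypothetical fixed layout (assumption): disks 0..n-3 hold data,
    disk n-2 holds P (XOR parity), disk n-1 holds Q; no parity rotation.
    """
    if n < 3 or not 0 <= failed < n:
        raise ValueError("need n >= 3 and 0 <= failed < n")
    q = n - 1
    if failed == q:
        # Q is recomputed from all n-2 data disks; the P disk stays idle.
        return list(range(n - 2))
    # A lost data or P block is the XOR of the other blocks among
    # disks 0..n-2 (data plus P); the Q disk stays idle.
    return [d for d in range(n - 1) if d != failed]


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_block(stripe: list[bytes], failed: int) -> bytes:
    """Rebuild a lost data or P block using only the disks in the read set."""
    n = len(stripe)
    if failed == n - 1:
        # Rebuilding Q needs GF(2^8) Reed-Solomon arithmetic; omitted here.
        raise NotImplementedError("Q reconstruction is not sketched")
    reads = reconstruction_read_set(n, failed)
    return xor_blocks([stripe[i] for i in reads])


# Usage: n = 5 disks, i.e. three data disks, P and a placeholder Q.
data = [b"\x01\x02", b"\x10\x20", b"\xaa\x55"]
stripe = data + [xor_blocks(data), b"\x00\x00"]  # Q is not computed here
print(reconstruction_read_set(5, 1))  # [0, 2, 3]
print(rebuild_block(stripe, 1) == data[1])  # True
```

Whichever disk fails, the read set has exactly n−2 members and one surviving disk stays idle, which is the asymmetry in reconstruction I/O that a uniformly shared global cache does not distinguish. The sketch ignores parity rotation; with rotated P and Q, the idle disk would differ from stripe to stripe.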