1. Field of the Invention
The present invention relates, in general, to a method of improving the Input/Output (I/O) performance of a Redundant Array of Independent Disks (RAID) system. More particularly, the present invention relates to a method of improving the I/O performance of a RAID system using a Matrix Stripe Cache (MSC). The method performs read connections and write connections for each column of the MSC in order to reduce the number of I/O operations, thereby improving the performance of fragmented writes, and improves read performance in a normal mode at the expense of read performance in a degraded mode.
2. Description of the Related Art
A RAID system integrates several independent disk storage devices into one, distributes and stores data across respective disks, and enables simultaneous access to multiple disks, thereby improving I/O characteristics. Furthermore, a RAID system allows a plurality of independent disk storage devices to be viewed as a single disk by a host computer system, thereby implementing a high-capacity storage device.
Furthermore, a RAID system accommodates auxiliary data such as a disk copy, an Error Checking and Correction (ECC) code, or parity data, so that data can be recovered automatically even if any one disk of the RAID system fails, thereby increasing the reliability of the system.
FIG. 1 is a diagram showing the schematic construction of a general RAID system. The RAID system includes a host system 10 considered to be a system main body, a plurality of hard disks 30 connected to the host system 10, and a RAID controller 20 connected between the host system 10 and the hard disks 30 and configured to manage the hard disks 30. The RAID controller 20 includes memory 21 for compensating for the difference in speed between the host system 10 and the hard disks 30.
Furthermore, RAID systems are classified into levels according to configuration. The article written by Patterson et al. classifies RAID systems into five types, that is, RAID level 1 through RAID level 5.
RAID level 1 is a mirroring technique in which data stored in N disks is also copied to and stored in N other mirror disks. At the time of writing data, the same data must always be stored in two different disks. At the time of reading the data, whichever of the two disks has the faster access time may be selected, and the data may be read from the selected disk. If one of the two disks fails, service can be provided continuously using the mirror disk.
RAID level 2 is a technique of protecting data using a Hamming code, and incurs less disk cost than the mirroring scheme of RAID level 1.
RAID level 3 is a technique in which one parity disk is added to a group of N data disks. At this level, at the time of writing data, data is distributed across and stored in the N respective disks in bits or bytes, and parity data obtained by performing an XOR operation on data stored in each data disk is stored in the parity disk. At the time of reading data, the N disks must be accessed at the same time. If one of the N disks fails, information can be recovered using the parity data stored in the parity disk.
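The parity computation and recovery described for RAID level 3 can be illustrated with a minimal sketch. The function name `xor_parity` and the sample byte values are assumptions made for illustration only; they do not come from any cited RAID implementation.

```python
from functools import reduce

def xor_parity(blocks):
    """Return the byte-wise XOR of a list of equally sized byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three hypothetical data disks, each holding one 4-byte unit of a stripe.
data = [b"\x01\x02\x03\x04", b"\x10\x20\x30\x40", b"\xaa\xbb\xcc\xdd"]

# The parity disk stores the XOR of the data stored in each data disk.
parity = xor_parity(data)

# Simulate the failure of data disk 1: XOR-ing the surviving data
# with the parity reproduces the lost contents.
survivors = [data[0], data[2], parity]
rebuilt = xor_parity(survivors)
assert rebuilt == data[1]
```

Because XOR is its own inverse, the same routine serves both to generate the parity and to rebuild any single missing disk.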
RAID level 4 includes N+1 disks, where data is stored in N disks and parity is stored in the remaining disk. RAID level 4 is fairly similar to RAID level 3, but differs from RAID level 3 in that data is distributed and stored in units of blocks. Therefore, at the time of writing data, access to one data disk and the parity disk is required, and at the time of reading data, only one disk is accessed. Furthermore, when one disk fails, information can be recovered using the parity blocks stored in the parity disk.
RAID level 5 is similar to RAID level 4 in that data is stored in units of blocks, but differs from RAID level 4 in that parity data is distributed across the disks instead of being stored in a fixed disk. At this level, the methods used when data is read and written, and the data recovery method used when one disk fails, are the same as those of RAID level 4.
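The difference between a fixed parity disk and distributed parity can be sketched as follows. The rotation rule used in `parity_disk_raid5` is only one possible placement, chosen here as an assumption; the text above does not specify a particular rotation, and all function names are hypothetical.

```python
def parity_disk_raid4(stripe, num_disks):
    """RAID level 4: parity always resides in one fixed disk (here, the last)."""
    return num_disks - 1

def parity_disk_raid5(stripe, num_disks):
    """RAID level 5: parity moves to a different disk for each stripe.

    Assumed rotation: start at the last disk and move one disk to the
    left per stripe, wrapping around.
    """
    return (num_disks - 1 - stripe) % num_disks

def layout(parity_fn, num_stripes, num_disks):
    """Return one row per stripe, marking parity blocks 'P' and data blocks 'D'."""
    rows = []
    for stripe in range(num_stripes):
        p = parity_fn(stripe, num_disks)
        rows.append(["P" if disk == p else "D" for disk in range(num_disks)])
    return rows

for name, fn in (("RAID 4", parity_disk_raid4), ("RAID 5", parity_disk_raid5)):
    print(name)
    for row in layout(fn, num_stripes=4, num_disks=4):
        print(" ".join(row))
```

With the fixed placement, every parity update goes to the same disk, whereas with the rotated placement the parity updates are spread over all disks.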
In addition, there are RAID level 0, which simply distributes and accommodates data without using auxiliary data, and RAID level 6, which has a P+Q error recovery method using Reed-Solomon code. RAID level 6 exhibits higher reliability than methods using parity because information can be recovered even if two disks fail at the same time. Currently, most RAID systems support RAID levels 0, 1, 3 and 5, and RAID level 0/1, that is, a combination of RAID levels 0 and 1. A RAID level suitable for the user application environment is selected and used.
Meanwhile, the RAID levels accompanied by parity, such as RAID level 5, exhibit very poor performance for small write operations. To solve this problem, destage algorithms have been developed.
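The cost of a small write can be illustrated with a sketch of the commonly described read-modify-write parity update, in which the new parity is obtained by XOR-ing the old parity, the old data, and the new data. The class `CountingDisk` and the function `small_write` are hypothetical names introduced for illustration, and the in-memory "disks" only count operations rather than modeling real devices.

```python
def xor_bytes(a, b):
    """Byte-wise XOR of two equally sized byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

class CountingDisk:
    """Hypothetical in-memory disk that counts every read and write."""
    def __init__(self, num_blocks, block_size):
        self.blocks = [bytes(block_size) for _ in range(num_blocks)]
        self.io_count = 0

    def read(self, index):
        self.io_count += 1
        return self.blocks[index]

    def write(self, index, data):
        self.io_count += 1
        self.blocks[index] = data

def small_write(data_disk, parity_disk, index, new_data):
    """Update one data block and its parity using read-modify-write."""
    old_data = data_disk.read(index)        # I/O 1: read old data
    old_parity = parity_disk.read(index)    # I/O 2: read old parity
    new_parity = xor_bytes(xor_bytes(old_parity, old_data), new_data)
    data_disk.write(index, new_data)        # I/O 3: write new data
    parity_disk.write(index, new_parity)    # I/O 4: write new parity

d0 = CountingDisk(num_blocks=1, block_size=2)
parity = CountingDisk(num_blocks=1, block_size=2)
small_write(d0, parity, 0, b"\x0f\x0f")
print(d0.io_count + parity.io_count)  # four disk operations for one logical write
```

Under these assumptions, a single logical block write turns into two reads and two writes, which is one way to see why parity-based levels handle small writes poorly.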
The term “destage” refers to a method in which, when a write is requested, write data is copied to a cache instead of being transferred immediately to a disk, and is transferred to the disk later. The destage algorithms include Wise Ordering for Write (WOW), High Low Water Mark (HLWM), Linear Threshold (LT) Scheduling, etc.
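The idea of destaging can be sketched with a simplified write cache that is loosely modeled on a high/low water mark policy: writes accumulate in the cache, destaging starts when the number of cached blocks reaches a high mark, and it stops once the number falls to a low mark. The class `WriteCache`, its thresholds, and the ascending-block destage order are all assumptions made for illustration; this is not the HLWM, WOW, or LT algorithm as published.

```python
class WriteCache:
    """Simplified write-back cache with assumed high and low water marks."""
    def __init__(self, disk, high_mark, low_mark):
        self.disk = disk            # hypothetical disk: dict of block number -> data
        self.high_mark = high_mark  # start destaging at this many dirty blocks
        self.low_mark = low_mark    # stop destaging at this many dirty blocks
        self.dirty = {}
        self.disk_writes = 0

    def write(self, block_no, data):
        # The write is absorbed by the cache; a repeated write to the
        # same block simply overwrites the cached copy.
        self.dirty[block_no] = data
        if len(self.dirty) >= self.high_mark:
            self.destage()

    def destage(self):
        # Transfer cached blocks to the disk in ascending block order
        # until the cache occupancy drops to the low mark.
        for block_no in sorted(self.dirty):
            if len(self.dirty) <= self.low_mark:
                break
            self.disk[block_no] = self.dirty.pop(block_no)
            self.disk_writes += 1

disk = {}
cache = WriteCache(disk, high_mark=4, low_mark=2)
for block_no in (7, 3, 7, 5, 1):
    cache.write(block_no, b"data")
print(sorted(disk), cache.disk_writes)
```

In this sketch the two writes to block 7 cost at most one later disk write, which illustrates how delaying the transfer can reduce the number of disk I/O operations.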
Furthermore, cache replacement algorithms for reducing the number of disk I/O operations in order to improve reading performance include ARC, 2Q, MQ, LRU-2, ALRFU, LIRS, MRU, LRU, etc. Disk scheduling algorithms such as CSCAN, SSTF, and Anticipatory disk scheduling reduce seek and rotational latency of the disk.
Furthermore, patents related to a conventional RAID system include U.S. Pat. No. 6,704,837, in which, in order to improve the write performance of a RAID system, the size of write operations and the track size of a disk are compared with each other at the time of destage.
Furthermore, “A Case for Redundant Arrays of Inexpensive Disks (RAID)” (ACM SIGMOD 1988, D. A. Patterson et al.) proposes a RAID structure, but does not present a method of improving the I/O performance of a RAID system. “WOW: Wise Ordering for Writes—Combining Spatial and Temporal Locality in Non-volatile Caches” (USENIX FAST 2005, Binny S. Gill et al.) discloses only one of the destage algorithms for improving the write performance of a RAID system. “Scheduling algorithms for modern disk drives” (SIGMETRICS, D. L. Worthington) introduces a variety of disk scheduling techniques for improving the read and write performance of a disk.
Furthermore, in a RAID system, data is generally written sequentially. However, if a file system is highly fragmented (that is, one file is not stored consecutively in a disk as one unit, but is divided into several fragments and stored discontinuously), a write is viewed as a fragmented sequential write from the point of view of the disk. In this case, the write exhibits a stride pattern, that is, a jump pattern, which results in very poor performance. Furthermore, a RAID controller does not take advantage of the maximum performance of the disks because of the complex implementation required to support a disk failure-tolerant function. Furthermore, the RAID system can read data without loss when a disk fails (in a degraded mode). However, the RAID system entails considerable overhead during a read operation due to the complexity of its implementation, and read performance is therefore degraded both in a normal mode and in a degraded mode.