1. Field of the Invention
The present invention relates to a storage system, a control method thereof, and a program for processing, via a cache memory, input/output requests from an upper-level device to storage devices of a RAID configuration; and particularly relates to a storage system, a control method thereof, and a program for writing back the latest data updated in a cache memory to storage devices that form a redundant configuration of RAID 5.
2. Description of the Related Arts
Conventionally, in a RAID device that processes input/output requests from a host, as shown in FIG. 1, a cache memory 102 is provided in a control module 101 of a RAID device 100, and input/output requests from the host to disk devices 104-1 to 104-4, which constitute a RAID-5 group 105, are processed in the cache memory 102. Cache data of the RAID device 100 is managed in page units, and, as shown in FIG. 2, a cache page 106 is managed such that, for example, one page is 66,560 bytes. The cache page 106 comprises user data in a plurality of blocks, the block being the access unit of the host. One block of user data is 512 bytes, an 8-byte block check code (BCC) is added to each 512-byte block, and 128 of the resulting 520-byte blocks are managed as one page; one page is therefore 520×128=66,560 bytes.
A cache management table called a cache bundle element (CBE) is prepared for managing the cache pages 106. The cache management table holds one management record per page, and the management record retains, for example, a logical unit number (LUN), a logical block address (LBA), and a dirty data bitmap in which each block is represented by one bit. One page of the cache management table has the same size as a strip area of each of the disk devices constituting the RAID group.
When RAID 5 is used as the redundant configuration of the RAID device 100, a cache area 108 for storing cache data is provided in the cache memory 102, and, separately from the cache area 108, a data buffer area 110 for storing old data and old parity and a parity buffer area 112 for storing new parity are provided as work areas for generating new parity in a write-back process.
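The page geometry and the per-page management record described above can be illustrated with a minimal sketch. The class name `CachePageRecord`, its field names, and its helper methods are hypothetical and chosen only for illustration; the description states only that a record retains, for example, a LUN, an LBA, and a dirty data bitmap with one bit per block.

```python
from dataclasses import dataclass

USER_DATA_BYTES = 512   # user data per block (access unit of the host)
BCC_BYTES = 8           # block check code added to each block
BLOCKS_PER_PAGE = 128   # blocks managed as one cache page

BLOCK_BYTES = USER_DATA_BYTES + BCC_BYTES     # 520 bytes
PAGE_BYTES = BLOCK_BYTES * BLOCKS_PER_PAGE    # 520 x 128 = 66,560 bytes


@dataclass
class CachePageRecord:
    """Hypothetical management record for one cache page."""
    lun: int               # logical unit number
    lba: int               # logical block address (assumed: first block of the page)
    dirty_bitmap: int = 0  # bit i set means block i holds dirty data

    def mark_dirty(self, block_index: int) -> None:
        if not 0 <= block_index < BLOCKS_PER_PAGE:
            raise IndexError("block index outside the page")
        self.dirty_bitmap |= 1 << block_index

    def is_dirty(self, block_index: int) -> bool:
        return bool((self.dirty_bitmap >> block_index) & 1)

    def dirty_count(self) -> int:
        return bin(self.dirty_bitmap).count("1")


record = CachePageRecord(lun=0, lba=0)
record.mark_dirty(0)
record.mark_dirty(127)
print(PAGE_BYTES, record.dirty_count())
```

A 128-bit integer is used for the bitmap here purely for brevity; an actual controller would more likely use a fixed-width bit array.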
In a write-back process, for example, when a request is generated to write back new data (D2) new, which is present as one page of data in the cache area 108, to the disk device 104-2, the write-back process is carried out after the data buffer area 110 and the parity buffer area 112 are reserved in the cache memory 102. Since the new data (D2) new is written to only one of the disk devices, this write-back process is called a small write. In the small write, old data (D2) old is read out from the disk device 104-2 and stored in the data buffer area 110, and old parity (P) old is read out from the disk device 104-4 and likewise stored in the data buffer area 110. Subsequently, an exclusive OR (XOR) 116 of the new data (D2) new, the old data (D2) old, and the old parity (P) old is calculated to obtain new parity (P) new, which is stored in the parity buffer area 112. Lastly, the new data (D2) new and the new parity (P) new are written to the disk devices 104-2 and 104-4, respectively, and the process is terminated.
A write back in which new data is present for all of the disk devices 104-1 to 104-3 (the entire data of the stripe area) is called a band-wide write. In the band-wide write, new parity is calculated as the exclusive OR of all the data corresponding to the strip areas of the disk devices 104-1 to 104-3, the data and the parity are written to the disk devices 104-1 to 104-4, and the process is terminated. In either case, the data buffer area 110 and the parity buffer area 112 are released when the write is completed.
Incidentally, in such a RAID device forming the redundant configuration of RAID 5, if, as shown in FIG. 3, errors occur in, for example, two disk devices 104-3 and 104-4 during a write-back process, a failure arises in which the consistency of data according to RAID 5 is broken in the stripe area of the disk devices 104-1 to 104-4.
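The two parity calculations described above, for the small write and for the band-wide write, can be sketched as follows. The function names and the 4-byte toy strips are assumptions made for illustration only; real strips would be one page in size.

```python
from functools import reduce
from typing import Sequence


def xor_strips(*strips: bytes) -> bytes:
    """Byte-wise exclusive OR of equally sized strips."""
    if len({len(s) for s in strips}) != 1:
        raise ValueError("need at least one strip, all of equal length")
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*strips))


def small_write_parity(new_data: bytes, old_data: bytes, old_parity: bytes) -> bytes:
    """New parity for a small write: new data XOR old data XOR old parity."""
    return xor_strips(new_data, old_data, old_parity)


def band_wide_parity(data_strips: Sequence[bytes]) -> bytes:
    """New parity for a band-wide write: XOR of every data strip in the stripe."""
    return xor_strips(*data_strips)


# Toy stripe: D1, D2, D3 on three data devices, parity on the fourth.
d1 = bytes([0x11, 0x22, 0x33, 0x44])
d2_old = bytes([0x01, 0x02, 0x03, 0x04])
d3 = bytes([0xA0, 0xB0, 0xC0, 0xD0])
p_old = band_wide_parity([d1, d2_old, d3])

# Small write of new D2: only old D2 and old parity need to be read back.
d2_new = bytes([0xFF, 0xEE, 0xDD, 0xCC])
p_new = small_write_parity(d2_new, d2_old, p_old)

# The result matches a parity computed over the whole updated stripe.
print(p_new == band_wide_parity([d1, d2_new, d3]))
```

The small write reads two strips and writes two, whatever the width of the group, which is why it is preferred when only one strip of the stripe is dirty.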
As a result of the failed write-back process of FIG. 3, only the disk device 104-2 holds the new data (D2) new, while the disk devices 104-3 and 104-4 still hold the old data (D3) old and the old parity (P) old. If, in this state of broken consistency, the disk device 104-1 further fails as shown in FIG. 4, degenerating the RAID-5 group to a three-device configuration, a request from a host 118 for reading the data D1 results in a cache miss in the cache memory 102, and staging from the disk device 104-1 is attempted. However, since the group is in a degenerated state in which the disk device 104-1 has failed and been eliminated from the RAID-5 group 105, Regeneration Read, which recovers the data D1 through the exclusive OR of the data and parity of the normal disk devices 104-2 to 104-4, is executed. Because the consistency between the data and the parity of the disk devices 104-2 to 104-4 has been broken, garbled data (D?) may be restored through the exclusive OR.
Incidentally, in the RAID device, control modules provided with cache memories are duplexed for ensuring reliability, and the parity generated in a write-back process is saved in the control module on the backup side. Therefore, when, as in FIG. 4, a disk device fails and causes degeneration while the consistency of the disk devices is broken due to an error in a write-back process, a control module 101-1, as shown in FIG. 5, obtains the new parity (P) new saved in a control module 101-2 on the backup side, and performs write correction (Write Correct), writing the new data (D2) new and (D3) new in the cache memory 102 to the corresponding disk devices 104-2 and 104-3, so as to recover the consistency of the disk devices 104-1 to 104-4.
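The effect of broken consistency on Regeneration Read can be shown with a small sketch. The helper `regeneration_read` and the 4-byte strips are hypothetical; the sketch only reproduces the situation of FIG. 3 and FIG. 4, in which new D2 reached its disk device while D3 and the parity did not.

```python
from functools import reduce


def xor_strips(*strips: bytes) -> bytes:
    """Byte-wise exclusive OR of equally sized strips."""
    if len({len(s) for s in strips}) != 1:
        raise ValueError("need at least one strip, all of equal length")
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*strips))


def regeneration_read(surviving_strips: list) -> bytes:
    """Rebuild the strip of the failed device from all surviving strips."""
    return xor_strips(*surviving_strips)


d1, d2_old, d3_old = b"AAAA", b"BBBB", b"CCCC"
p_old = xor_strips(d1, d2_old, d3_old)
d2_new = b"bbbb"

# Consistent stripe: D1 is rebuilt correctly from D2, D3, and parity.
rebuilt_ok = regeneration_read([d2_old, d3_old, p_old])

# Failed write back: only new D2 was written; D3 and parity remain old.
rebuilt_bad = regeneration_read([d2_new, d3_old, p_old])

print(rebuilt_ok == d1, rebuilt_bad == d1)
```

The rebuilt strip differs from D1 by exactly the XOR of old and new D2, so the read completes without any error indication even though the returned data is wrong.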
[Patent Document 1] Japanese Patent Application Laid-Open (kokai) No. H05-303528
[Patent Document 2] Japanese Patent Application Laid-Open (kokai) No. H08-115169
However, in such a conventional cache control process, if degeneration occurs in which a control module fails and processing is transferred to the backup control module, parity cannot be saved; therefore, if the consistency of the disk devices of the RAID group is broken, the consistency cannot be recovered by write correction using saved parity as in FIG. 5. In this case, as shown in FIG. 6, the control module 101-2, which has taken over the process as a result of the error of the control module 101-1, cannot obtain the parity in the cache memory 102. Therefore, parity (P) is recalculated (Recalculate) through the exclusive OR of the data in the disk devices 104-1 to 104-3 and written to the disk device 104-4, thereby recovering the consistency of the disk devices 104-1 to 104-4. Then, as shown in FIG. 7, new parity (P) new is calculated from the new data (D2) new and (D3) new remaining in the cache memory 102, and a normal write back that writes to the disk devices 104-2 to 104-4 is performed.
However, if, in a state in which the control modules are degenerated as in FIG. 7, the disk device 104-1 further fails and the RAID-5 group 105 is degenerated, parity cannot be recalculated in the degenerated RAID-5 group 105, unlike in FIG. 6. The consistency therefore cannot be recovered, and, unless access to the entire stripe of the RAID-5 group is prohibited, there is a risk of data garbling, which is problematic.
According to the present invention, there are provided a storage system, a control method thereof, and a program for recovering from an error that has occurred in access in a case in which the consistency of the RAID-5 group has been broken and degeneration of the storage devices has occurred, thereby improving the reliability of data with respect to accesses.
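The parity recalculation of FIG. 6, and the reason it is unavailable once the RAID-5 group itself is degenerated, can be sketched as follows. The function `recalculate_parity` and the use of `None` to mark an unreadable strip are assumptions made for illustration; this is a sketch of the conventional recovery discussed in the related art, not of the invention.

```python
from functools import reduce
from typing import Optional, Sequence


def xor_strips(*strips: bytes) -> bytes:
    """Byte-wise exclusive OR of equally sized strips."""
    if len({len(s) for s in strips}) != 1:
        raise ValueError("need at least one strip, all of equal length")
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*strips))


def recalculate_parity(data_strips: Sequence[Optional[bytes]]) -> bytes:
    """Recompute parity from every data strip; None marks a failed device."""
    if any(strip is None for strip in data_strips):
        raise RuntimeError("cannot recalculate parity: a data strip is unreadable")
    return xor_strips(*data_strips)


# Inconsistent stripe left by a failed write back: new D2, old D3, old parity.
d1, d2_old, d3_old = b"AAAA", b"BBBB", b"CCCC"
d2_new = b"bbbb"

# All data devices readable: recalculation restores consistency.
p_recalc = recalculate_parity([d1, d2_new, d3_old])
print(xor_strips(d2_new, d3_old, p_recalc) == d1)

# Data device holding D1 has failed: recalculation is impossible.
try:
    recalculate_parity([None, d2_new, d3_old])
except RuntimeError as exc:
    print(exc)
```

Recalculation needs every data strip of the stripe, whereas Regeneration Read needs consistent parity; when both the parity and one data strip are unusable, neither path can recover the stripe.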