The present invention relates to a method for inspecting data integrity in disk devices, and more particularly, to a method of inspecting data integrity in the built-in disk devices of a disk array apparatus by, while reducing the data traffic in the disk devices, conducting first-step arithmetic operations with the internal controllers of the disk devices and then conducting second-step arithmetic operations with the internal disk array controller of the disk array apparatus.
Recent storage subsystems employ, instead of conventional large-scale disks, the RAID system that was proposed by Patterson et al. (D. A. Patterson, G. A. Gibson, R. H. Katz, “A Case for Redundant Arrays of Inexpensive Disks (RAID)”, Proceedings of the International Conference on Management of Data (SIGMOD), June 1988, pp. 109-116). At RAID-3, RAID-4, and RAID-5 levels, multiple hard-disk drives (HDDs) are used and parity data is saved in one of the devices. For example, if four HDDs are used, data is saved in three of them and parity data is saved in the remaining device. Such an HDD arrangement is called 3D+1P.
Although the following description is given taking RAID-5 as an example, similar understanding is also possible for other RAID levels such as RAID-3, RAID-4, and RAID-6.
At RAID-5, data from a host computer is split according to a block size such as 4 KB or 8 KB, and sequentially written into multiple HDDs. Under a 3D+1P arrangement, after three sets of data, “Data1”, “Data2”, and “Data3”, have been written into three HDDs, “Parity” is written as parity data into the remaining device. “Parity” is generated by calculating the exclusive OR (XOR) of “Data1”, “Data2”, “Data3”, and “Expected Value”, as in formula (1).
[Formula 1]Parity=Data1⊕Data2⊕Data3⊕Expected Value   (1)
For odd parity, 1 is assigned to all bits in “Expected Value”. Even if arbitrary data is destroyed, the remaining data, the parity data, and “Expected Value” can be used to regenerate normal data from the destroyed data. A formula for regenerating “Data3” from “Data1”, “Data2”, “Parity”, and “Expected Value” is shown as formula (2). In this case, even if one of the HDDs which constitute RAID-5 fails, data can be regenerated from the data saved in the remaining HDDs. That is to say, RAID-5 is of an architecture that allows recovery from a single-device failure.
[Formula 2]Data3=Data1⊕Data2⊕Parity⊕Expected Value   (2)
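For illustration, the parity generation of formula (1) and the regeneration of formula (2) can be sketched as follows; the 8-bit block values and the function names are hypothetical and serve only to demonstrate the XOR arithmetic.

```python
# Sketch of RAID-5 odd-parity generation (formula (1)) and data
# regeneration (formula (2)) over illustrative 8-bit blocks.

BLOCK_BITS = 8
EXPECTED_VALUE = (1 << BLOCK_BITS) - 1  # odd parity: all bits set to 1

def make_parity(d1: int, d2: int, d3: int) -> int:
    """Formula (1): Parity = Data1 XOR Data2 XOR Data3 XOR ExpectedValue."""
    return d1 ^ d2 ^ d3 ^ EXPECTED_VALUE

def regenerate_data3(d1: int, d2: int, parity: int) -> int:
    """Formula (2): Data3 = Data1 XOR Data2 XOR Parity XOR ExpectedValue."""
    return d1 ^ d2 ^ parity ^ EXPECTED_VALUE

data1, data2, data3 = 0b10110010, 0b01101100, 0b11000101
parity = make_parity(data1, data2, data3)

# If Data3 is lost, the remaining blocks and the parity recover it exactly,
# because each block cancels itself out under XOR.
assert regenerate_data3(data1, data2, parity) == data3
```

The recovery works because XOR is associative and self-inverse: substituting formula (1) into formula (2) makes “Data1”, “Data2”, and “Expected Value” each appear twice and cancel, leaving “Data3”.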
The configuration of a conventional disk array apparatus is shown in FIG. 3. Reference numeral 300 in FIG. 3 denotes the conventional disk array apparatus, 301 a host computer interface, 302 an internal connection bus, 303 a memory controller, 304 a processor bus, 305 a microprocessor, and 306 an XOR operational element. Reference numeral 307 denotes a memory bus, 308 a cache memory, 309 a disk device interface, 310 a disk device connection bus, 311 a disk device, 321 the host computer, and 322 a host computer connection bus.
An operational outline of the disk array apparatus is given below with reference to FIG. 3. The host computer 321 transmits a command and data to the disk array apparatus 300 via the host computer connection bus 322 formed of an element such as SCSI (Small Computer System Interface) or FC-AL (Fibre Channel Arbitrated Loop). At the disk array apparatus 300, the command and data from the host computer 321 are received using the host computer interface (host interface) 301. Through the internal bus 302 such as a PCI bus, the host interface 301 saves the received command and data in the cache memory 308 via the memory controller 303 and the memory bus 307. The microprocessor 305 accesses the cache memory 308 through the processor bus 304 and the memory controller 303.
If the received command is a writing command, the microprocessor 305 uses the XOR operational element 306 to generate parity data from the received data saved in the cache memory 308, and saves the parity data in the cache memory 308. The disk device interface (disk interface) 309 connected to the memory controller 303 by the internal bus 302 reads out the received data and the parity data from the cache memory 308, and writes both into multiple disk devices 311. The disk interface 309 and the disk devices 311 are connected to each other by the disk device connection bus 310 formed of an element such as SCSI or FC-AL.
If the received command is a readout command, the disk interface 309 reads out data from the disk devices 311 and stores the data into the cache memory 308. Next, the host interface 301 reads out the stored data from the cache memory 308 and transmits the data to the host computer 321.
The host interface 301, the memory controller 303, the microprocessor 305, the cache memory 308, the disk interface 309, and other elements form a disk array controller.
If a failure occurs in any of the disk devices 311, the disk interface 309 notifies the microprocessor 305 of the disk device failure. The microprocessor 305 then displays the occurrence of the disk failure at a console (not shown in FIG. 3) of the disk array apparatus 300, thus prompting an administrator of the disk array apparatus 300 to replace the failed disk device 311 with a normal disk device 311. After the replacement, the microprocessor 305 uses the disk interface 309 to read out data from the remaining normal disk devices 311. Next, the microprocessor 305 uses the XOR operational element 306 to regenerate the data that is to be stored into the replacement disk device 311, and uses the disk interface 309 to write that data into the replacement disk device 311.
At RAID-5, since only one set of parity data exists, the disk array apparatus can recover from a single-device failure, in which one disk device suffers damage, but cannot recover from a dual-device failure, in which two disk devices suffer damage. One of the biggest problems in RAID-5 failure recovery arises if, while data is being read out from the disk devices to regenerate data, a second failure that has escaped detection until then is discovered, resulting in a dual-device failure. Data loss then results, since recovery from a dual-device failure is impossible. Such a latent failure occurs, for example, if the write head for writing onto the storage medium within a disk device is damaged: the disk device responds normally to a writing command from the disk array controller, yet the write head fails to write the data. If the read head for reading out data from the storage medium is normal, data readout is executed properly, so such a failure is very difficult to detect. The only way to detect such a failure in a single disk device alone is to conduct a readout test immediately after every write. However, this method is not realistic, since it significantly degrades the performance of the disk device. Accordingly, to verify data integrity in the disk devices, all data must be read out from all disk devices mounted in the disk array apparatus, and arithmetic operations for the verification of data integrity must then be performed.
A method of verifying data integrity is described below using FIG. 5 and formula (3). FIG. 5 is a schematic diagram of block addresses in RAID-5. Reference numeral 501 in FIG. 5 denotes block addresses of a first disk device; 502, block addresses of a second disk device; 503, block addresses of a third disk device; 504, block addresses of a fourth disk device; and 505, block addresses.
Formula (3) is a data integrity verification formula for RAID-5.
[Formula 3]XOR-abc=0-abc⊕1-abc⊕2-abc⊕3-abc   (3)
At RAID-5, data from the host computer (host) is split according to a block size such as 4 KB or 8 KB, and stored into multiple disk devices. Addresses are assigned in block-size units, starting from the beginning sector address of each disk device; these addresses are called block addresses. Additionally, a disk identifier is assigned so that the disk device to which a particular block address group belongs can be uniquely specified. For example, if the identifier of a disk device is N and the block address is “abc”, that block can be specified as N-abc, as with block address 505. The block addresses of each disk device range from “000” to “xyz”, as shown at 501, 502, 503, and 504. An identifier of the first disk device is defined as 0; that of the second disk device, as 1; that of the third disk device, as 2; and that of the fourth disk device, as 3.
In order to verify integrity of the data written in, for example, block address “abc”, data is read out from the “abc” block addresses of the first to fourth disk devices, and XOR arithmetic operations are then performed to calculate XOR-abc. If the data is recorded properly, XOR-abc equals the expected value.
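The verification of formula (3) can be sketched as follows for a 3D+1P arrangement with odd parity; the one-byte block contents and the placement of parity on the fourth device are hypothetical illustrations.

```python
# Sketch of the integrity check of formula (3): for each block address,
# XOR the blocks at that address across all four disk devices and
# compare the result against the odd-parity expected value (all ones).

EXPECTED = 0xFF  # odd-parity expected value for 8-bit blocks

def make_parity(blocks):
    """Generate odd parity over one block address (formula (1))."""
    p = EXPECTED
    for b in blocks:
        p ^= b
    return p

# disks[N][i] models block N-i; devices 0-2 hold data (values are
# hypothetical), and device 3 holds the parity for each block address.
disks = [
    [0x12, 0x34],   # disk identifier 0
    [0x56, 0x78],   # disk identifier 1
    [0x9A, 0xBC],   # disk identifier 2
]
disks.append([make_parity(col) for col in zip(*disks)])  # identifier 3

def verify(abc: int) -> bool:
    """Formula (3): XOR-abc over devices 0..3 must equal EXPECTED."""
    xor_abc = 0
    for n in range(4):
        xor_abc ^= disks[n][abc]
    return xor_abc == EXPECTED

# All block addresses verify while the data is intact.
assert all(verify(abc) for abc in range(2))

# Flip one bit of block 2-001 (e.g. a silent write failure): the
# check for that block address now fails.
disks[2][1] ^= 0x01
assert not verify(1)
```

The sketch also shows why this scan detects the silent write failure described above: a block whose write was dropped no longer cancels against the parity, so XOR-abc deviates from the expected value at exactly that block address.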
To verify data integrity in this way, data within the disk devices 311 must be read into the cache memory 308 over the disk device connection bus 310, and arithmetic operations with the XOR operational element 306 must be conducted. This process consumes the interface bandwidth of each disk device 311 and the bandwidth of the disk device connection bus 310, and thus reduces the capability of the disk array apparatus 300 to process requests from the host computer 321. Similarly, the capacity of the cache memory 308 available for processing requests from the host computer 321 and the available time of the XOR operational element 306 are reduced, which in turn reduces the processing capability of the disk array apparatus 300.
In order to avoid such reduction in the processing capability of a disk array apparatus, the “Auxiliary Storage Device Diagnosing Method, Information Processing Apparatus, and Storage Medium with Stored Procedure for Diagnosing Auxiliary Storage Section” described in Japanese Patent Laid-open No. 2002-149503 proposes a technique that uses the idle time of a processor to diagnose an auxiliary storage device.