In computers, hard disk devices (or hard disk drives (HDDs)) are used as large-capacity storage devices. Various types of data are stored in HDDs depending on applications, and high levels of reliability are required for the HDDs. Disk array apparatuses having an array of HDDs employ various reliability-enhancing technologies.
For example, RAID (Redundant Arrays of Inexpensive Disks) is available as a technology for enhancing the reliability of HDDs. RAID is a technology for causing multiple HDDs to function collectively as a single hard disk, thereby increasing the speed of data input/output and improving reliability. A disk array apparatus using RAID is called a “RAID device” or “RAID apparatus”.
A variety of applied technologies are being studied as reliability-improving technologies using RAID. For example, as a technology for identifying an erroneous data portion during a disk failure, there is a technology in which the same access history information is stored during writing of data to multiple storage areas in different HDDs. The use of such access history information allows an unmatched data portion to be identified during a failure of an HDD in RAID 6.
The use of RAID makes it possible to not only detect a failure of an HDD but also detect and correct an abnormality during access to individual data. Abnormalities during access to data in an HDD include an abnormality in which data changes in the apparatus, an abnormality in which data is written to a wrong location on a disk device, and an abnormality in which a response indicating that data is properly written is issued without the data being written (this abnormality may be referred to as a “write-related loss” hereinafter).
A RAID apparatus adds an ECC (error correcting code) to data and writes the resulting data, in order to deal with a data-access abnormality. The RAID apparatus then uses the ECC to detect an abnormality during reading. If written data changes, an error can be detected through ECC checking. Upon detecting an abnormality, the RAID apparatus uses parity data or the like to correct the data. With this arrangement, it is possible to detect and correct an abnormality in which data changes in the apparatus.
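The detect-then-correct flow described above can be illustrated with a minimal sketch. It assumes a CRC-32 check code as a stand-in for the ECC (a CRC only detects errors and does not correct them) and a single XOR parity block for reconstruction; the names `seal`, `check`, and `xor_blocks` are hypothetical and do not come from any particular RAID implementation.

```python
import zlib
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks column by column (simple parity)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def seal(data: bytes) -> bytes:
    """Append a 4-byte CRC-32 check code to the data (stand-in for the ECC)."""
    return data + zlib.crc32(data).to_bytes(4, "big")


def check(stored: bytes) -> bool:
    """Return True if the stored check code still matches the stored data."""
    data, code = stored[:-4], stored[-4:]
    return zlib.crc32(data).to_bytes(4, "big") == code


# Three data blocks, each on a different simulated disk, plus one parity block.
data_blocks = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data_blocks)
disks = [seal(block) for block in data_blocks]

# Simulate data changing inside the apparatus: flip one bit on disk 1.
damaged = bytearray(disks[1])
damaged[0] ^= 0x01
disks[1] = bytes(damaged)

# Reading: the check code exposes the damaged block.
bad = [i for i, stored in enumerate(disks) if not check(stored)]

# Correction: rebuild the damaged block from parity and the intact blocks.
survivors = [disks[i][:-4] for i in range(len(disks)) if i not in bad]
recovered = xor_blocks(survivors + [parity])
```

In this sketch the check code only locates the damaged block; the recovery itself comes entirely from the redundancy held in the parity block.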
With respect to an abnormality in which data is written to a wrong location, the abnormality cannot be detected using the ECC. Thus, for example, the RAID apparatus adds address information to data and writes the resulting data. During reading of the data, the RAID apparatus checks the address added to the data. When the address of an area from which the data is to be read and the address added to the data are different from each other, it can be determined that the data is written to a wrong location. That is, it is possible to detect an abnormality in which data is written to a wrong location. Upon detecting such an abnormality, the RAID apparatus reconstructs and restores the original data by using parity data.
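As a rough illustration of the address check, the sketch below stores the intended address next to the data and compares it with the address actually read. The in-memory `disk` dictionary, the `StoredBlock` type, and the `actual` parameter used to simulate a misdirected write are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class StoredBlock:
    address: int  # address recorded alongside the data at write time
    data: bytes


def write(disk, intended, data, actual=None):
    """Store data tagged with its intended address.

    Passing `actual` simulates the drive placing the block at a wrong location.
    """
    location = intended if actual is None else actual
    disk[location] = StoredBlock(intended, data)


def address_ok(disk, address):
    """Return True if the block read from `address` was written for `address`."""
    block = disk.get(address)
    return block is not None and block.address == address


disk = {}
write(disk, 7, b"old-7")
write(disk, 9, b"old-9")
write(disk, 7, b"new-7", actual=9)  # misdirected: lands on block 9

misplaced_detected = not address_ok(disk, 9)  # block 9 carries address 7
block7_passes = address_ok(disk, 7)           # block 7 still holds old data
```

Note that block 7 in this example passes the address check even though it holds outdated data, because its recorded address is still correct.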
In this manner, when data changes in the RAID apparatus or when data is written to a wrong location, the RAID apparatus can detect the abnormality during reading of the data and can restore the original data.
With respect to a write-related loss, however, even the RAID apparatus cannot detect the abnormality during reading of the data. This is because, for a write-related loss, previously written correct data already exists in a block in question. Thus, even when the ECC is checked during reading of the data, it is determined that the data is properly written, and when the address is checked, it is also determined that the data is properly written. As a result, the ECC or address checking cannot detect an abnormality during reading of the data.
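The reason both checks pass can be made concrete with a small simulation. It assumes a CRC-32 value as a stand-in for the ECC and a hypothetical `LossyDisk` class whose `drop_next_write` flag makes it acknowledge a write without storing anything.

```python
import zlib


def make_block(address, data):
    """Bundle data with its address and a CRC-32 check code (ECC stand-in)."""
    return {"address": address, "data": data, "crc": zlib.crc32(data)}


def passes_checks(block, address):
    """Apply the check-code test and the address test used during reading."""
    return block["crc"] == zlib.crc32(block["data"]) and block["address"] == address


class LossyDisk:
    """Simulated disk that can acknowledge a write without performing it."""

    def __init__(self):
        self.blocks = {}
        self.drop_next_write = False

    def write(self, address, data):
        if self.drop_next_write:
            self.drop_next_write = False
            return True  # reports success, stores nothing
        self.blocks[address] = make_block(address, data)
        return True

    def read(self, address):
        return self.blocks[address]


disk = LossyDisk()
disk.write(3, b"old")           # earlier, correct write
disk.drop_next_write = True
ack = disk.write(3, b"new")     # write-related loss: acknowledged but dropped
block = disk.read(3)
checks_pass = passes_checks(block, 3)
```

The block read back is internally consistent, so neither check has anything to flag; only the content is out of date.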
Accordingly, a known RAID apparatus reads back the written data immediately after writing it and compares the data to be written with the read data. The expression “immediately after data writing” used herein refers to the time before a response indicating the completion of data write processing is sent to a host computer. Before the data write processing is completed, a controller in the RAID apparatus holds the data to be written and thus can compare the data with the read data. Such data comparison before the data write processing is completed makes it possible to check whether or not the writing is properly executed.
This method, however, is based on the premise that writing/reading of data to/from HDDs is performed as part of the write processing, and thus cannot use write caching. Reading of data from a block immediately after writing of the data involves a wait time for the disk to make one rotation. Thus, the time for responding to the host computer during writing is extended. Consequently, write performance decreases.
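The read-back comparison and its cost can be sketched as follows. The `Disk` class, its `lose_writes` flag, and the `io_count` counter are hypothetical and exist only to show that each verified write needs two disk operations before the host can be answered.

```python
class Disk:
    """Simulated disk that counts operations and can silently drop writes."""

    def __init__(self, lose_writes=False):
        self.blocks = {}
        self.lose_writes = lose_writes
        self.io_count = 0

    def write(self, address, data):
        self.io_count += 1
        if not self.lose_writes:
            self.blocks[address] = data

    def read(self, address):
        self.io_count += 1
        return self.blocks.get(address)


def verified_write(disk, address, data):
    """Write, read back, and compare before reporting completion.

    The controller still holds `data`, so it can compare it with what the
    disk returns. Returns True only if the read-back data matches.
    """
    disk.write(address, data)
    return disk.read(address) == data


good = Disk()
good_result = verified_write(good, 0, b"payload")

bad = Disk()
bad.write(0, b"old")
bad.lose_writes = True
bad_result = verified_write(bad, 0, b"new")
```

On a real drive the second operation cannot be served from a cache and has to wait for the platter, which is where the added response time comes from.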
With a RAID apparatus that manages data in RAID 3, it is possible to detect a write-related loss by simultaneously recording data and a timestamp to each physical block and comparing the timestamps during reading. In RAID 3, data in one logical block is distributed and stored across physical blocks in multiple HDDs. That is, in RAID 3, respective physical blocks in multiple HDDs constitute one logical block. Thus, when data is written to a logical block, the data is always written to multiple physical blocks at the same time. Accordingly, simultaneously with writing data to physical blocks constituting a logical block, the RAID apparatus writes the same timestamp to all the physical blocks. During reading of the data, the RAID apparatus simultaneously reads the timestamps from all physical blocks and compares the timestamps. When a physical block having a different timestamp exists, it can be determined that a write-related loss occurred.
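A minimal sketch of this timestamp comparison is shown below. It is a simplification that omits parity and models each disk as a dictionary; the function names and the `lost_on` parameter, which marks the disks that silently drop a write, are assumptions made for illustration.

```python
def write_logical(disks, address, data, timestamp, lost_on=()):
    """Stripe one logical block over all disks, tagging every part with the
    same timestamp. Disks listed in `lost_on` silently drop the write."""
    size = -(-len(data) // len(disks))  # ceiling division
    for i, disk in enumerate(disks):
        if i in lost_on:
            continue
        disk[address] = (timestamp, data[i * size:(i + 1) * size])


def read_logical(disks, address):
    """Read all parts of a logical block and compare their timestamps.

    Returns (data, ok); ok is False when the timestamps disagree, which
    indicates that at least one physical block missed a write.
    """
    parts = [disk[address] for disk in disks]
    timestamps = {ts for ts, _ in parts}
    return b"".join(chunk for _, chunk in parts), len(timestamps) == 1


disks = [{}, {}, {}]
write_logical(disks, 0, b"aaabbbccc", timestamp=1)
first_read = read_logical(disks, 0)

# The second write is lost on disk 1, which keeps its old part and timestamp.
write_logical(disks, 0, b"xxxyyyzzz", timestamp=2, lost_on={1})
second_read = read_logical(disks, 0)
```

The check works only because every write touches all physical blocks of the logical block, so a stale timestamp always has newer neighbors to disagree with.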
A RAID apparatus based on RAID 5, however, has a problem in that it cannot detect a write-related loss during reading of data when the same method as for RAID 3 is used. This is because, in RAID 5, the logical blocks and the physical blocks correspond to each other on a one-to-one basis and the individual physical blocks are independently updated. Thus, during writing of data to a logical block, there is only one physical block to which the data is to be written, and processing for writing the same timestamp to multiple physical blocks cannot be performed, unlike in RAID 3.
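To illustrate why the timestamp comparison has nothing to work with in this case, the sketch below applies the same agreement test to a single independently updated physical block. The helper names and the `lost` flag are hypothetical.

```python
def write_block(disk, address, data, timestamp, lost=False):
    """Update one physical block; `lost=True` simulates a write-related loss."""
    if not lost:
        disk[address] = (timestamp, data)


def timestamps_agree(blocks):
    """RAID 3 style test: all blocks read together share one timestamp."""
    return len({ts for ts, _ in blocks}) <= 1


disk = {}
write_block(disk, 5, b"old", timestamp=1)
write_block(disk, 5, b"new", timestamp=2, lost=True)

# Only one physical block belongs to the logical block, so the comparison
# sees a single timestamp and cannot tell that it is out of date.
stale = disk[5]
check_passes = timestamps_agree([stale])
```

With only one timestamp in the set, the test is trivially satisfied regardless of whether the latest write reached the disk.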
Such a problem is common to not only RAID apparatuses using RAID 5 but also disk array apparatuses that perform data write control such that data in one logical block is written to only one physical block.
In a certain aspect, an object of the present invention is to provide a technology that allows a write-related loss to be detected during data reading even when data in one physical block is updated during update of data in a logical block.