1. Field of the Invention
The present invention relates to a technology that serves as a countermeasure for a peculiar write failure that arises in a magnetic disk apparatus after manufacture (e.g., after shipping), and more particularly to a countermeasure for a failure in which data cannot be written on the magnetic recording media of a magnetic disk apparatus and the magnetic disk apparatus itself is unable to detect that the data could not be written.
2. Related Background Art
In one magnetic disk write/read diagnosis method, whether a magnetic disk apparatus is operating normally is diagnosed and verified by writing data to the magnetic disk apparatus, reading the written data back, and comparing it against the original data.
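The write/read diagnosis described above can be sketched as follows. This is a minimal illustration only: the function name `write_and_verify` is hypothetical, and an in-memory `io.BytesIO` buffer stands in for the magnetic disk apparatus.

```python
import io


def write_and_verify(device, offset: int, data: bytes) -> bool:
    """Write data at offset, read it back, and compare it with the original."""
    device.seek(offset)
    device.write(data)
    device.seek(offset)
    read_back = device.read(len(data))
    return read_back == data


# Hypothetical stand-in for a disk: a 512-byte zero-filled buffer.
disk = io.BytesIO(bytes(512))
verified = write_and_verify(disk, 128, b"diagnostic pattern")
```

On a real apparatus the read-back would have to come from the media rather than from a cache for the comparison to be meaningful.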
Also, a RAID apparatus is known as an external memory apparatus whose redundant structure, which combines a plurality of magnetic disk apparatuses, can significantly enhance the reliability of the apparatus as a whole rather than the reliability of the individual magnetic disk apparatuses (“A Case for Redundant Arrays of Inexpensive Disks (RAID)”, Patterson, et al., Proc. ACM SIGMOD, June 1988).
Magnetic disk apparatuses that achieve high recording density by using a composite magnetic head, which has one dedicated magnetic head for recording and another for reproduction, are now the mainstream. Conventionally, a single inductive head was used for both data recording and reproduction, so any abnormality of the head could be discovered early, during reproduction. A composite magnetic head likewise allows early discovery of an abnormality of the reproduction head, but makes it difficult to find an abnormality of the recording head. Recording heads generally have high reliability and abnormalities rarely occur in them, but the reliability of recording must be ensured even if such abnormalities occur only rarely.
If a rare and peculiar failure occurs in which no information is actually stored on the surface of the magnetic recording media but the magnetic disk apparatus itself fails to issue any failure signal (hereinafter called “unwritable/unnotifying failure”), the pre-write data remains on the magnetic recording media. If the region in question is then read, the magnetic disk apparatus itself is not aware of, and cannot detect, the abnormality and instead reads the remaining old data, which is sent to a central processing unit or other host device. Such a peculiar failure consequently cannot be eliminated even by the structures used in RAID apparatuses. In other words, data lost through an unwritable/unnotifying failure cannot be recovered even in a RAID apparatus structure.
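The behavior of an unwritable/unnotifying failure can be illustrated with a toy model. The class `SimulatedDisk` and its `write_head_ok` flag are hypothetical names introduced only for this sketch; they do not describe any actual apparatus interface.

```python
class SimulatedDisk:
    """Toy disk whose write path can fail without reporting an error."""

    def __init__(self, num_blocks: int):
        self.blocks = [b""] * num_blocks
        self.write_head_ok = True  # assumption: models the recording head state

    def write(self, lba: int, data: bytes) -> bool:
        # The media is updated only while the recording head works.
        if self.write_head_ok:
            self.blocks[lba] = data
        # Success is reported either way: the failure is not notified.
        return True

    def read(self, lba: int) -> bytes:
        # The reproduction head works normally and returns whatever is on the media.
        return self.blocks[lba]


disk = SimulatedDisk(num_blocks=4)
disk.write(0, b"old data")

disk.write_head_ok = False          # the recording head fails silently
status = disk.write(0, b"new data")  # still reports success
returned = disk.read(0)              # the host receives the pre-write data
```

Here `status` is `True` even though `returned` is still `b"old data"`, which is the situation the host cannot distinguish from a normal read.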
More specifically, in the RAID level 4 and level 5 structures, the redundant data (parity) creating unit uses the pre-update data, the new data and the pre-update parity to create a new parity when writing information.
If the unwritable/unnotifying failure occurs in the pre-update data or the pre-update parity, which are the base data for creating a new parity, the new parity created becomes improper. As a result, when the RAID apparatus detects a failure in this state and attempts to reconstruct the data of the failed magnetic disk apparatus using the other, normally operating magnetic disk apparatuses, it would create improper data.
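A short worked example may make the parity problem concrete. It assumes the usual XOR parity of RAID levels 4 and 5 (new parity = pre-update data XOR new data XOR pre-update parity), one-byte blocks, and three data disks; the helper names and all byte values are hypothetical.

```python
def xor_blocks(*blocks: bytes) -> bytes:
    """Byte-wise XOR of equally sized blocks."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)


def updated_parity(old_data: bytes, new_data: bytes, old_parity: bytes) -> bytes:
    """Read-modify-write parity update."""
    return xor_blocks(old_data, new_data, old_parity)


# Disk 0 is updated twice (a0 -> a1 -> a2); disks 1 and 2 stay unchanged.
d1, d2 = b"\x22", b"\x44"
a0, a1, a2 = b"\x10", b"\x11", b"\x13"
p0 = xor_blocks(a0, d1, d2)

# First update: the parity is updated correctly, but the data write to
# disk 0 is silently lost, so the media still holds a0.
p1 = updated_parity(a0, a1, p0)
media_d0 = a0

# Second update: the pre-update data read from disk 0 is the stale a0,
# so the new parity is computed from the wrong base data.
p2 = updated_parity(media_d0, a2, p1)
media_d0 = a2  # this write reaches the media

correct_parity = xor_blocks(a2, d1, d2)

# If disk 1 later fails, it is rebuilt from the surviving disks and parity.
rebuilt_d1 = xor_blocks(media_d0, d2, p2)
```

In this sketch `p2` differs from `correct_parity`, and `rebuilt_d1` differs from the original `d1`, even though no disk ever reported an error.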
The inventors of the present application examined two timings for diagnosing a magnetic disk apparatus itself: diagnosing every time a write operation is executed, and diagnosing at a certain time interval.
The former can detect an unwritable/unnotifying failure when it occurs, but it requires processing time for the diagnosis. Specifically, an ordinary magnetic disk apparatus requires a waiting time at least equivalent to one revolution of the magnetic disk media before it can read data that has just been written. In a magnetic disk apparatus whose media rotate at 10,000 rpm, one revolution takes 6 msec, so the waiting time, and therefore the write verification processing time, would increase by at least 6 msec.
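The 6 msec figure follows directly from the rotational speed: one revolution takes 60,000 msec divided by the number of revolutions per minute. A small sketch of that arithmetic, with a hypothetical function name:

```python
def revolution_time_ms(rpm: float) -> float:
    """Time for one full revolution of the media, in milliseconds."""
    return 60_000.0 / rpm


# Minimum added wait for a read-after-write check at 10,000 rpm.
delay_ms = revolution_time_ms(10_000)  # 6.0
```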
The latter avoids an increase in write verification processing time for every write operation. However, if an unwritable/unnotifying failure occurs on a magnetic disk apparatus between one diagnosis and the next, the data affected by the failure (i.e., the old data that remains on the media) would be sent to host devices when it is read.