The present invention relates to a method of diagnosing magnetic disk devices, and more particularly, to a magnetic disk device having a failure prediction function.
Highly reliable magnetic disk devices intended for 24-hour operation are usually guaranteed to operate trouble-free, 24 hours a day, for three to five years. In actual operation, systems are usually shut down once or twice a year, and periodic maintenance and checks of the respective magnetic disk devices are performed during the shutdown.
These magnetic disk devices each have uniform performance, and their bit error rates and the frequency of occurrence of other error events remain low, even after a lapse of several thousand hours from the mounting of a new magnetic disk device or from the maintenance or replacement of existing ones. These bit error rates, for example, are as small as several bits per 10 gigabytes of data access.
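As a rough illustration of the magnitude involved, the figure above can be converted into a dimensionless bit error rate. The following is only a sketch under stated assumptions: "several bits" is taken to be 3 bits and a gigabyte is taken to be 10^9 bytes, neither of which is specified in the description, and the function name is hypothetical.

```python
def bit_error_rate(error_bits: int, bytes_accessed: float) -> float:
    """Return erroneous bits divided by total bits accessed."""
    bits_accessed = bytes_accessed * 8  # 8 bits per byte
    return error_bits / bits_accessed


# Assumption: "several bits" = 3 bits, 10 gigabytes = 10e9 bytes.
example_rate = bit_error_rate(error_bits=3, bytes_accessed=10e9)
print(f"{example_rate:.3e}")
```

Under these assumed values the rate is on the order of 10^-11, which indicates how few error events are available to a diagnosis that relies only on accumulated counts.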
Under conventional technologies, therefore, bit error rates and other error indices have been accumulated over fixed periods of time, and if the count of any such error index exceeds the required threshold within the fixed period, this state has been reported to a host device. Additionally, instead of estimating when a failure will occur, the host device has compared the particular fixed period with a previously set value, e.g., a mean time between failures (MTBF), and presented to the operator information indicating the replacement time of the magnetic disk device.
The mounting environment for a magnetic disk device, however, is diverse. Physically, changes in ambient temperature, or vibration due to a disturbance, may render the internal spindle motor or actuator of the magnetic disk device abnormal, and electromagnetic noise internal or external to the device may cause the magnetic head itself or the transmission circuits to become abnormal. If these abnormal events actually happen, bit errors occur intensively, concentrated at a certain time. Although such an abnormality cannot be disregarded as a sign of a failure in the magnetic disk device, it has not been detectable with the conventional technologies.
In short, under conventional diagnosing methods, the counts of the error indices that occurred before the required time had passed have been accumulated, whether each accumulated count exceeded its required threshold has then been judged, and the particular event has been diagnosed as an abnormality only if a count exceeded its threshold. A mean time between failures (MTBF) has been adopted as the required time.
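The conventional procedure summarized above can be sketched as follows. This is a minimal sketch and not an implementation taken from any cited reference: the event format `(time in hours, index name, count)`, the index names, the threshold values, and the function name `conventional_diagnosis` are all hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical event record: (time in hours, error index name, count).
ErrorEvent = Tuple[float, str, int]


def conventional_diagnosis(
    events: Iterable[ErrorEvent],
    thresholds: Dict[str, int],
    required_time_h: float,
) -> List[str]:
    """Accumulate each error index up to the required time, then
    return the indices whose accumulated count exceeds its threshold."""
    totals = {name: 0 for name in thresholds}
    for time_h, name, count in events:
        # Only events inside the required time (e.g. the MTBF) are counted.
        if time_h <= required_time_h and name in totals:
            totals[name] += count
    return [name for name, total in totals.items() if total > thresholds[name]]


# Assumed sample data, for illustration only.
sample_events = [
    (100.0, "read_error", 5),
    (200.0, "seek_error", 2),
    (300.0, "read_error", 10),
]
sample_thresholds = {"read_error": 12, "seek_error": 5}
abnormal = conventional_diagnosis(sample_events, sample_thresholds, 1000.0)
print(abnormal)
```

Note that the judgment depends only on the totals at the end of the required time; when within that time the errors occurred plays no part in the result.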
For these reasons, even if the spindle motor or the actuator temporarily became abnormal during the required time and only a specific count among all error index counts increased, the event has not been diagnosed as an abnormality, since that count was small relative to all error index data accumulated up to the passage of the required time. That is, the magnetic disk device has been left in a situation where it is not replaced unless its performance deteriorates very significantly.
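The masking effect described above can be illustrated numerically. The following sketch uses invented data: a required time of 10,000 hours, a steady background of read errors, a single one-hour burst of 40 seek errors, and a seek error threshold of 100. None of these values comes from the description.

```python
HOURS = 10_000            # assumed required time, in hours
SEEK_THRESHOLD = 100      # assumed cumulative threshold for seek errors

# Hourly error counts; all values are invented for illustration.
read_errors = [0] * HOURS
seek_errors = [0] * HOURS
for hour in range(0, HOURS, 10):
    read_errors[hour] = 1         # steady background: 1,000 errors in total
seek_errors[5_000] = 40           # temporary abnormality: burst in one hour

seek_total = sum(seek_errors)
all_total = seek_total + sum(read_errors)
seek_share = seek_total / all_total        # share among all error counts
flagged = seek_total > SEEK_THRESHOLD      # cumulative judgment
peak_hourly_seek = max(seek_errors)        # size of the burst itself

print(seek_total, all_total, round(seek_share, 4), flagged, peak_hourly_seek)
```

With these assumed values, the burst accounts for under 4% of all accumulated errors and stays below the cumulative threshold, so it is not diagnosed as an abnormality even though all 40 seek errors occurred within a single hour.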
According to Japanese Patent Laid-Open No. Hei 6-214835, the occurrence rates of errors are cumulatively recorded in a memory for each peripheral device and each error cause, then the error information is transmitted to a central processing unit, and based on the transmitted error information and on the deterioration characteristics, operation time, and other factors of each peripheral device that are retained beforehand, the central processing unit predicts the occurrence time of permanent failures. Details of the prediction method are unknown.
According to Japanese Patent Laid-Open No. Hei 7-248937, the operation time of devices from their replacement or from their previous diagnosis is measured, and a self-diagnosing program is executed only when a mean time between failures predetermined for each device, or other required time, is reached, so that unnecessary self-diagnosis is suppressed.
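The time-gated self-diagnosis described above might be sketched as follows. This is only an illustration of the general idea as summarized here, not the method of the cited publication; the class name `DiagnosisTimer`, its methods, and the 1,000-hour required time are hypothetical.

```python
class DiagnosisTimer:
    """Tracks operation time since replacement or the previous diagnosis."""

    def __init__(self, required_time_h: float) -> None:
        # required_time_h: e.g. an MTBF predetermined for the device.
        self.required_time_h = required_time_h
        self.elapsed_h = 0.0

    def add_operation(self, hours: float) -> None:
        """Accumulate operation time."""
        self.elapsed_h += hours

    def diagnosis_due(self) -> bool:
        """Self-diagnosis is run only once the required time is reached."""
        return self.elapsed_h >= self.required_time_h

    def reset(self) -> None:
        """Call after a diagnosis or a device replacement."""
        self.elapsed_h = 0.0


timer = DiagnosisTimer(required_time_h=1000.0)  # assumed value
timer.add_operation(600.0)
due_early = timer.diagnosis_due()   # required time not yet reached
timer.add_operation(400.0)
due_later = timer.diagnosis_due()   # required time reached
timer.reset()
print(due_early, due_later, timer.diagnosis_due())
```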
According to Japanese Patent Laid-Open No. 2001-307435, an error test for monitoring an error rate by deteriorating an S/N ratio is repeated for each fixed amount of data transfer, and thus the occurrence of a failure is predicted.