The present invention relates to a method and system for improving the reliability of a magnetic disk storage device, or of a disk array system having plural magnetic disk storage devices. More particularly, the present invention relates to a method and system for use in a magnetic disk storage device or disk array system that controls the stopping time of a magnetic disk storage device in relation to its interval of operation time so as to improve its reliability.
A disk array system known as a RAID (Redundant Array of Inexpensive Disks) system is becoming popular owing to its low price, its high reliability, and the fact that it can be easily recovered even if one of its magnetic disk storage devices is halted by a failure.
A data recovery method is disclosed, for example, in unexamined Japanese patent publication 6-230903. When one of the magnetic disk storage devices in the disk array system fails, its correct data is recovered using the data and parity data stored in the other, healthy magnetic disk storage devices. If the total capacity of the open areas in the healthy magnetic disk storage devices is greater than the amount of data stored in the failed magnetic disk storage device, the recovered data is stored in those open areas.
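The recovery step can be illustrated with single-parity (bytewise XOR) striping of the kind used in common RAID levels. This is an assumption: the parity scheme of the cited publication is not described here, and the function names below are hypothetical. The sketch rebuilds a failed device's block of one stripe from the surviving blocks and checks the open-area capacity condition stated above.

```python
from functools import reduce


def xor_blocks(blocks):
    """Bytewise XOR of equal-length blocks (the assumed parity operation)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def reconstruct_block(surviving_blocks):
    """Rebuild the failed device's block of one stripe.

    With XOR parity, the missing block equals the XOR of all remaining
    data blocks and the parity block of that stripe.
    """
    return xor_blocks(surviving_blocks)


def can_relocate(free_capacities, failed_device_used):
    """True if the healthy devices' total open area exceeds the data held by the failed device."""
    return sum(free_capacities) > failed_device_used


# Example: two data blocks plus their parity; device 0 fails.
d0, d1 = b"\x01\x02", b"\x10\x20"
parity = xor_blocks([d0, d1])
rebuilt = reconstruct_block([d1, parity])
```

Here `rebuilt` equals `d0`, and `can_relocate` gates whether the rebuilt data may be placed in the healthy devices' open areas.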
However, when plural magnetic disk storage devices fail at the same time, no effective recovery method is available other than transferring data from a backup system; such a recovery is beyond the reach of the conventional methods described above. Therefore, a simultaneous failure of plural magnetic disk storage devices may result in a severe loss of data.
Recording and retrieving data in a magnetic disk storage device is performed by a magnetic head disposed adjacent to a rotating magnetic disk at a head floating space. As magnetic disk storage devices improve, the head floating space decreases, because data recording density on the magnetic disk is nearly inversely proportional to the head floating space. To obtain a smaller head floating space, the Contact Start Stop (CSS) system is used. In the CSS system, the magnetic head contacts the magnetic disk surface under a pressing force governed by the magnetic head suspension apparatus while the magnetic disk is stopped, and floats at a desired space above the magnetic disk surface once the rotating speed of the magnetic disk has reached a predetermined speed. The magnetic head floats above the surface of the magnetic disk due to a floating force induced by the air flow over the magnetic disk surface as the magnetic disk rotates, and this floating force is balanced by the pressing force generated by the magnetic head suspension apparatus. CSS systems are commonly utilized in conventional magnetic disk storage devices.
In CSS systems, consideration must be given to the tendency of the magnetic heads to stick to the magnetic disk surfaces over time, which can cause problems when rotation of the magnetic disk is started. Sticking occurs because materials such as contaminants and lubricants build up in the gap between the magnetic head and the magnetic disk during long-term operation of the magnetic disk storage device. When rotation of the magnetic disk is halted and the magnetic head comes into contact with the magnetic disk, the surface tension or sticking force of these contaminants and lubricants causes severe sticking between the magnetic head and the magnetic disk. Consequently, a CSS system faces the technical problem that starting-up failures caused by this sticking may occur at restart when the magnetic disk storage device has been halted after long-term continuous operation. The conventional technology offers no technique for addressing this problem.
It is therefore important to avoid simultaneous starting-up problems in plural magnetic disk storage devices of a disk array when those magnetic disk storage devices are stopped after long-term continuous operation. Simultaneous starting-up problems in the magnetic disk storage devices of a disk array could cause fatal damage to those devices, making data recovery impossible. Such fatal damage would be even worse in a data security system in which data is recovered only by generating redundant data such as parity data.
An object of the present invention is to provide a method and system for use in a magnetic disk storage device for preventing damage such as starting-up problems at restart of a magnetic disk storage device which has been halted after long term continuous operation.
Another object of the present invention is to provide a method and system for use in a magnetic disk storage device for preventing simultaneous starting-up problems in plural magnetic disk storage devices included in a RAID system, thereby maintaining high reliability of the RAID system.
Yet another object of the present invention is to provide a method and system for use in a disk array system that allows the disk array system to respond to data read-in and data write-in requests even while the magnetic disk storage devices are being stopped at predetermined intervals so as to prevent simultaneous starting-up problems in the magnetic disk storage devices at restart.
In the present invention, it was discovered that in a magnetic disk storage device there is a relationship between the stopping time of the magnetic disk storage device and its interval of operation time. More particularly, it was discovered that if a magnetic disk storage device operated according to the CSS system is intentionally stopped at intervals related to its interval of operation time, the possibility of starting-up problems occurring after the magnetic disk storage device has been halted can be reduced.
It was further discovered that the length of the stopping time necessary to reduce starting-up problems in a magnetic disk storage device depends on the type of magnetic disk storage device. In other words, a magnetic disk storage device of a first type may require a longer stopping time than a magnetic disk storage device of a second type. Further, the length of the interval of operation time during which safe normal operation can be conducted varies according to the type of magnetic disk storage device. The stopping time can range from several minutes to ten (10) hours, whereas the interval of operation time can be as long as one thousand (1000) hours.
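Because both intervals depend on the device type, a controller would need a per-type parameter table. The sketch below is hypothetical: the type names and numeric values are placeholders, not measured data, and only the upper bounds (ten hours of stopping time, one thousand hours of operation time) come from the text above.

```python
from dataclasses import dataclass

MAX_STOP_HOURS = 10.0         # upper end of the stopping-time range stated in the text
MAX_OPERATION_HOURS = 1000.0  # longest interval of operation time stated in the text


@dataclass(frozen=True)
class StopPolicy:
    """Hypothetical per-type stopping policy for a CSS magnetic disk storage device."""

    drive_type: str
    stop_hours: float       # how long the device is held stopped
    operation_hours: float  # continuous operation allowed before an intentional stop

    def __post_init__(self):
        # Reject values outside the ranges described in the text.
        if not 0 < self.stop_hours <= MAX_STOP_HOURS:
            raise ValueError(f"stopping time out of range for {self.drive_type}")
        if not 0 < self.operation_hours <= MAX_OPERATION_HOURS:
            raise ValueError(f"operation interval out of range for {self.drive_type}")


# Placeholder values: a first type needing a longer stop than a second type.
POLICIES = {
    "type_a": StopPolicy("type_a", stop_hours=2.0, operation_hours=1000.0),
    "type_b": StopPolicy("type_b", stop_hours=0.25, operation_hours=500.0),
}
```

Real values would have to be determined experimentally for each type of magnetic disk storage device.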
Therefore, the present invention provides a method and system for use in controlling the operation of a magnetic disk storage device so as to intentionally stop the magnetic disk storage device at an interval of stopping time related to the interval of operation time of the magnetic disk storage device so as to improve reliability of the magnetic disk storage device.
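A minimal sketch of the periodic stopping policy, assuming a single device that runs for a fixed interval of operation time and is then held stopped for a fixed stopping time, repeatedly. The function `stop_windows` and its parameters are hypothetical, and hours are used as the time unit.

```python
def stop_windows(horizon_hours, operation_hours, stop_hours, start=0.0):
    """List (stop_begin, stop_end) pairs up to horizon_hours.

    The device runs for operation_hours, is intentionally stopped for
    stop_hours, then restarts, and the cycle repeats.
    """
    windows = []
    t = start + operation_hours  # first intentional stop
    while t < horizon_hours:
        windows.append((t, min(t + stop_hours, horizon_hours)))
        t += stop_hours + operation_hours  # next stop after a full operation interval
    return windows


# Example: 1000 h of operation followed by a 2 h stop, over a 3000 h horizon.
schedule = stop_windows(3000, operation_hours=1000, stop_hours=2)
```

The resulting `schedule` is `[(1000, 1002), (2002, 2004)]`: each stop follows a full interval of operation time measured from the previous restart.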
The disk array system of the present invention utilizes the function of a RAID system, such as that disclosed in unexamined Japanese patent publication 6-230903, of keeping operation recovery data in another magnetic disk storage device for use when one of the magnetic disk storage devices within the disk array system has stopped due to a failure. By combining this function with the present invention, the disk array system can continue to operate even while each magnetic disk storage device is intentionally stopped for a period of time. However, while a magnetic disk storage device is intentionally stopped as described above, it cannot be accessed directly for reading and writing operations.
According to the present invention, the magnetic disk storage devices of a disk array system are stopped one by one, in sequence, at specified intervals without stopping the disk array system itself. The disk array system thus continues to operate while each of its magnetic disk storage devices is stopped in turn to prevent starting-up problems. As described above, these starting-up problems result from sticking of the magnetic head to the magnetic disk when the magnetic disk storage device is restarted after being halted following long-term operation. Stopping the devices sequentially prevents a complete failure of the disk array system.
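One way to picture the sequential stopping is to stagger the devices' stop windows evenly across one cycle so that no two devices are ever stopped at once, assuming single-parity redundancy that tolerates exactly one unavailable device. The helper names and the even-spacing rule are assumptions for illustration, not the method's prescribed schedule.

```python
def staggered_offsets(num_devices, operation_hours, stop_hours):
    """Return (cycle, offsets) so that each device stops once per cycle, never overlapping.

    Each device runs operation_hours and stops stop_hours per cycle; its stop
    window starts at its offset within the cycle.
    """
    cycle = operation_hours + stop_hours
    slot = cycle / num_devices
    if stop_hours > slot:
        raise ValueError("too many devices for these intervals: stop windows would overlap")
    return cycle, [i * slot for i in range(num_devices)]


def stopped_at(t, num_devices, operation_hours, stop_hours):
    """Indices of the devices that are intentionally stopped at time t (hours)."""
    cycle, offsets = staggered_offsets(num_devices, operation_hours, stop_hours)
    phase = t % cycle
    return [i for i, off in enumerate(offsets) if off <= phase < off + stop_hours]
```

For example, with four devices, 1000 hours of operation and 4-hour stops, device 1 is the only one stopped at hour 252, and no device is stopped at hour 100.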
While each magnetic disk storage device is halted in the manner described above, it cannot respond to an access in the form of a data read-out request. Instead, the present invention responds to the read-out request by initiating the RAID system data recovery function, which generates the data corresponding to the read-out request from the operation recovery data in the other magnetic disk storage devices. The RAID system data recovery function is normally executed when a magnetic disk storage device has stopped due to a failure. In the present invention, it is also used to respond to a data read-out request while a magnetic disk storage device is intentionally stopped to prevent starting-up problems at subsequent restarts.
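The read path can be sketched as follows, again assuming a single XOR parity block per stripe (an assumption; the text does not fix the parity scheme). The `Stripe` class is hypothetical: a read aimed at an intentionally stopped device is served by the same reconstruction used after a failure, without touching the stopped device.

```python
from functools import reduce


def xor_blocks(blocks):
    """Bytewise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


class Stripe:
    """One stripe across N devices: data blocks plus one XOR parity block (assumed layout)."""

    def __init__(self, data_blocks):
        self.blocks = list(data_blocks) + [xor_blocks(data_blocks)]
        self.stopped = set()  # indices of devices intentionally stopped

    def read(self, device):
        if device not in self.stopped:
            return self.blocks[device]  # normal read from the running device
        # Device intentionally stopped: rebuild its block from every other
        # device, exactly as the recovery function would after a failure.
        others = [b for i, b in enumerate(self.blocks) if i != device]
        return xor_blocks(others)


stripe = Stripe([b"ab", b"cd", b"ef"])
stripe.stopped.add(1)  # device 1 is held stopped to prevent sticking
```

After device 1 is stopped, `stripe.read(1)` still returns `b"cd"`, rebuilt from devices 0, 2 and the parity block.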
Further, while a magnetic disk storage device is halted in the manner described above, a data write-in request directed to it can be handled by dispersively recording only the write-in data on the other magnetic disk storage devices, without calculating parity data. Alternatively, parity data can be calculated in order to maintain data integrity. The write-in data can also be stored temporarily in an alternate memory separate from the magnetic disk storage devices. After the halted magnetic disk storage device has been restarted, the write-in data temporarily stored in the alternate memory is transferred to the restarted magnetic disk storage device.
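The alternate-memory option can be sketched as a buffer keyed by block address, modelling the device as a dictionary. The class and method names are hypothetical. A later write to the same address replaces an earlier one, so only the latest data is transferred at restart.

```python
class StoppedDeviceWriteBuffer:
    """Hypothetical alternate memory for writes aimed at an intentionally stopped device."""

    def __init__(self):
        self.pending = {}  # block address -> latest write-in data

    def write(self, address, data):
        # A later write to the same address supersedes an earlier one.
        self.pending[address] = data

    def read(self, address):
        # Reads of buffered addresses must see the buffered data, not the stale device copy.
        return self.pending.get(address)

    def flush_to(self, device_blocks):
        """After restart, transfer buffered writes to the device and clear the buffer."""
        for address, data in self.pending.items():
            device_blocks[address] = data
        flushed = len(self.pending)
        self.pending.clear()
        return flushed


buffer = StoppedDeviceWriteBuffer()
buffer.write(5, b"x")
buffer.write(5, b"y")  # overwrites the first write while the device is stopped
device = {}            # stands in for the restarted device's blocks
count = buffer.flush_to(device)
```

Here one block is transferred, and the device ends up holding `b"y"` at address 5.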
If time zones in which no data write-in requests are issued are known in advance, for example from an operating schedule, plural magnetic disk storage devices can be stopped during those time zones. In that case the alternate memory can be eliminated, since no data write-in requests are expected and none need be responded to. Alternatively, time zones in which few data write-in requests are issued can be detected from statistical information on past operation.
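Detecting quiet time zones from past operation could be as simple as counting historical write-in requests by hour of day and picking the least busy hours. This is a hypothetical sketch: timestamps are assumed to be hours since an arbitrary epoch, and the daily one-hour granularity is an assumption.

```python
from collections import Counter


def quietest_hours(write_timestamps_hours, k):
    """Return the k hours of the day (0-23) with the fewest historical write-in requests.

    Ties are broken by the earlier hour so the result is deterministic.
    """
    counts = Counter(int(t) % 24 for t in write_timestamps_hours)
    return sorted(range(24), key=lambda h: (counts[h], h))[:k]


# Example history: five writes in every hour except 02:00 and 03:00.
history = [h + 0.5 for h in range(24) if h not in (2, 3) for _ in range(5)]
candidates = quietest_hours(history, 2)
```

`candidates` is `[2, 3]`, the hours in which a device could be stopped with the least write traffic.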
The alternate memory can also be eliminated by returning a device-busy signal while the magnetic disk storage device is halted. The device-busy signal is sent to the source that issued the data write-in request. Upon receiving the device-busy signal, the source either writes the data elsewhere or retries the write to the magnetic disk storage device at a later time.
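The device-busy alternative can be sketched from both sides: the halted device rejects writes with a busy status, and the requesting source retries later. The `DEVICE_BUSY` status, the `Device` class and the retry parameters are hypothetical; the sketch shows only the retry option, not redirecting the data elsewhere.

```python
import time

DEVICE_BUSY = "DEVICE_BUSY"  # hypothetical status code returned while halted


class Device:
    """Minimal stand-in for a magnetic disk storage device."""

    def __init__(self):
        self.stopped = False
        self.blocks = {}

    def write(self, address, data):
        if self.stopped:
            return DEVICE_BUSY  # halted: signal busy instead of buffering
        self.blocks[address] = data
        return "OK"


def write_with_retry(device, address, data, retries=3, delay_s=1.0, sleep=time.sleep):
    """Source-side handling: retry a busy write up to `retries` more times."""
    for attempt in range(retries + 1):
        status = device.write(address, data)
        if status != DEVICE_BUSY:
            return status
        if attempt < retries:
            sleep(delay_s)  # wait before trying again; no sleep after the last attempt
    return DEVICE_BUSY
```

The `sleep` parameter is injectable so the waiting can be replaced in tests or by a scheduler that knows when the device will restart.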