This invention relates to a disk array apparatus used as a data storage (or a data memory) in an information processing apparatus. This invention relates also to an abnormality or error control method for use in the disk array apparatus and to a recording medium memorizing a control program for executing the abnormality control method.
In recent years, disk array apparatuses comprising a plurality of disk units, such as hard-disk units (namely, hard-disk storages), have come into wide use for recording data together with redundant data such as parity data. Such a structure of the disk array apparatus is known as a RAID (Redundant Array of Inexpensive Disks) configuration. If any abnormality such as a read-error happens in one of the disk units, the data can be reconstructed from the data read from the remaining disk units. Thus, the disk array apparatus is highly reliable as a data storage (or a data memory).
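The reconstruction from the remaining disk units can be illustrated with a minimal sketch. It assumes byte-wise XOR parity, as commonly used in RAID configurations with a parity unit; the text above does not specify the parity scheme, and the function name `xor_blocks` and the sample blocks are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks (assumed parity scheme)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data blocks held on three hypothetical disk units.
data = [b"\x01\x02\x03\x04", b"\x10\x20\x30\x40", b"\xaa\xbb\xcc\xdd"]

# Redundant (parity) data held on a fourth unit.
parity = xor_blocks(data)

# Suppose a read-error happens in unit 1: rebuild its block from the
# blocks read from the remaining units, including the parity unit.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)
```

Because XOR is its own inverse, combining the surviving data blocks with the parity block yields the missing block, which is why a single faulty unit does not make the data unreadable.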
Meanwhile, disk units in recent use have greatly increased in recording density. In such a disk unit, the presence of a very small flaw or crack on a recording medium may cause a read-error. Upon occurrence of the read-error, a retry (also called an internal retry) of the data writing operation or the data reading operation is generally executed within the disk unit. The probability of occurrence of the read-error increases with the recording density of the disk unit, so that execution of such a retry becomes more and more frequent. The retry typically requires several seconds. Sometimes the read-error cannot be recovered by the retry and the data writing operation or the data reading operation is completely impossible. In this event, maintenance work is required to exchange or repair the disk unit.
In view of the above, the disk array apparatus currently used in ordinary data processing waits for completion of the retry in the disk unit. If the retry is successful and the abnormality is recovered, the ordinary data processing is continued. On the other hand, if the data reading operation or the data writing operation is still impossible after the retry, the disk unit is disconnected as a faulty disk unit and the disk array apparatus is operated in a degenerate mode.
However, recently available disk units tend to require a long time for the retry because of their high recording density. This delays the original or primary operation of the disk unit, i.e., data input/output between the disk unit and a host computer. Taking this into account, an approach has been proposed in which, if the retry is not completed within a predetermined time period (a retry period), the disk unit in which the abnormality has occurred is disconnected from the disk array apparatus as a faulty disk unit and the disk array apparatus is operated in the degenerate mode.
In a case where the disk array apparatus is used as a motion picture storage, the allowable latency or waiting time between issuance of a data writing instruction or a data reading instruction and completion of data transfer is very short. This is because motion pictures are continuously recorded and reproduced. In this event, a retry period of sufficient length cannot be reserved. If the retry period is restricted to such a very short latency, the retry is often unsuccessful, so that the disk unit is disconnected as a faulty disk unit and the disk array apparatus is operated in the degenerate mode.
Thus, in the above-mentioned approach in which the retry period is restricted, the disk unit is often judged to be a faulty unit and disconnected from the disk array apparatus although the disk unit may in fact be normal. In other words, if the retry period had been sufficiently long, the data writing operation or the data reading operation would have been successful. It is unfavorable that a disk unit which is in fact normal is disconnected and subjected to maintenance work such as repair or exchange.
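The time-limited approach and its drawback can be sketched as a simple decision rule. The function name, its parameters, and the numeric values (a retry needing 3 seconds, retry periods of 10 seconds and 0.1 second) are illustrative assumptions only and do not come from the text above.

```python
def mode_after_abnormality(retry_needed_s, retry_period_s):
    """Mode of the array after an abnormality under the time-limited approach.

    retry_needed_s: time the internal retry would need in order to succeed
                    (hypothetical; None means the retry never succeeds).
    retry_period_s: predetermined retry period allowed by the array.
    """
    if retry_needed_s is not None and retry_needed_s <= retry_period_s:
        return "normal"      # retry completed in time, processing continues
    return "degenerate"      # unit disconnected as a faulty disk unit


# Generous retry period: a recoverable unit stays in the array.
general_use = mode_after_abnormality(retry_needed_s=3.0, retry_period_s=10.0)

# Motion picture storage: the same recoverable unit is judged faulty
# because the retry period is restricted to a very short latency.
motion_picture = mode_after_abnormality(retry_needed_s=3.0, retry_period_s=0.1)

# A unit whose retry never succeeds is disconnected in either case.
truly_faulty = mode_after_abnormality(retry_needed_s=None, retry_period_s=10.0)
```

In the second case the unit would have recovered if the retry had been allowed to finish, yet the rule cannot distinguish it from the truly faulty unit in the third case.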
During the degenerate mode, the disk array apparatus has no redundancy. If another error happens in a different position or a different disk unit, it is highly possible that the data writing operation or the data reading operation becomes impossible. Thus, disconnection of any disk unit in which the retry is not successful within the predetermined time period is unfavorable because the probability of operation in the degenerate mode is increased and the reliability as the data storage is decreased.
It is therefore an object of this invention to provide a disk array apparatus which is capable of dealing with an abnormality such as a write-error or a read-error without delaying an original or primary operation of the disk array apparatus, and which is prevented from being undesirably operated in a degenerate mode as a result of disconnecting from the disk array apparatus a disk unit which is temporarily faulty but is in fact normal.
It is another object of this invention to provide an abnormality control method for dealing with an abnormality in the above-mentioned disk array apparatus.
It is still another object of this invention to provide a recording medium which stores a program for executing the above-mentioned abnormality control method.
A disk array apparatus to which this invention is applicable comprises a plurality of disk units so as to have a redundancy and carries out, in response to a data writing instruction or a data reading instruction from a host computer, a data writing operation or a data reading operation between the disk units of the disk array apparatus and the host computer in a normal mode.
According to this invention, the disk array apparatus comprises: detecting means for detecting, as a faulty unit, one of the disk units in which an abnormality occurs on the data writing operation or the data reading operation; memorizing means for memorizing information indicative of the faulty unit; disconnection managing means for managing disconnection of the faulty unit by temporarily disconnecting the faulty unit from the disk array apparatus as a temporarily disconnected unit to make the disk array apparatus operate in a temporary degenerate mode; instruction-execution controlling means for controlling instruction-execution to force the disk units except the temporarily disconnected unit to execute the data writing operation or the data reading operation between the disk units except the temporarily disconnected unit and the host computer by the use of the redundancy when the disk array apparatus receives, during the temporary degenerate mode, the data writing instruction or the data reading instruction from the host computer; and retry means for carrying out retry for the temporarily disconnected unit in parallel to the data writing operation or the data reading operation executed between the disk units except the temporarily disconnected unit and the host computer.
An abnormality control method to which this invention is applicable is for use in a disk array apparatus which comprises a plurality of disk units so as to have a redundancy and which carries out, in response to a data writing instruction or a data reading instruction from a host computer, a data writing operation or a data reading operation between the disk units of the disk array apparatus and the host computer in a normal mode.
According to an aspect of this invention, the abnormality control method comprises: a detecting step of detecting, as a faulty unit, one of the disk units in which an abnormality occurs on the data writing operation or the data reading operation; a memorizing step of memorizing information indicative of the faulty unit; a disconnection managing step of managing disconnection of the faulty unit by temporarily disconnecting the faulty unit from the disk array apparatus as a temporarily disconnected unit to make the disk array apparatus operate in a temporary degenerate mode; an instruction-execution controlling step of controlling instruction-execution to force the disk units except the temporarily disconnected unit to execute the data writing operation or the data reading operation between the disk units except the temporarily disconnected unit and the host computer by the use of the redundancy when the disk array apparatus receives, during the temporary degenerate mode, the data writing instruction or the data reading instruction from the host computer; and a retry step of carrying out retry for the temporarily disconnected unit in parallel to the data writing operation or the data reading operation executed between the disk units except the temporarily disconnected unit and the host computer.
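The steps of the method can be sketched as follows. This is an illustrative model under stated assumptions, not the claimed implementation: the class `DiskArray`, its attribute and method names, the XOR parity scheme, and the use of a thread for the parallel retry are all hypothetical. The handling after the retry ends (returning to the normal mode on success, remaining disconnected on failure) is likewise an assumption, since the passage above describes only the retry itself. Only the data reading operation is modelled.

```python
import threading
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks (assumed parity scheme)."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))


class DiskArray:
    """Hypothetical model: one block per disk unit, the last unit holds parity."""

    def __init__(self, blocks):
        self.blocks = list(blocks)
        self.faulty_unit = None          # memorized faulty unit, if any
        self.mode = "normal"
        self._lock = threading.Lock()

    def on_abnormality(self, unit, retry):
        """Memorize the faulty unit, disconnect it temporarily, and start
        its retry in parallel with host input/output."""
        with self._lock:
            self.faulty_unit = unit
            self.mode = "temporary degenerate"
        worker = threading.Thread(target=self._run_retry, args=(retry,))
        worker.start()
        return worker

    def _run_retry(self, retry):
        succeeded = retry()
        with self._lock:
            # Assumption: reconnect on success, stay disconnected on failure.
            if succeeded:
                self.faulty_unit = None
                self.mode = "normal"
            else:
                self.mode = "degenerate"

    def read(self, unit):
        """Serve a host read without waiting for the retry to finish."""
        with self._lock:
            if unit != self.faulty_unit:
                return self.blocks[unit]
            # Use the redundancy: rebuild from the other units.
            others = [b for i, b in enumerate(self.blocks) if i != unit]
            return xor_blocks(others)


data = [b"\x01\x02", b"\x10\x20", b"\xaa\xbb"]
array = DiskArray(data + [xor_blocks(data)])

release = threading.Event()


def slow_retry():
    release.wait(timeout=5)  # stands in for an internal retry taking seconds
    return True


worker = array.on_abnormality(1, slow_retry)
mode_during = array.mode          # retry still pending
value_during = array.read(1)      # served from the remaining units
release.set()                     # let the retry finish
worker.join()
mode_after = array.mode
```

The host read of unit 1 returns while the retry is still pending, so the primary operation is not delayed, and the unit is not treated as permanently faulty merely because its retry is slow.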
According to another aspect of this invention, the abnormality control method comprises the steps of: making, when an abnormality occurs in a particular unit of the disk units, the disk array apparatus operate temporarily in a temporary degenerate mode by the use of the disk units except the particular unit; and carrying out retry for the particular unit while the disk array apparatus operates in the temporary degenerate mode by the use of the disk units except the particular unit.
According to still another aspect of this invention, there is provided a recording medium which records a control program for executing the abnormality control method.