The present invention relates to a memory in which a plurality of independent storage units are accessed (read/written) in parallel as a set, and more particularly to a data reconstruction system, and a method used therein, for use when a failure occurs.
The technology for controlling discs arranged in parallel is disclosed in Japanese Kokai 1-250128 corresponding to U.S. patent application Ser. No. 07/118,785 filed on Nov. 6, 1987, now U.S. Pat. No. 4,870,643, and Japanese Kokai 2-135555.
As a technology for achieving large memory capacity and high-speed data transfer, there is known a method in which a plurality of storage units are treated as a set, the data is divided into a plurality of pieces in bit units, byte units or arbitrary units, and the pieces are stored in the respective storage units; when the data is to be read out, the pieces are read out simultaneously from the respective storage units. Moreover, in this method, data to be used for a parity check is produced from the data divided among the storage units and is stored in another storage unit. When a failure occurs in any of the storage units, the data stored in the remaining normal storage units and the data for the parity check are used to reconstruct the faulty data, thereby improving the reliability of the memory.
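The parity scheme described above can be illustrated with a minimal sketch. It assumes byte-unit division and a single XOR parity unit; the function names (`xor_blocks`, `stripe`, `reconstruct`) are hypothetical and chosen only for illustration, and the sketch is not taken from the cited references.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the byte-wise XOR of equal-length byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def stripe(data: bytes, n_units: int):
    """Divide data byte-wise among n_units data units and append one parity unit."""
    # Pad with zero bytes so that every unit receives the same number of bytes.
    padded = data + b"\x00" * (-len(data) % n_units)
    units = [padded[i::n_units] for i in range(n_units)]
    return units + [xor_blocks(units)]


def reconstruct(units, failed: int) -> bytes:
    """Rebuild the contents of one failed unit from all surviving units."""
    survivors = [u for i, u in enumerate(units) if i != failed]
    return xor_blocks(survivors)


units = stripe(b"hello world!", 4)   # 4 data units + 1 parity unit
rebuilt = reconstruct(units, 2)      # pretend that unit 2 has failed
```

With a single XOR parity unit only one failed unit can be rebuilt; tolerating two or more failures requires a stronger code such as Reed-Solomon.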
Further, there is known a technology in which, when a failure occurs in any of the storage units, the data is not only reconstructed for the normal read operation, but the data stored in the faulty storage unit is also reconstructed and stored in an additionally provided normal storage unit (a spare storage unit). With this technology, the reconstructed data is stored in the spare storage unit and is read out from the spare storage unit for subsequent accesses, whereby the availability of the memory can be improved.
The failure of a certain number of storage units can be repaired by providing the parity data, and the data can also be reconstructed onto a spare storage unit. However, to repair the failure, it is necessary to read out all of the data stored in the normal storage units together with the data for the parity check, reconstruct the faulty data, and write the reconstructed data to the spare storage unit. Therefore, during the repair of the failure, the storage units are occupied, so that requests for normal access or read/write processing issued from a host unit are kept waiting. This degrades the performance of the memory. As error check methods for reconstructing the faulty data, parity data, Reed-Solomon codes and error check codes (ECC) are known.
Although redundancy is provided against the failure of a plurality of storage units, the repair of a failure of one storage unit and the repair of a failure of a plurality of storage units are managed without distinguishing between them. Therefore, if emphasis is put on the repair of the failure, normal access or read/write processing cannot be performed even though only one storage unit has failed, so that the efficiency of normal access or read/write processing is reduced. On the other hand, if emphasis is put on the normal access or read/write operation, the time required for the repair cannot be secured when a plurality of storage units have failed, and as a result the possibility that the whole system may break down is increased.
It is therefore an object of the present invention, with respect to a memory which has redundancy against the failure of two or more storage units, to minimize the reduction in normal access or read/write processing when a failure occurs, to limit the time required for the repair of the failure to within a fixed period of time, and to ensure high reliability.
It is another object of the present invention to provide a data reconstruction system which is capable of selecting a suitable data reconstruction method in correspondence to the various kinds of conditions relating to the repair of the failure and carrying out the most suitable data reconstruction processing.
It is still another object of the present invention to provide a control system which is capable of changing the procedure of data reconstruction processing in correspondence to the change of redundancy relating to the number of ECC discs included in a plurality of storage units which are arranged in parallel to one another.
The above objects of the present invention are attained by the provision of a memory including: a group of storage units, a plurality of independent storage units forming a set, for storing data divided into a plurality of pieces in bit units, byte units or arbitrary units; discs for storing ECC data corresponding to the divided data; a spare storage unit for storing the reconstructed data; an I/O-reconstruction control circuit for receiving a command relating to an I/O operation issued from a host unit and executing processing in accordance with the command or responding to the host unit; a timer for giving the time of the failure, an elapsed time during the data reconstruction, a unit time and the like; a data reconstructing table for the storage unit at fault; and a faulty data reconstructing circuit for performing discovery of the faulty data, data reconstruction and an operation of writing data to a spare storage disc. When a failure occurs in any of the storage units, the faulty data reconstructing circuit detects the failure by an error check and informs the I/O-reconstruction control circuit of the failure, and the I/O-reconstruction control circuit discriminates the state of the failure and either selects and executes whichever of the normal access or read/write processing and the data reconstruction processing is preferred for that state, or sets the frequency of the normal access or read/write processing and the data reconstruction, or the amount of data reconstruction within a unit time.
When a failure occurs in the above memory, the redundancy of the memory, the elapsed time during the data reconstruction, the state of the normal access or read/write processing and the like are discriminated, and the data reconstruction processing (method) suitable therefor is selected. Therefore, it is possible to prevent a reduction in the performance of normal access or read/write processing and to ensure high reliability of the memory. More specifically, in the case where the number of storage units at fault is less than the redundancy of the memory, a data reconstruction processing (method) is selected in which normal access or read/write processing is given preference and the faulty data is reconstructed in the remaining time. Therefore, no load is put on normal access or read/write processing. On the other hand, in the case where there is no remaining redundancy, the processing of reconstructing the faulty data is given preference, so that the reliability of the memory against failure can be ensured. Moreover, in the case where there is some remaining redundancy, the data reconstruction processing (method) is changed according to the length of time already taken to repair the failure of the storage units in which the failure occurred, so that it is possible to prevent a reduction in the performance of normal access or read/write processing while limiting the time required for the data reconstruction to within a fixed period of time. Moreover, a time period having little normal access or read/write processing, e.g., night, is selected so that the system can devote itself to the data reconstruction. As a result, it is possible to reduce the load on the memory in a time period having much normal access or read/write processing.
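The selection logic described above might be sketched as follows. This is only an illustration under stated assumptions: the names `Mode` and `select_mode`, the use of hours as the time unit, and the `is_quiet_period` flag standing in for a low-traffic period such as night are all hypothetical and do not appear in the specification.

```python
from enum import Enum


class Mode(Enum):
    PREFER_NORMAL_ACCESS = "normal access first, reconstruct in the remaining time"
    PREFER_RECONSTRUCTION = "reconstruction first"


def select_mode(redundancy: int, failed_units: int,
                elapsed_hours: float, limit_hours: float,
                is_quiet_period: bool = False) -> Mode:
    """Choose which kind of processing is given preference (hypothetical policy)."""
    remaining = redundancy - failed_units
    if remaining <= 0:
        # No remaining redundancy: one more failure would lose data.
        return Mode.PREFER_RECONSTRUCTION
    if elapsed_hours >= limit_hours:
        # Some redundancy remains, but the repair has already taken too long.
        return Mode.PREFER_RECONSTRUCTION
    if is_quiet_period:
        # Few host requests (e.g. at night): devote the system to reconstruction.
        return Mode.PREFER_RECONSTRUCTION
    return Mode.PREFER_NORMAL_ACCESS


# Two ECC discs, one failed unit, repair started two hours ago, daytime load.
mode = select_mode(redundancy=2, failed_units=1, elapsed_hours=2, limit_hours=24)
```

In this example the call returns `Mode.PREFER_NORMAL_ACCESS`, because one unit of redundancy remains and the assumed time limit has not been reached.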
Moreover, since the frequency of the data reconstruction processing, or the amount of data reconstruction within a unit time, is set according to the frequency of the normal access or read/write processing, the data reconstruction processing can be carried out efficiently with respect to time.
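One way to set the amount of data reconstruction within a unit time is sketched below. The linear rule, the `capacity` and `floor` parameters, and the function name `reconstruction_quota` are assumptions made for illustration; the specification states only that the amount is set according to the frequency of normal access or read/write processing.

```python
def reconstruction_quota(normal_requests: int, capacity: int, floor: int = 1) -> int:
    """Return how many blocks to reconstruct in the next unit time.

    normal_requests: host requests observed in the last unit time.
    capacity: assumed total operations the storage units can serve per unit time.
    floor: assumed minimum quota, so that reconstruction never stops completely.
    """
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    spare = capacity - normal_requests   # operations left over after host traffic
    return max(floor, spare)


light_load = reconstruction_quota(normal_requests=20, capacity=100)
heavy_load = reconstruction_quota(normal_requests=100, capacity=100)
```

Under these assumptions a lightly loaded unit time leaves most of the capacity to reconstruction, while a saturated one falls back to the minimum quota.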