The present invention relates to an error recovery method and apparatus, and more particularly to error recovery processing when a plurality of blocks of data in a peripheral recording unit are transferred and stored in the main memory by the processing of a single command. Such a command is contained in a channel control word (CCW) chain. A plurality of blocks of data, called records, are recorded on a disc medium of a magnetic disc unit, which is one example of a peripheral recording unit. When a central processor unit (CPU) needs data from the magnetic disc unit, the records are read from the disc medium and stored temporarily in a main memory. The data is usually checked for errors whether it is transferred from the magnetic disc unit to the main memory or from the main memory to the disc unit. If there are correctable errors in the data, these errors are corrected automatically and the corrected data is stored in the main memory.
On the other hand, when there are uncorrectable errors in the data read from the magnetic disc unit, the data must be transferred again from the magnetic disc unit to the main memory. In this case, it is useful to process the data by command retries, such as input-output command retries. In a command retry, the same command is processed repeatedly between a channel and the disc controller without processing by the CPU. Such a system is disclosed in U.S. Pat. No. 3,688,274.
Errors that are subject to command retry by a disc controller include uncorrectable read errors, an overrun of a disc-drive device for a channel caused by timing errors between the disc controller and the CPU, and partially correctable erroneous data input.
When an error that can be recovered by a command retry occurs, the disc controller requests a command retry for the channel by recalling the previous CCW, and relocates the magnetic head at the data record that proved erroneous so that the data can be read from or written to the disc again. During this time, recovery processing is carried out according to the content of the error. For instance, when the error cannot be recovered after several retries, the magnetic head of the disc unit is shifted slightly in a direction perpendicular to the track and the data is read again.
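The retry sequence described above can be sketched as follows. This is only an illustration under stated assumptions, not the controller's actual logic: the function `read_with_retry`, the callback `read_record`, the retry count of three, and the offset steps of +1 and -1 are all hypothetical, since the text specifies only "several retries" and a slight shift of the head.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def read_with_retry(
    read_record: Callable[[int], Optional[bytes]],
    max_plain_retries: int = 3,          # assumed value; the text says "several"
    offsets: Sequence[int] = (1, -1),    # assumed head-offset steps across the track
) -> Tuple[Optional[bytes], List[int]]:
    """Re-read one record, first at the nominal head position, then with offsets.

    read_record(offset) is a hypothetical callback that relocates the head at
    the record with the given offset and returns the data, or None if the
    read produced an uncorrectable error.
    Returns the data (or None) and the list of offsets that were tried.
    """
    attempts: List[int] = []

    # Initial attempt plus plain retries with the head at its nominal position.
    for _ in range(1 + max_plain_retries):
        attempts.append(0)
        data = read_record(0)
        if data is not None:
            return data, attempts

    # Plain retries failed: shift the head slightly and read again.
    for offset in offsets:
        attempts.append(offset)
        data = read_record(offset)
        if data is not None:
            return data, attempts

    # Nothing worked; the error would have to be reported upstream.
    return None, attempts
```

In this sketch the head is shifted only after the plain retries are exhausted, which mirrors the order given in the text.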
In the command retry processing of the disc controller described above, the data record at which the magnetic head can be relocated during the retries is limited to the single data record that was processed immediately beforehand. Where a command processes a single data record, the data record that proved to be in error is necessarily the record processed immediately beforehand. This limitation results primarily from the control system of the relocation mechanism of the disc unit, and it does not raise any problems when only one data record is processed during execution of a CCW chain containing a single command.
If, however, a plurality of data records is required for processing one command, and an error occurs in any record other than the first (i.e. a second or subsequent record), the command retry cannot be executed. This is because a command retry must reexecute the command from the beginning, that is, it must reexecute the transfer of the required data records from the first record, yet the disc controller cannot relocate the head at the initial data record because that record is not the one processed immediately beforehand. When the command retry cannot be executed in this case, the disc controller stops processing the command and informs the CPU of the error. On receiving this information, the CPU usually issues a command to recover from the error. However, under that command the disc controller merely repeats the command retry, which fails again because of the limitation discussed above. No error recovery takes place.
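The limitation can be illustrated with a small model. It is a sketch under stated assumptions only: the function names `can_command_retry` and `handle_error`, the record labels, and the returned strings are hypothetical and do not come from any actual controller interface.

```python
from typing import Sequence


def can_command_retry(records: Sequence[str], failed_index: int) -> bool:
    """Return True if a command retry is possible in this simplified model.

    records lists the (unique) record labels one command transfers, in order.
    failed_index is the position of the record in which the error occurred.
    """
    # The controller can relocate the head only at the record it processed
    # immediately beforehand, which is the record that just failed.
    relocatable_record = records[failed_index]
    # A command retry must restart the transfer from the first record.
    restart_record = records[0]
    return relocatable_record == restart_record


def handle_error(records: Sequence[str], failed_index: int) -> str:
    """Hypothetical controller decision when a retryable error occurs."""
    if can_command_retry(records, failed_index):
        return "command retry"
    # The retry cannot start: stop the command and inform the CPU.
    return "report error to CPU"
```

For a command that transfers the records R1, R2 and R3, an error in R1 yields a command retry in this model, while an error in R2 or R3 is reported to the CPU without any recovery processing.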
Since the disc controller does not execute any recovery processing in this case, there is a strong probability that the same error will occur again in the same data record.
As described above, when the data necessary for processing one command covers a plurality of data records and an input-output error occurs in the second or a subsequent data record, the error cannot be recovered, even though it is an input-output error of a kind that should be recoverable by a command retry.