Data processing systems frequently include large-scale storage devices, such as Direct Access Storage Devices (DASD), located externally to the host computer and sometimes at significant distances therefrom. Communication between the host computer and the DASD is accomplished over signal cables, called channels, which connect the DASD and its control unit to the host processor.
Current technology provides DASD units with several separate disks, all rotating on the same spindle. These disks, or platters, are accessed by head disk assemblies in which each transducing head provides access to one disk surface. There may be, for example, nine platters in a disk drive providing 16 usable surfaces, one of which is used for maintaining accurate tracking capability. Such units therefore have 15 usable surfaces for data, and when all the heads are positioned, a cylinder of 15 physical recording tracks can be accessed. DASD units frequently use a Count Key Data (CKD) architecture, in which each record written on the track is provided with a count field (an ID), a key field and a data field.
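The layout just described can be illustrated with a short sketch. The surface counts come from the example above; the class name `CKDRecord`, the helper `gaps_on_track` and the byte-string representation of the fields are assumptions made only for illustration, and the sketch ignores anything on a track other than the three fields of each record.

```python
from dataclasses import dataclass

# Figures taken from the nine-platter example in the text.
USABLE_SURFACES = 16    # usable surfaces on the example drive
TRACKING_SURFACES = 1   # surface reserved for maintaining accurate tracking
DATA_SURFACES = USABLE_SURFACES - TRACKING_SURFACES  # tracks per cylinder


@dataclass
class CKDRecord:
    """One Count Key Data record (hypothetical in-memory representation)."""
    count: bytes  # count field (the record ID)
    key: bytes    # key field
    data: bytes   # data field

    def fields(self):
        """Return the fields in the order they are written along the track."""
        return [("count", self.count), ("key", self.key), ("data", self.data)]


def gaps_on_track(record_count):
    """Count the gaps separating adjacent fields, assuming three fields per record."""
    field_count = 3 * record_count
    return max(field_count - 1, 0)
```

For instance, `DATA_SURFACES` evaluates to 15, matching the 15-track cylinder of the example, and two records on a track give `gaps_on_track(2) == 5` gaps between adjacent fields.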
In writing these fields along a recording track, a gap is provided between each of the fields. Those gaps are then utilized to provide a time period in which the DASD control unit and the host channel can communicate with each other. It is during the gap time that the control unit provides information back to the channel in response to the command that it has received and gets the next command in order to begin the next operation for searching, retrieving or writing records. This process is termed gap synchronous. That is to say, the particular record on which the DASD device is working is the same record on which the channel has requested work, so that the channel and the device are synchronous with each other in the sense that both are working on the same record, either to read it or to write it.
As systems become faster, the delays created by the gaps, or by the performance of functions within a gap period, have to be shrunk to such an extent that those functions can no longer be adequately performed. This is particularly true of optical fiber channels, where the data burst rate is several times the burst rate for copper channels.
Nonsynchronous storage subsystems have been developed to enable the channel and the device to transfer data independently of each other. To do that, a buffer is inserted into the data path between the device and the channel, with separate data paths for the channel and the device, each under the control of a separate processor. In that manner, the device processor can access records in one portion of the buffer while another portion of the buffer is being used by the channel processor. Channel programs can be executed such that the channel and storage control activities required to end execution of one command and advance to the next do not have to occur during the inter-record gap between two adjacent fields.
In a synchronous system, the device and channel operate on the same record, so that the data transferred to the buffer by the device is the same data that the channel wants. In the control unit for such a system, shared variables are all that are required to implement the interface between the channel processor and the device processor. Simple shared variables are adequate because the channel processor and device processor are always performing the same operation on the same field. In a nonsynchronous system, however, the device may operate significantly ahead of the channel during read operations. The device processor is transferring data into the buffer from the device while the channel processor is accessing that data in order to send it on to the channel. The device processor is therefore the filling, or leading, activity, while the channel processor is the trailing, or lagging, activity. The reverse is true during write operations, where the channel processor fills the buffer with data from the channel and, subsequently, the device processor accesses that data to send it to the device for writing the records on the storage disks. In this instance, the channel processor is the leading, or filling, activity while the device processor is the trailing, or emptying, activity. Since in a nonsynchronous control unit the channel and the device processors can be performing different operations on different fields, a more elaborate communication system between the two is required and is set forth herein.
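The leading and trailing roles described above can be pictured with a minimal sketch. It is not the communication system set forth herein; the names `TransferBuffer`, `fill`, `empty`, `lead` and `roles`, and the fixed slot capacity, are all hypothetical and chosen only to show how one activity fills the buffer while the other empties it.

```python
from collections import deque


class TransferBuffer:
    """Hypothetical buffer shared by a leading and a trailing activity."""

    def __init__(self, capacity):
        self.capacity = capacity  # assumed fixed number of record slots
        self._slots = deque()
        self.filled = 0           # records placed by the leading activity
        self.emptied = 0          # records removed by the trailing activity

    def fill(self, record):
        """Leading activity adds a record; returns False if the buffer is full."""
        if len(self._slots) >= self.capacity:
            return False
        self._slots.append(record)
        self.filled += 1
        return True

    def empty(self):
        """Trailing activity removes the oldest record, or None if none is waiting."""
        if not self._slots:
            return None
        self.emptied += 1
        return self._slots.popleft()

    @property
    def lead(self):
        """How many records the leading activity is ahead of the trailing one."""
        return self.filled - self.emptied


def roles(operation):
    """Return (leading, trailing) processor for a read or a write operation."""
    if operation == "read":
        return ("device", "channel")   # device fills, channel empties
    if operation == "write":
        return ("channel", "device")   # channel fills, device empties
    raise ValueError(f"unknown operation: {operation}")
```

In a read, for example, the device processor might call `fill` three times before the channel processor calls `empty` once, leaving `lead` at 2. In a synchronous control unit the lead would never exceed a single field, which is why simple shared variables suffice there.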
Error recovery in a nonsynchronous control unit must also take into account considerations which are not present in a synchronous control unit. In synchronous operation, since the channel interface processor (CHIP) and the device interface processor (DIP) are operating on the same field, if an error occurs, the two processors in all likelihood will be noting the same error. In nonsynchronous operation it is still a reasonable assumption that two independent errors will not happen simultaneously. Nevertheless, since CHIP and DIP are not working on the same fields at the same time, independent errors are a more distinct possibility than in synchronous operation. For example, DIP could be running considerably ahead of CHIP on read operations and run into a data check. CHIP, coming along behind, could run into a data overrun. The two errors are independent, so there must be some way to handle such a situation and to recover from both of the errors. It is also possible in such a situation that, since the device is considerably ahead of the channel, the error that the device encounters concerns a record that the channel does not need to complete its operation. This occurs because the device processor acts to read sequential records into the buffer, starting with the first record requested by the command chain of channel command words (CCWs). While the initial requested record is known to DIP, DIP operates ahead of CHIP and, therefore, does not know which successive records the channel requires. As a consequence, CHIP may reach the end of the records needed for the read operation before it reaches the record upon which DIP found an error. In such a case, the channel program can be completed successfully without regard to the error that the leading processor noted.
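The last scenario, in which the leading processor notes an error on a record the channel never asks for, can be traced with a small sketch. It models only the situation described in this paragraph and is not the recovery method of the invention; the function `run_read`, its parameters, the status strings and the assumption that the device stops reading ahead at its first error are all hypothetical.

```python
def run_read(track, first, needed, bad_records):
    """Simulate a read in which the device (DIP) leads the channel (CHIP).

    track       -- record IDs in the order they lie on the track
    first       -- index of the first record requested by the channel program
    needed      -- number of records the channel program actually requires
    bad_records -- record IDs on which the device meets a data check
    """
    # Leading activity: the device reads sequential records into the buffer
    # without knowing how many of them the channel will want.
    buffer = []
    for rec in track[first:]:
        status = "data_check" if rec in bad_records else "ok"
        buffer.append((rec, status))
        if status != "ok":
            break  # assumption: the device stops reading ahead at its first error

    # Trailing activity: the channel takes only the records it needs.
    delivered = []
    for rec, status in buffer:
        if len(delivered) == needed:
            break                       # finished before reaching any error
        if status != "ok":
            return delivered, ("error", rec)
        delivered.append(rec)

    if len(delivered) == needed:
        return delivered, ("complete", None)
    return delivered, ("incomplete", None)
```

With `track = ["R1", "R2", "R3", "R4", "R5"]`, a data check on `R4` does not disturb a channel program that needs only two records: `run_read(track, 0, 2, {"R4"})` returns `(["R1", "R2"], ("complete", None))`. If the channel needs four records and the data check is on `R3`, the same call with `needed=4` and `{"R3"}` returns `(["R1", "R2"], ("error", "R3"))`, and recovery is required.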
An object of this invention is to provide an error recovery method for a nonsynchronous control unit which recovers from single errors in the most efficient manner possible.
Another object of this invention is to recover from multiple errors in the most efficient manner possible.
Another object is to provide an error recovery method wherein multiple errors are handled as single errors insofar as possible.