Recovery operations, some of which require a reset, are generally performed today at the channel path level. Those skilled in the art are familiar with the need to conduct recovery operations upon the occurrence of certain error conditions. Two examples of these error conditions are: (1) the so-called "hot I/O," a hardware malfunction in which a device repeatedly presents the I/O subsystem with unsolicited interrupts, and (2) a reset event, a hardware error condition in which a control unit signals the I/O subsystem that a system reset has occurred. The reset event signal is generally given when the software attempts I/O operations on the interface where the reset previously occurred. Hot I/O and reset event conditions are limited in scope to the control unit level. Since no method exists today to isolate control unit level errors to the control unit in error, recovery must be performed at the channel path level.
Programming systems such as IBM's MVS provide error detection logic to detect these conditions, and the system architecture provides instructions that allow for various levels of error recovery. One of the options presently available for error recovery is the reset channel path (RCHP) instruction. This instruction is described in IBM publication SA22-7200 IBM Enterprise System Architecture/370. The RCHP instruction allows the program to issue a system reset signal on the channel path to which the control unit is attached. The system reset signal frequently causes special microcode to be executed in the control unit (typically re-initialization), which has proven successful in recovering from these failures. During the channel path recovery operation, all devices connected to the channel path being recovered suffer a service outage. All I/O to those devices is suspended until recovery is complete and the paths to the devices have been reinitialized. Furthermore, if even one device is shared among multiple systems, it may be necessary to stop all sharing processors in order to ensure that data integrity is maintained during the recovery. If recovery operations could be limited to the control unit to which the reporting device is connected, considerable processing time could be saved.
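The difference in outage scope described above can be illustrated with a minimal sketch. It is not drawn from any IBM interface: the `Device` record, the `devices_suspended` function, the scope names, and the sample configuration (two control units sharing one channel path) are all hypothetical, and each control unit is assumed to be reached through a single channel path.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    """Hypothetical device record used only for this illustration."""
    name: str
    control_unit: str   # control unit the device is attached to
    channel_path: str   # channel path through which that control unit is reached


def devices_suspended(devices, failing, scope):
    """Return the names of devices whose I/O is suspended during recovery.

    scope="channel_path": every device on the failing device's channel path.
    scope="control_unit": only devices behind the failing device's control unit.
    """
    if scope == "channel_path":
        return sorted(d.name for d in devices
                      if d.channel_path == failing.channel_path)
    if scope == "control_unit":
        return sorted(d.name for d in devices
                      if d.control_unit == failing.control_unit)
    raise ValueError(f"unknown recovery scope: {scope!r}")


# Assumed configuration: two control units attached to the same channel path.
CONFIG = [
    Device("D1", "CU_A", "CHP_01"),
    Device("D2", "CU_A", "CHP_01"),
    Device("D3", "CU_B", "CHP_01"),
    Device("D4", "CU_B", "CHP_01"),
]

# Device D1 reports the error condition.
path_level = devices_suspended(CONFIG, CONFIG[0], "channel_path")
cu_level = devices_suspended(CONFIG, CONFIG[0], "control_unit")
print(path_level)  # ['D1', 'D2', 'D3', 'D4']
print(cu_level)    # ['D1', 'D2']
```

In this assumed configuration, channel path level recovery suspends I/O to all four devices, whereas recovery limited to the control unit of the reporting device would suspend only two.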
The introduction of I/O architecture and topologies permitting communications directly with individual control units presents an opportunity to reduce some of the disruption and other inconveniences associated with system resets. Unfortunately, there are no available methods for initiating control unit level recovery operations. Furthermore, the problem is complicated by the fact that many systems today employ both multidropped and switched point-to-point I/O interface topologies.
It is, therefore, an object of this invention to provide a method and apparatus for effecting control unit level reset operations which will not cause a disruption in the activities of other control units which may be connected to the same channel.
It is a further object of this invention to provide a method and an apparatus for conducting reset operations in a more efficient way so that fewer of them will actually be required in the operation of a data processing system.
It is a further object of this invention to provide more granular error recovery from control unit/device failures, thus exploiting a switched point-to-point I/O topology such as that employed by IBM's ESCON I/O interface.
Finally, it is an object of this invention to provide a method and an apparatus for initiating either channel path recovery operations or control unit recovery operations, as may be appropriate, in systems employing both multidropped and switched point-to-point I/O interface topologies.