The present disclosure relates to detecting channel path errors and removing the affected channel paths from use, and more specifically, to proactively removing channel paths from a variable scope of I/O devices.
Input/Output (I/O) devices are generally connected to a processor via multiple channel paths. These devices are designed to remain functional as long as at least one channel path between the processor and the device is operational.
Typically, when an I/O operation is executed and an error is detected on a specific channel path, the operating system tests the channel path by issuing one or more recovery-related I/O operations. During this process, applications must wait to use the I/O device. If the recovery I/O operation is unsuccessful, the channel path is removed from the device, the application I/O is resumed, and the original I/O operation is retried. This error recovery processing occurs on a device-by-device basis. Consequently, if the channel path error is associated with a hardware component that is shared by multiple devices (e.g., a channel, channel card, switch, control unit port, control unit adapter card, or fiber optic cable), each device using that hardware component must encounter the error itself before the defective channel path is removed from that device.
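The per-device recovery behavior described above can be illustrated with a minimal sketch. It is a simplified model rather than an actual operating system implementation: the names `Device`, `recovery_io_ok`, and `run_io` are hypothetical, channel paths are represented as integers, and a failed shared component is modeled as a set of failed path IDs.

```python
from dataclasses import dataclass


@dataclass
class Device:
    name: str
    paths: set  # channel path IDs currently usable by this device


def recovery_io_ok(path, failed_paths):
    """Hypothetical recovery I/O: succeeds unless the path has failed."""
    return path not in failed_paths


def run_io(device, failed_paths):
    """Run one application I/O, recovering on this device only.

    Returns the number of errors the device hit before the I/O succeeded.
    """
    errors = 0
    for path in sorted(device.paths):  # sorted() copies, so discard is safe
        if path not in failed_paths:
            return errors  # I/O succeeded on this path
        errors += 1  # error detected; application I/O waits
        if not recovery_io_ok(path, failed_paths):
            device.paths.discard(path)  # removed from this device only
        # the original I/O is retried on the next available path
    raise RuntimeError(f"{device.name}: no operational channel path")


# Three devices share channel path 1, which sits behind a failed component.
failed = {1}
devices = [Device(f"dev{i}", {1, 2}) for i in range(3)]
errors = [run_io(d, failed) for d in devices]  # [1, 1, 1]
```

In this model every device encounters the error once before path 1 is removed from it, even though the fault lies in a single shared component.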
The channel path error recovery process negatively affects application performance because the application is delayed while the operating system performs recovery and then retries the original I/O request. Furthermore, if an application uses multiple I/O devices that share the failing channel path, additional errors will be encountered as each device attempts to use the failing channel path, causing further delays.
Additionally, certain types of channel path errors are intermittent in nature. That is, the application I/O may encounter an error, but the recovery I/O used to test the channel path succeeds, and the channel path is therefore not removed from the device. Intermittent channel path errors negatively affect performance because applications may encounter the same error multiple times. Typically, when intermittent channel path errors occur, the channel path must be manually removed from the affected devices to stop the errors from recurring.
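The way an intermittent error can evade the recovery test may be sketched as follows. This is an illustrative model under stated assumptions, not a description of real hardware: `IntermittentPath` and `count_app_errors` are hypothetical names, and the path is assumed to fail deterministically on every `period`-th operation (with `period` of at least 3, so that the retry following a successful recovery I/O also succeeds).

```python
class IntermittentPath:
    """Hypothetical channel path that fails on every `period`-th operation."""

    def __init__(self, period):
        self.period = period
        self.ops = 0

    def attempt(self):
        """Perform one operation; return True on success."""
        self.ops += 1
        return self.ops % self.period != 0


def count_app_errors(path, requests):
    """Issue `requests` application I/Os over a single path.

    Returns (errors seen by the application, whether the path was removed).
    """
    app_errors = 0
    for _ in range(requests):
        if path.attempt():
            continue  # application I/O succeeded
        app_errors += 1  # application I/O hit an error
        if not path.attempt():  # recovery I/O tests the path
            return app_errors, True  # test failed: path is removed
        path.attempt()  # recovery passed: retry the original I/O
    return app_errors, False


path = IntermittentPath(period=4)
app_errors, removed = count_app_errors(path, requests=8)
```

Because each recovery I/O in this model lands on an operation that succeeds, the path is never removed and the application keeps encountering errors until the path is taken out of service by other means.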