The present invention relates generally to input/output processing of an electronic data system and, more specifically, to input/output request timeout management.
Electronic data systems typically employ input/output (I/O) request schemes to add, update and manage I/O devices implemented in the system. When an I/O request (i.e., an I/O process) is issued to a device, the system may optionally monitor the elapsed time of the I/O request and take one or more actions when a missing interrupt occurs, i.e., when the time expires. These actions could include, for example, issuing a message, collecting or logging diagnostic information, terminating the I/O request, performing device recovery in an attempt to correct the problem, or swapping over to an alternate device. The amount of time the system waits can be provided by the application, by the customer via configuration parameters, and/or by the device itself. The device may provide multiple timeout values to allow different types of I/O requests to be timed differently. For example, there might be a primary I/O timeout value for short-running commands and a secondary I/O timeout value for long-running commands. There are a number of issues regarding the use of the timeout values.
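By way of illustration only, the timing scheme described above may be sketched as follows. The sketch assumes a device that reports one primary and one secondary timeout value. All names (`CommandClass`, `DeviceTimeouts`, `IoRequest`, `missing_interrupt`) and the example values are hypothetical and are not taken from any particular operating system.

```python
from dataclasses import dataclass
from enum import Enum


class CommandClass(Enum):
    """Hypothetical classification of I/O commands by expected duration."""
    SHORT_RUNNING = "short"
    LONG_RUNNING = "long"


@dataclass(frozen=True)
class DeviceTimeouts:
    """Timeout values, in seconds, assumed to be supplied for one device."""
    primary_s: float    # applied to short-running commands
    secondary_s: float  # applied to long-running commands

    def for_command(self, command_class: CommandClass) -> float:
        # Select the timeout that matches the class of the command.
        if command_class is CommandClass.SHORT_RUNNING:
            return self.primary_s
        return self.secondary_s


@dataclass
class IoRequest:
    command_class: CommandClass
    started_at_s: float      # time at which the request was issued
    completed: bool = False  # set once the completion interrupt arrives


def missing_interrupt(req: IoRequest, timeouts: DeviceTimeouts, now_s: float) -> bool:
    """Return True when the request is still outstanding after its timeout."""
    if req.completed:
        return False
    elapsed_s = now_s - req.started_at_s
    return elapsed_s >= timeouts.for_command(req.command_class)


# Illustrative values only: 30 seconds primary, 45 minutes secondary.
EXAMPLE_TIMEOUTS = DeviceTimeouts(primary_s=30.0, secondary_s=45 * 60.0)
```

In such a sketch, the system would invoke `missing_interrupt` periodically for each outstanding request and, when it returns true, take one or more of the actions listed above, such as logging diagnostic information or terminating the request.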
First, the timeout values are based on the maximum amount of time required to complete an I/O request, taking into consideration any device-related recovery that is required. For example, the timeout value for a direct access storage device (DASD) is 30 seconds, which is orders of magnitude higher than the amount of time required for a normal I/O operation to complete.
Second, there is no capability for the device to extend the amount of time the operating system (OS) should wait for an I/O request to complete. This becomes more of a problem when the gap between the primary and secondary timeout values is very large. In the case of tape I/O requests, for example, the primary timeout value may be set to 30 seconds while the secondary timeout value may be set to 45 minutes to handle the worst-case time for long-running commands such as rewinding a tape. If what normally would be a short-running command needs more time than the primary timeout value allows, then either the secondary timeout value must be used, which means the application could be delayed for an extremely large amount of time, or the primary timeout value must be changed to a higher value, which would affect all I/O requests.
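The cost of each alternative can be made concrete with a short calculation. The following sketch is illustrative only: it uses the tape timeout values given above and assumes a hypothetical short-running command that needs 90 seconds. The function name `workaround_costs` and the 90 second figure are assumptions introduced for the example.

```python
PRIMARY_S = 30.0         # primary timeout for tape, from the example above
SECONDARY_S = 45 * 60.0  # secondary timeout for tape (45 minutes)


def workaround_costs(needed_s: float,
                     primary_s: float = PRIMARY_S,
                     secondary_s: float = SECONDARY_S) -> dict:
    """Return the extra wait, in seconds, introduced by each alternative.

    use_secondary: time by which the secondary timeout exceeds what the one
                   extended command needs before a hang is declared.
    raise_primary: time added to hang detection for every request that is
                   timed with the primary value.
    """
    if needed_s <= primary_s:
        # The command fits within the primary timeout; no workaround needed.
        return {"use_secondary": 0.0, "raise_primary": 0.0}
    return {
        "use_secondary": secondary_s - needed_s,
        "raise_primary": needed_s - primary_s,
    }


costs = workaround_costs(needed_s=90.0)
print(costs)  # {'use_secondary': 2610.0, 'raise_primary': 60.0}
```

Under these assumptions, using the secondary value leaves a hung request undetected for 2,610 seconds longer than the command requires, while raising the primary value adds 60 seconds to hang detection for every request timed with it.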
Third, device-specific code in the operating system may be used to extend the amount of time the operating system waits before declaring a timeout condition. It may be difficult, however, to determine by how much the wait time should be extended. Without feedback from the device, there is no good way for the operating system to determine this value. Even if the device supplied a unique timeout value for every supported command, this would still be an issue, since the duration of some commands is variable. For example, certain FlashCopy and Peer-to-Peer Remote Copy (PPRC) commands may take longer than the DASD primary missing interrupt handler (MIH) time of 30 seconds. The actual amount of time required may depend on the volume size and the disk technology used.