1. Field of the Invention
The present invention relates to a system, method, and program for processing failed connections in an input/output (I/O) system.
2. Description of the Related Art
Host computer systems may access a mass storage unit, such as a Direct Access Storage Device (DASD), which is comprised of numerous interconnected hard disk drives (HDDs) that form a single storage space. In such systems, a storage controller would manage input/output operations between the host systems and the DASD. Examples of storage controllers include the International Business Machines (“IBM”) 3990 Storage Controller, described in IBM publication, “IBM 3990 Storage Control Reference (Models 1, 2, and 3),” IBM document no. GA32-0099-06 (Copyright IBM Corp. 1988, 1994), which publication is incorporated herein by reference in its entirety.
FIG. 1 illustrates host systems 4a, b, c that communicate with a storage controller 6 via an Enterprise Systems Connection (ESCON®) interface 8. (ESCON is a registered trademark of IBM.) The ESCON interface 8 provides an optical fibre link and one or more dynamic switches between the host systems and the storage controller 6. The storage controller 6 manages input/output operations between the DASD 10 and the host systems 4a, b, c. The host systems 4a, b, c each include a channel subsystem to control I/O operations initiated by the host systems 4a, b, c and directed to the DASD 10 and storage controller 6. The channel subsystems include one or more channels that provide a connection through which an I/O command may be delivered from the host system 4a, b, c to the storage controller 6. The channels and subchannel architecture provide the host system 4a, b, c the addressing information needed to access logical subsystems (LSSs) within the DASD 10. The channel subsystem and channel architecture in the host systems 4a, b, c are described in IBM publication, “Enterprise Systems Architecture/390: Principles of Operation,” IBM document no. SA22-7201-04 (Copyright IBM Corp. 1990, 1991, 1993, 1994, 1996, 1997), which publication is incorporated herein by reference in its entirety.
The ESCON interface 8 provides ports through which the host systems 4a, b, c and storage controller 6 connect. The ESCON interface 8 provides the physical and logical connection between a channel within a host system 4a, b, c and the storage controller 6. The ESCON interface 8 provides a link, which is the transmission medium for a serial I/O interface, that is a point-to-point pair of conductors (optical fibers) that physically interconnect a storage controller 6 and a channel, a channel and a dynamic switch, a storage controller 6 and a dynamic switch, or, in some cases, a dynamic switch and another dynamic switch. The ESCON interface 8 and its interaction with the channel architecture in the host systems 4a, b, c are described in IBM publication “ESCON I/O Interface,” IBM document no. SA22-7202-02 (Copyright IBM Corp. 1990, 1991, 1992), which publication is incorporated herein by reference in its entirety.
ESCON provides a frame protocol for communications between the storage controller 6 and channels in the host systems 4a, b, c. After the storage controller 6 receives a request for data from a channel in a host system 4a, b, c, the storage controller 6 disconnects from the channel to free up the channel and ESCON interface 8 links while the storage controller 6 retrieves the requested data from the DASD 10, or otherwise executes the I/O operation. After the storage controller 6 retrieves the requested data from the DASD 10, the storage controller 6 will then attempt to reconnect to the host system 4a, b, c via the channel from which the read request was initiated or via another channel if the host 4a, b, c provides for dynamic path reconnection. With dynamic path reconnection, the storage controller 6 may reconnect to the host system 4a, b, c via any available channel path between the storage controller 6 and host system 4a, b, c. The storage controller 6 reconnects to the host system 4a, b, c to present the status of the I/O operation and return requested data for a read operation. However, if there is no available channel path for the storage controller 6 to reconnect to the host system 4a, b, c, then the host channel 35 may return a link level busy to the storage controller 6 indicating that the reconnect cannot be retried at the moment. If the storage controller 6 attempts to reconnect through an ESCON interface 8 which does not have an available link to provide between the channels of the host system 4a, b, c and the storage controller 6, then the ESCON interface 8 will return a port busy frame to the storage controller 6 indicating that the ESCON interface 8 ports through which the storage controller 6 may reconnect to the host system 4a, b, c are busy. In current systems, a pending I/O operation has priority over reconnect requests.
After receiving a link level busy or port busy message in response to the reconnect message, the storage controller 6 will retry the reconnect command at a later time. The storage controller 6 may time-out after unsuccessfully retrying the reconnect command for a period of time. Further, if the host system 4a, b, c does not receive status information on the I/O operation for a period of time, then the I/O command may fail at the host system 4a, b, c end with a channel path time out. In such case, after the time out, the host system 4a, b, c may retry the I/O operation.
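The retry behavior described above can be sketched as follows. This is only an illustration under stated assumptions, not the ESCON frame protocol: the names `ReconnectResult`, `retry_reconnect`, and `max_attempts` are hypothetical, and a fixed attempt budget stands in for the storage controller's time-out period.

```python
from enum import Enum
from typing import Callable


class ReconnectResult(Enum):
    CONNECTED = "connected"
    LINK_BUSY = "link level busy"  # no host channel path available
    PORT_BUSY = "port busy"        # no ESCON interface port or link available


def retry_reconnect(attempt: Callable[[], ReconnectResult],
                    max_attempts: int) -> bool:
    """Retry a reconnect until it succeeds or the attempt budget runs out.

    Returns True on success and False on time-out. The attempt budget is an
    assumption that stands in for the controller's time-out period.
    """
    for _ in range(max_attempts):
        if attempt() is ReconnectResult.CONNECTED:
            return True
        # Link level busy or port busy: the reconnect is retried later.
    return False
```

A caller would pass a function that performs one reconnect attempt. A result of `False` corresponds to the case in which the storage controller times out without presenting status, which is the condition that leads to the channel path time out at the host.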
The host systems 4a, b, c may initiate enough I/O operations to consume all available channels and ESCON port resources. In such case, the storage controller 6 may not be able to reconnect and provide status as all host system 4a, b, c channels and ESCON interface 8 ports capable of providing a reconnection path are unavailable.
There is thus a need in the art for an improved method and system for managing I/O operations between host systems 4a, b, c and storage controllers 6 to prevent the I/O operation from timing out because of the inability of the storage controller 6 to reconnect to the host system 4a, b, c. 
To overcome the limitations in the prior art described above, preferred embodiments disclose a system, method, and program for managing I/O operations transmitted from a computer system to a processing unit. The processing unit manages access to a storage device and executes the I/O operation against the storage device. The processing unit receives indication that a request to connect between the processing unit and the computer system failed. Upon receiving a subsequent I/O operation after receiving indication that the connect request failed, the processing unit returns busy to the computer system initiating the subsequent I/O operation in response to receiving indication that the connect request failed. The connect request is retried after returning busy.
In further embodiments, the processing unit queues information on the failed connect request in a first queue after receiving indication that the connect request failed. The processing unit further queues information on the busy returned to the computer system in a second queue. The processing unit accesses information on a failed connect request from the first queue and retries the accessed failed connect request. The processing unit determines whether the retried connect request succeeded and returns a busy end status to the computer system after determining that the retried connect request succeeded. The computer system retries the subsequent I/O operation which was suspended as a result of the returned busy.
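The two-queue bookkeeping described above may be sketched as follows. The class and method names are hypothetical and chosen only for illustration. The sketch also makes assumptions the text does not state: a connect request that fails again is placed back at the tail of the first queue, and a single successful retry releases every computer system recorded in the second queue.

```python
from collections import deque
from typing import Callable, Deque, List, Tuple


class ProcessingUnit:
    """Hypothetical model of the first and second queues."""

    def __init__(self) -> None:
        self.failed_connects: Deque[str] = deque()  # first queue
        self.busy_hosts: Deque[str] = deque()       # second queue
        self.sent: List[Tuple[str, str]] = []       # log of (host, message)

    def connect_failed(self, request_id: str) -> None:
        """Queue information on a failed connect request."""
        self.failed_connects.append(request_id)

    def receive_io(self, host: str) -> str:
        """Return busy for a new I/O while a failed connect is outstanding."""
        if self.failed_connects:
            self.busy_hosts.append(host)
            self.sent.append((host, "busy"))
            return "busy"
        return "accepted"

    def retry_next(self, try_connect: Callable[[str], bool]) -> bool:
        """Retry the oldest failed connect; on success, send busy end."""
        if not self.failed_connects:
            return False
        request_id = self.failed_connects.popleft()
        if not try_connect(request_id):
            # Assumption: keep the request for a later retry.
            self.failed_connects.append(request_id)
            return False
        while self.busy_hosts:
            # Busy end tells the host it may retry its suspended I/O.
            self.sent.append((self.busy_hosts.popleft(), "busy end"))
        return True
```

In this sketch `try_connect` is a caller-supplied function that attempts one connection and reports whether it succeeded. A host that receives the busy end message would then reissue the I/O operation that was suspended.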
In still further embodiments, the processing unit increments a counter after queuing information on the failed connect request in the first queue. The processing unit determines whether the counter exceeds a predetermined value before returning busy to the computer system initiating the subsequent I/O operation. Busy is returned in response to subsequent I/O operations if the counter exceeds the predetermined value.
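The counter test described above may be illustrated with a minimal sketch. `BusyGate`, `record_failed_connect`, and `should_return_busy` are hypothetical names, and the threshold value passed to the constructor is an arbitrary stand-in for the predetermined value.

```python
class BusyGate:
    """Hypothetical counter that gates when busy is returned."""

    def __init__(self, threshold: int) -> None:
        self.threshold = threshold  # the "predetermined value"
        self.failed_count = 0

    def record_failed_connect(self) -> None:
        """Called after a failed connect request is queued."""
        self.failed_count += 1

    def should_return_busy(self) -> bool:
        """Busy is returned only once the counter exceeds the threshold."""
        return self.failed_count > self.threshold
```

With a threshold of two, for example, the first two failed connect requests leave new I/O operations unaffected, and busy is returned only from the third failure onward.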
In yet further embodiments, the computer system and processing unit disconnect after the processing unit receives an I/O operation. The processing unit reconnects with the computer system to present status on the disconnected I/O operation after processing the disconnected I/O operation. In such embodiments, the failed connect requests queued in the first queue are requests by the processing unit to reconnect to the computer system to present status on previously disconnected I/O operations.
With preferred embodiments, the storage controller may return busy messages to host systems initiating I/O operations to reduce I/O traffic in order to make connection resources, such as channel paths and ESCON interface ports and links, available. Increased availability of such connection resources will permit the storage controller to reconnect to a host system to present status on a completed I/O operation that was previously disconnected. Preferred embodiments determine a threshold number of failed reconnects that occur before the storage controller returns busy to inhibit new I/O operations. The storage controller may cease returning busy after a reconnect succeeds. In this way, preferred embodiments provide a mechanism to regulate I/O traffic to reduce the occurrence of reconnection operations timing out because of a lack of channel and other connection resources resulting from newly initiated I/O operations.