The present invention relates generally to network computing and, more particularly, to a method, system, and storage medium for resolving contention among channels that occurs during channel program execution.
Under a current protocol such as the FC-SB-3 protocol (FICON), a control unit (CU) whose resources are fully utilized typically responds to the first command that a channel issues for a new channel program with a ‘device-busy’ status indication. When this occurs, the CU ‘owes’ the channel a ‘no-longer-busy’ status response, which it sends once it is no longer busy. When the channel receives the ‘no-longer-busy’ status, it accepts the status and ends its connection with the CU. If the channel still needs to run the new channel program, it then re-initiates the program by sending a new command.
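The busy/owed-status handshake described above can be sketched as a small state model. This is a toy illustration only, assuming a CU that can run exactly one channel program at a time; the class `ControlUnit` and its methods `start`, `complete`, and `send_no_longer_busy` are hypothetical names and do not correspond to any FC-SB-3 frame format or real CU interface.

```python
from dataclasses import dataclass, field


@dataclass
class ControlUnit:
    """Toy CU that can run only one channel program at a time (assumption)."""

    busy: bool = False
    # Channels that were answered with 'device-busy' and are owed 'no-longer-busy'.
    owed: list[str] = field(default_factory=list)

    def start(self, channel: str) -> str:
        """Handle the first command of a new channel program from `channel`."""
        if self.busy:
            if channel not in self.owed:
                self.owed.append(channel)  # the CU now owes this channel a status
            return "device-busy"
        self.busy = True
        return "accepted"

    def complete(self) -> None:
        """The active channel program ends; the CU's resources are free again."""
        self.busy = False

    def send_no_longer_busy(self, channel: str) -> str:
        """Deliver the owed status; the channel accepts it and ends the connection."""
        self.owed.remove(channel)
        return "no-longer-busy"


cu = ControlUnit()
print(cu.start("CH1"))                 # accepted
print(cu.start("CH2"))                 # device-busy (CU owes CH2 a status)
cu.complete()
print(cu.send_no_longer_busy("CH2"))   # no-longer-busy; CH2 ends the connection
print(cu.start("CH2"))                 # accepted: CH2 re-initiates with a new command
```

Note that in this model the re-initiation is an entirely new `start` call: the ‘no-longer-busy’ status itself does not start the channel program.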
The FICON protocol encounters problems if, during the time that a CU is busy, it receives requests from several channels to initiate new channel programs. In this instance, the CU responds to all of the channels with a ‘device-busy’ status. When the CU becomes no longer busy, it can either send a ‘no-longer-busy’ status to all the channels simultaneously, or it can send the ‘no-longer-busy’ status to a single channel at a time. In many cases, both of these alternatives result in some of the channels timing out while waiting for the ‘no-longer-busy’ status.
If the CU sends a ‘no-longer-busy’ status to all of the channels simultaneously, it then waits for one of them to re-initiate its channel program. When the CU receives the command from the first channel that re-initiates a channel program, it begins executing that channel program, and it responds with a ‘device-busy’ status to each of the other channels that subsequently attempts to re-initiate. When the CU completes the channel program and is again no longer busy, it once more sends a ‘no-longer-busy’ status to every channel to which it has sent a ‘device-busy’ status, and the cycle repeats: the CU becomes busy as soon as the first re-initiating command arrives and answers the remaining channels with ‘device-busy’. This mode of operation causes problems because every broadcast of the ‘no-longer-busy’ status starts a race among the channels to re-initiate their channel programs. Since the fastest channel typically wins the race, the slower channels can be prevented from initiating their channel programs for long periods. In many cases these periods are long enough that upper-level software timers expire and the applications whose I/O depends on these channels fail.
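The starvation effect of the broadcast approach can be illustrated with a simple round-based simulation. This is a sketch under stated assumptions, not a model of any real channel: each round the CU broadcasts ‘no-longer-busy’, the waiting channel with the smallest re-initiation latency reaches the CU first, and a channel that always has another program queued rejoins the race after it wins. The function name `broadcast_race` and all latency values are hypothetical.

```python
def broadcast_race(
    latency_ms: dict[str, float], always_has_work: set[str], rounds: int
) -> dict[str, int]:
    """Count how many channel programs each channel gets to run.

    latency_ms: hypothetical time each channel needs to re-initiate after
        receiving 'no-longer-busy'.
    always_has_work: channels assumed to always have another program queued.
    """
    waiting = set(latency_ms)  # every channel starts out owed 'no-longer-busy'
    wins = {ch: 0 for ch in latency_ms}
    for _ in range(rounds):
        if not waiting:
            break
        # CU broadcasts 'no-longer-busy'; the fastest re-initiator arrives first.
        winner = min(waiting, key=lambda ch: latency_ms[ch])
        wins[winner] += 1
        waiting.discard(winner)
        # Losers were answered with 'device-busy' and remain in `waiting`.
        if winner in always_has_work:
            # The winner issues its next program, gets 'device-busy', and races again.
            waiting.add(winner)
    return wins


latencies = {"fast": 0.05, "medium": 0.4, "slow": 12.0}
print(broadcast_race(latencies, {"fast"}, 100))  # {'fast': 100, 'medium': 0, 'slow': 0}
print(broadcast_race(latencies, set(), 100))     # {'fast': 1, 'medium': 1, 'slow': 1}
```

As long as the fastest channel keeps issuing work, the slower channels never win a round in this model, which mirrors the starvation described in the paragraph above.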
To eliminate the race described above, the CU may instead send a ‘no-longer-busy’ status to one channel at a time. After sending the status to a given channel, the CU waits for that channel to respond by initiating a new channel program. When that channel program completes, the CU sends a ‘no-longer-busy’ status to the next channel and waits for it to respond. This process continues until the CU has sent a ‘no-longer-busy’ status to every channel to which it owes this response. Although this mode of operation avoids a race among the channels, it introduces another problem: by the time a channel receives the ‘no-longer-busy’ status, it may no longer need to initiate a new channel program. This typically happens when software has waited for the pending operation to complete until a ‘Missing Interrupt Handler’ timeout occurred, at which point the software withdraws the pending I/O request. In this case, the CU must wait a model-dependent period before assuming that the channel has decided not to initiate a new channel program. This wait is often well over ten milliseconds, because some of the slower channels take that long to re-initiate an I/O operation after receiving a ‘no-longer-busy’ status. While the CU is waiting, the timers running on all of the other channels that received a ‘device-busy’ status begin to expire, causing those channels to enter more catastrophic recovery sequences and thereby compounding the problem.
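The cost of the one-at-a-time approach can be sketched as a timeline calculation. This is a minimal model under assumed conditions: channels are notified in a fixed order, a channel that re-initiates keeps the CU busy for a fixed program time, and a channel that has withdrawn its request forces the CU to sit through its full model-dependent wait. The function names `serial_notify` and `timed_out`, the channel names, and all millisecond values are hypothetical illustrations, not measured figures.

```python
def serial_notify(
    queue: list[str], withdrawn: set[str], program_ms: float, cu_wait_ms: float
) -> dict[str, float]:
    """Return the time (ms) at which each owed channel receives 'no-longer-busy'."""
    t = 0.0
    delivered: dict[str, float] = {}
    for ch in queue:
        delivered[ch] = t
        if ch in withdrawn:
            # Channel gave up (e.g. after an MIH timeout); the CU idles for its
            # model-dependent wait before moving on to the next channel.
            t += cu_wait_ms
        else:
            # Channel re-initiates; the CU is busy for the length of its program.
            t += program_ms
    return delivered


def timed_out(delivered: dict[str, float], timer_ms: float) -> list[str]:
    """Channels whose (hypothetical) timer expires before their status arrives."""
    return [ch for ch, t in delivered.items() if t > timer_ms]


d = serial_notify(["A", "B", "C", "D"], withdrawn={"A", "B"}, program_ms=2.0, cu_wait_ms=15.0)
print(d)                  # {'A': 0.0, 'B': 15.0, 'C': 30.0, 'D': 32.0}
print(timed_out(d, 25.0)) # ['C', 'D']
```

With no withdrawn requests, the same queue would be fully notified by 6.0 ms in this model; two abandoned requests alone push channels C and D past the assumed 25 ms timer.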
What is needed, therefore, is a way to resolve these contention issues among channels during channel program execution.