The present invention is related to checkpointing for recovery of channels in a data processing system, and is more particularly related to checkpointing for recovery of channels using a protocol which allows for multiplexing operations at the frame level and streaming of commands and data.
In a data processing system, such as the IBM S/390 system having channels whose operation is controlled by Channel Command Words (CCWs), and whose Input/Output (I/O) links are fiber optics using the IBM FICON connectivity architecture, when a channel is attempting to recover from interface errors on the fiber link and the subchannel is in the active state, the channel can attempt retry of the operation from the point of failure by issuing a selective reset with request for retry, specifying which CCW to retry. When, as a conclusion to an unsuccessful retry recovery action, the Interface Control Check (IFCC) status is presented to the S/390 operating system, fields in the Extended Status Word/Extended Report Word (ESW/ERW) must be set up, as explained in IBM Enterprise Systems Architecture/390 Principles of Operation, SA22-7201-06, available from International Business Machines Corporation of Armonk, N.Y. Among these fields is the primary CCW address, which communicates back to the operating system the progress the channel has made through the CCW chain at the time of the error. Based on this information, the operating system can determine what storage has been updated, for use in its error recovery procedures. On S/390 channels prior to FICON, the protocols only allowed the channel to send the next command in a CCW chain upon receipt of an explicit indication (status or data) that the prior command execution was complete. However, FICON protocols allow the channel to stream commands and/or data out to a single device, while simultaneously doing the same for multiple devices.
U.S. Pat. No. 5,392,425, issued Feb. 21, 1995 to Elliott et al. for CHANNEL-INITIATED RETRY AND UNIT CHECK FOR PERIPHERAL DEVICES, discloses retrying a command from a CCW in a data processing I/O system having a channel connected to a control unit, in which the channel detects an error condition and requests the control unit to retry the current command of an I/O operation.
The present invention provides a method, program product and apparatus which allow the channel to: 1) manage the data necessary for the recovery of an operation for a single device while multiple devices are active (checkpointing), and 2) determine the correct primary CCW address to report in the IFCC status by tracking and examining relevant checkpoints.
With the implementation of IBM FICON architecture, the channel is allowed to stream multiple commands out to a control unit without waiting for positive confirmation that any of the preceding commands are complete. In addition, this may occur for multiple devices simultaneously. An object of the present invention is to track, within the FICON channel, the progress of CCWs through their various stages, so that when an error is detected and an operation is aborted, the channel can properly select which CCW to attempt to retry with the control unit and, for unsuccessful retries, report back to software the correct primary CCW address indicating the extent to which the channel completed modifying and accessing S/390 storage. FICON architecture establishes two checkpointing events: if the CCW is a 'Read' with a non-zero byte count, or the CCW flags contain Program Controlled Interruption (PCI), a checkpoint is established between the channel and control unit for that CCW number.
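By way of illustration only, the two checkpointing events described above may be sketched as a simple predicate over a CCW. The sketch is written in Python for brevity. The Ccw record and its field names (number, is_read, byte_count, pci) are hypothetical names chosen for this example and do not represent the actual CCW format or any channel implementation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Ccw:
    """Hypothetical, simplified view of a CCW (not the architected layout)."""
    number: int       # CCW number within the chain
    is_read: bool     # True if the command is a 'Read'
    byte_count: int   # byte count specified by the CCW
    pci: bool         # True if the PCI flag is set in the CCW flags


def establishes_checkpoint(ccw: Ccw) -> bool:
    """Apply the two checkpointing events: Read with non-zero count, or PCI."""
    return (ccw.is_read and ccw.byte_count != 0) or ccw.pci


def checkpoint_numbers(chain: List[Ccw]) -> List[int]:
    """Return the CCW numbers in a chain at which a checkpoint is established."""
    return [ccw.number for ccw in chain if establishes_checkpoint(ccw)]


# Example chain (assumed values, for illustration only).
chain = [
    Ccw(number=1, is_read=False, byte_count=16, pci=False),   # no checkpoint
    Ccw(number=2, is_read=True, byte_count=4096, pci=False),  # Read, non-zero count
    Ccw(number=3, is_read=True, byte_count=0, pci=False),     # Read, zero count
    Ccw(number=4, is_read=False, byte_count=8, pci=True),     # PCI flag set
]
print(checkpoint_numbers(chain))
```

In this example chain, only the CCWs numbered 2 and 4 satisfy either event; the Read with a zero byte count does not.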
It is also an object of the present invention to implement checkpointing concepts in a manner that has minimal impact on functional performance, tracking only the minimal data needed during normal operation and using that data in lengthier analysis performed during error recovery. This data is tracked on a 'per operation' basis so that many operations can be concurrently ongoing, and utilizes the architectural concept of CCW numbering for each CCW in a chain.
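The 'per operation' tracking described above may be pictured, purely as a sketch, as a small table keyed by operation in which only a few CCW numbers are updated during normal operation. The class CheckpointTracker, its methods, and the fields last_sent and last_checkpoint are hypothetical names introduced for this example; the particular data kept by an actual channel, and the analysis performed on it during error recovery, are not represented here.

```python
class CheckpointTracker:
    """Hypothetical per-operation record of CCW progress, keyed by operation id."""

    def __init__(self):
        self._ops = {}

    def start(self, op_id):
        # One small record per active operation; nothing is shared between them.
        self._ops[op_id] = {"last_sent": None, "last_checkpoint": None}

    def ccw_sent(self, op_id, ccw_number, is_checkpoint):
        # Normal-path update: store CCW numbers only, no further analysis.
        state = self._ops[op_id]
        state["last_sent"] = ccw_number
        if is_checkpoint:
            state["last_checkpoint"] = ccw_number

    def snapshot(self, op_id):
        # Copy of the record, as might be examined during error recovery.
        return dict(self._ops[op_id])

    def finish(self, op_id):
        # Discard the record once the operation ends.
        del self._ops[op_id]


# Two operations for different devices proceed concurrently (assumed ids).
tracker = CheckpointTracker()
tracker.start("dev_a")
tracker.start("dev_b")
tracker.ccw_sent("dev_a", 1, is_checkpoint=False)
tracker.ccw_sent("dev_b", 1, is_checkpoint=True)
tracker.ccw_sent("dev_a", 2, is_checkpoint=True)
tracker.ccw_sent("dev_a", 3, is_checkpoint=False)

snap_a = tracker.snapshot("dev_a")
snap_b = tracker.snapshot("dev_b")
print(snap_a, snap_b)
```

Because each record is keyed by its own operation, updates streamed for one device leave the record for another device untouched, and the per-CCW cost on the normal path is limited to storing one or two numbers.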