Problems exist in PLDA networks using FCP in detecting and correcting error conditions on sequential access devices, such as tapes. The basic causes of these problems include the lack of a guaranteed delivery protocol and the implicit state information intrinsic to sequential access devices. More specifically, lost frames in FCP can result in FC information units being lost. Upper level protocol (ULP) recovery is not sufficient for a variety of reasons, including an inability to detect such errors, the effort required to implement recovery mechanisms, and the extended time required to detect and recover from error conditions.
The problem of not being able to detect and correct error conditions stems from the fact that Fibre Channel in this environment provides a capability to perform operations that would normally be performed on a parallel bus. With FCP PLDA devices, operations are performed on a serial datagram delivery service in which packets of information may be lost.
On a parallel bus, immediate feedback and acknowledgment occur, which guarantees robust delivery and gives the two communicating devices on the bus immediate state information. The FCP PLDA environment does not possess these characteristics.
On stream and media changer devices there are two classes of commands for which it is critical to know whether the command was accepted by the target, and then whether the command completed successfully. The first class, which is unique to these devices, consists of commands that alter the media state or content in such a way that simply re-executing the command will not recover from the error. One such command set is read/write/position/write filemarks, for which the tape is repositioned past the referenced block(s) or files only if the operation started, and the final position depends on how far the operation continued. Proper recovery for these commands requires this information. Also important are the move medium and load/unload medium commands, which may have actually changed the medium in the target. Unfortunately, these commands comprise most of the commands issued during normal operation of the subsystem.
The second class of commands is not unique to these devices. With these commands, information may be lost if the data is presumed to have been sent by the target but is not received by the initiator. These commands include request sense and read/reset log. Loss of sense data may affect error recovery from failed commands of the aforementioned media move/change class, and it may also affect proper error recovery for cached/RAID disk controllers.
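The two command classes described above can be sketched as a simple lookup that an initiator-side recovery routine might consult. This is an illustrative sketch only: the names `RecoveryClass` and `classify` are hypothetical, and the command names are taken from the text rather than from any particular command set definition.

```python
from enum import Enum


class RecoveryClass(Enum):
    # Re-executing the command blindly cannot recover the error,
    # because the media state or content may already have changed.
    STATE_ALTERING = "state altering"
    # The reply data may be reset at the target once it has been sent,
    # so a lost reply means lost information.
    DATA_CLEARING = "data clearing"
    # Any command the text does not single out.
    OTHER = "other"


# Command names as they appear in the text (hypothetical spellings).
STATE_ALTERING_COMMANDS = {
    "read", "write", "position", "write filemarks",
    "move medium", "load/unload medium",
}
DATA_CLEARING_COMMANDS = {"request sense", "read/reset log"}


def classify(command: str) -> RecoveryClass:
    """Return the recovery class for a command name, ignoring case."""
    name = command.strip().lower()
    if name in STATE_ALTERING_COMMANDS:
        return RecoveryClass.STATE_ALTERING
    if name in DATA_CLEARING_COMMANDS:
        return RecoveryClass.DATA_CLEARING
    return RecoveryClass.OTHER
```

For example, `classify("Write Filemarks")` yields `RecoveryClass.STATE_ALTERING`, which tells the recovery routine that it must first learn how far the operation progressed before it may reissue anything.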
On a parallel SCSI bus, the host adapter has positive confirmation that the target accepted the command by the fact that the target requested all bytes of the CDB and continued to the next phase without a Restore Pointers message. In a serial protocol, such confirmation is only implicit in the receipt of a response message, such as transfer-ready or response. For some commands, this implicit confirmation may require a lengthy period of time, during which mechanical movement requiring several multiples of E_D_TOV occurs. In FLA environments, however, R_A_TOV may be the appropriate value. Similarly, on a parallel bus the target has positive confirmation that the host has accepted sense or log data immediately upon completion of the data and status phases. This data, once received by the host, may then be reset by the target. In a serial environment, this confirmation is only implicit in the receipt of the next command. Note that changing the target to clear sense/log data only on receipt of a command other than request sense or read/reset log would eliminate this problem.
The errors of concern are those in which FCP information units (IUs) are lost in transit between an FCP initiator and target. The cause of such loss is not specific, but the loss is assumed to occur in cases where a link level connection is maintained between the target and initiator and some number of FCP IUs are dropped. Other cases are either handled by PLDA through existing methods, or may be generally classified as unrecoverable and treated in a fashion similar to a SCSI bus reset.
In order to meet the defined requirements, any proposed solution must enable the initiator to make the following determinations. In particular, there is a need for a method of enabling the initiator to determine that an error condition occurred (an FCP IU is expected and not received, or not responded to). There is also a need to determine whether an FCP_CMND was received by the target.
If the information unit is an FCP_DATA IU, there is a need to determine whether it was received or sent by the target. If the information unit is an FCP_XFER_RDY or FCP_RSP IU, there is a need to determine whether it was sent by the target. There is also a need for the solution to work in a Class 3 environment, preferably with no change to existing hardware. The tools prescribed in FC-PH for FC-2 recovery are the Read Exchange Status (RES) and Read Sequence Status (RSS) Extended Link Services, and the Abort Sequence (ABTS) Basic Link Service. RES is an appropriate tool for the host adapter to use. Its function is to inquire about the status of an operation during its life and for some period of time afterward. Unfortunately, in several of the cases of interest, the RX_ID is unknown to the exchange initiator.
In these cases, the initiator must use an RX_ID of 0xFFFF. Combined with the FC-PH wording that the Responder destination N_Port would use the RX_ID and ignore the OX_ID, this means that if the Responder had not received the command frame, the RES would be rejected. If, on the other hand, the Responder had received the command and sent the FCP_RSP response frame, the RES would also be rejected, in both cases with the same reason code. Only in the case where the command was in process, but no FCP_RSP response frame had been sent by the Responder, would a useful response be sent. Real implementations appear to search for the S_ID/OX_ID pair when the RX_ID is set to 0xFFFF in the RES request. This behavior is necessary for proper network operation.
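The ambiguity described above can be illustrated with a toy model of a Responder that discards exchange state as soon as it sends its response. This is a sketch under stated assumptions, not a rendering of the FC-PH procedures: the `Responder` class, the `probe_outcomes` helper, the reason string, and the identifier values are all hypothetical, and the lookup by S_ID/OX_ID pair follows the behavior the text attributes to real implementations.

```python
# Placeholder reason text; the actual FC-PH reason code is not modeled.
REASON_UNKNOWN_EXCHANGE = "exchange unknown"


class Responder:
    """Toy target that forgets an exchange the moment it responds."""

    def __init__(self):
        # Maps (S_ID, OX_ID) to a status string for open exchanges only.
        self.open_exchanges = {}

    def receive_command(self, s_id, ox_id):
        self.open_exchanges[(s_id, ox_id)] = "in process"

    def send_response(self, s_id, ox_id):
        # Sending the response closes the exchange and drops its state.
        self.open_exchanges.pop((s_id, ox_id))

    def read_exchange_status(self, s_id, ox_id):
        status = self.open_exchanges.get((s_id, ox_id))
        if status is None:
            return ("reject", REASON_UNKNOWN_EXCHANGE)
        return ("accept", status)


def probe_outcomes():
    """Issue an RES in each of the three situations the text describes."""
    s_id, ox_id = 0x010203, 0x0042  # arbitrary illustrative identifiers
    outcomes = {}

    never_received = Responder()
    outcomes["command lost"] = never_received.read_exchange_status(s_id, ox_id)

    in_process = Responder()
    in_process.receive_command(s_id, ox_id)
    outcomes["in process"] = in_process.read_exchange_status(s_id, ox_id)

    completed = Responder()
    completed.receive_command(s_id, ox_id)
    completed.send_response(s_id, ox_id)
    outcomes["response lost"] = completed.read_exchange_status(s_id, ox_id)

    return outcomes
```

In this model the "command lost" and "response lost" cases return the identical rejection, so the initiator cannot tell them apart; only the "in process" case yields a useful status.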
Further, even upon implementing such a change, in the case of a non-transfer command it is impossible to distinguish a command that was never received from a command whose response was lost, unless the target retains ESB information for a period of R_A_TOV after the exchange is closed.
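The effect of retaining ESB information after the exchange closes can be sketched as follows. This is a minimal illustration under stated assumptions: the `RetainingResponder` class is hypothetical, time is passed in explicitly rather than read from a clock, and the R_A_TOV value is a placeholder rather than a figure taken from the text.

```python
R_A_TOV_SECONDS = 10.0  # assumed placeholder value for illustration


class RetainingResponder:
    """Toy target that keeps exchange status for a window after close."""

    def __init__(self, retention=R_A_TOV_SECONDS):
        self.retention = retention
        # Maps an exchange key to (status, closed_at); closed_at is None
        # while the exchange is still open.
        self.exchanges = {}

    def receive_command(self, key):
        self.exchanges[key] = ("in process", None)

    def send_response(self, key, now):
        # Keep the record, but remember when the exchange was closed.
        self.exchanges[key] = ("response sent", now)

    def read_exchange_status(self, key, now):
        entry = self.exchanges.get(key)
        if entry is None:
            return ("reject", "exchange unknown")
        status, closed_at = entry
        if closed_at is not None and now - closed_at > self.retention:
            # Retention window expired: the record is discarded.
            del self.exchanges[key]
            return ("reject", "exchange unknown")
        return ("accept", status)
```

Within the retention window, a status inquiry for a completed exchange returns "response sent", whereas an inquiry for a command that never arrived is rejected, so the two cases become distinguishable. Once the window expires, the ambiguity returns.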
Similar arguments apply to the use of the RSS, though the wording of the applicable section uses the word "may" rather than "would". ABTS, while recommended in FC-PH for use in polling for sequence delivery, is always interpreted as an abort of the exchange in FC-PLDA, and is therefore not useful for this purpose.