1. Field of the Invention
The invention relates generally to computer systems and more particularly to systems and methods for performing error recovery following failure of a redundant link connecting a sequential device to a host.
2. Background of the Invention
In a stand-alone computer, data is stored in a device such as a hard disk drive. This device is normally internal to the computer and is connected to the CPU by an internal (e.g., PCI) bus. Data delivery on the internal bus is, for the most part, error-free.
In a network environment, however, the data generated by a workstation may be stored on a remote device. In other words, the data storage device is coupled to the workstation by a network that is external to the workstation and is typically coupled to and used by a number of devices other than the workstation and the storage device.
A network is generally more prone to errors than an internal bus since it deals with multiple devices that are contending for use of the network and that are separated by greater distances. Even if there is only a single workstation and a single storage device, there is likely to be a higher error rate than in an internal bus since the network is designed to operate with other devices.
An exemplary system may comprise a workstation (a host) coupled to a storage device in a SAN (Storage Area Network). In this example, the workstation and storage device are each coupled to a Fibre Channel switched fabric. This switched fabric is designed to provide what is referred to as “Class 3” service, which is the class of service typically used in Fibre Channel Protocols. In a network providing Class 3 service, data is multiplexed across switches in the network at frame (packet) boundaries. Class 3 service does not provide for acknowledgement of receipt of frames or notification of a busy destination device. If a frame is dropped, no notice of the dropped frame is provided to its sender. It is simply assumed that this will be accounted for by the host or the sequential device.
This may not be a problem when the data is being stored to a random access device such as a hard disk drive. If data is dropped during a write command, typically the entire sequence is discarded. The sequence can be re-sent, however, by re-issuing the failed SCSI command, and the storage device will simply overwrite any location in which the received portion was previously stored. The portion that was previously stored will be overwritten with the same data, and the portion that was previously dropped will be written to the location where it would have been stored if it had been received on the first transmission.
For sequential storage devices such as tape drives, however, errors may be more problematic. In the same scenario, the portion of the data received on the first attempt will be stored on the tape, and the tape will continue advancing. If the entire sequence is re-sent, the tape drive must be repositioned to the exact same location on the tape at which the data was previously stored. It would be preferable if only the portion of the data that was not received on the first attempt was re-sent. Then, the tape drive could simply continue writing to the tape from the point at which it left off.
It is very difficult in prior art systems, however, to determine which portion of the data was not received on the first attempt. The host device would have to extract information from the port over which the data was first transmitted. The host would then have to interpret the information and convert it into a sequence suitable for transport via the second port. It would therefore be desirable to provide means and/or methods for determining what portion of the data was received on the first attempt without having to extract this information from the first port. Then, the host could deliver only the missing portion of the data to the sequential device, which could then store it after the previously received information.
Tape error recovery in the prior art involves propagation of an error up to the level of the backup application. The result was that the backup application would require a substantial amount of time to perform recovery operations. Alternatively, Fibre Channel Tape Error Recovery as defined in the FC-TAPE and FC-DA technical reports provides a means for hiding errors arising from the loss of data from the backup application.