Fibre Channel Arbitrated Loop (FC-AL) architecture is a member of the Fibre Channel family of ANSI standard protocols. FC-AL is typically used for connecting computer peripherals, in particular disk drives. The FC-AL architecture is described in NCITS working draft proposals, American National Standard for Information Technology “Fibre Channel Arbitrated Loop (FC-AL-2) Revision 7.0”, Apr. 1, 1999 and “Fibre Channel Arbitrated Loop (FC-AL-3) Version 1.0”, 20 Sep. 1999.
Electronic data systems can be interconnected using network communication systems. Area-wide networks and channels are two technologies that have been developed for computer network architectures. Area-wide networks (e.g. LANs and WANs) offer flexibility and relatively large distance capabilities. Channels, such as the Small Computer System Interface (SCSI), have been developed for high performance and reliability. Channels typically use dedicated short-distance connections between computers or between computers and peripherals.
Fibre Channel technology has been developed from optical point-to-point communication between two systems, or between a system and a subsystem. It has evolved to include electronic (non-optical) implementations and has the ability to connect many devices, including disk drives, in a relatively low-cost manner. This addition to the Fibre Channel specifications is called Fibre Channel Arbitrated Loop (FC-AL).
Fibre Channel technology consists of an integrated set of standards that defines new protocols for flexible information transfer using several interconnection topologies. Fibre Channel technology can be used to connect large amounts of disk storage to a server or cluster of servers. Compared to Small Computer Systems Interface (SCSI), Fibre Channel technology supports greater performance, scalability, availability, and distance for attaching storage systems to network servers.
Fibre Channel Arbitrated Loop (FC-AL) is a loop architecture, as opposed to a bus architecture such as SCSI. FC-AL is a serial interface, in which data and control signals pass along a single path rather than moving in parallel across multiple conductors as is the case with SCSI. Serial interfaces have many advantages, including: increased reliability due to the use of point-to-point connections; dual-porting capability, so that data can be transferred over two independent data paths, enhancing speed and reliability; and simplified cabling and increased connectivity, which are important in multi-drive environments. As a direct disk attachment interface, FC-AL has greatly enhanced I/O performance.
The FC-AL interface is sufficiently robust to permit devices to be removed from the loop without interrupting throughput and sacrificing data integrity. If a drive fails, port bypass circuits can quickly route around the problem so all drives on the loop remain accessible.
A typical FC-AL may have one or two host bus adapters (HBA) and a set of six or so disk drive enclosures or drawers, each of which may contain a set of ten to sixteen disk drives. There is a physical cable connection between each enclosure and the HBA in the FC-AL. Also, there is a connection internal to the enclosure or drawer, between the cable connector and each disk drive in the enclosure or drawer, as well as other components within the enclosure or drawer, e.g. SES node (SCSI Enclosure Services node).
In practice, all of these connections are subject to deterioration, which in turn causes a gradual degradation of the FC-AL loop. The problems include, but are not limited to: long-term parametric drift of laser diodes in optical devices (GBICs); faulty copper connections due to series resistance, excess parallel capacitance, broken wires, or random noise; faulty optical connections due to a laser diode with reduced output or a broken fibre; and other phenomena such as corrupted data frames, corrupted control frames, corrupted status frames, and corrupted primitives.
The solution to the problem of a deteriorated or faulty component is to replace the component. However, this introduces a new problem: determining which component should be replaced.
There are simplistic approaches to this problem, e.g. replace all cable connections and see if the error rates have improved sufficiently, but such approaches can be very costly, especially if optical cables are involved, and also may not solve the problem.
The operation of FC-AL involves a number of ports connected such that each port's transmitter is connected to the next port's receiver, and so on, forming a loop. Each port's receiver has an elasticity buffer that captures the incoming FC-AL frame or words and is then used to regenerate each FC-AL word as it is re-transmitted. This buffer exists to deal with the slight clocking errors that occur. Each port receives a word and then transmits that word to the next port, unless the port itself is the destination of that word, in which case the word is consumed. The nature of FC-AL is therefore such that each intermediate port between the originating port and the destination port gets to ‘see’ each word as it passes around the FC-AL loop. HBAs such as RAID controllers are attached to loops via ports and may have more than one port; these ports are referred to as host ports. Disk drive enclosures or drawers are attached to the loop via ports and may have a plurality of ports.
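The forwarding behaviour described above can be illustrated with a minimal sketch. It assumes a loop modelled as a simple ordered list of port names; the function name route_word and the port names are hypothetical and chosen only for illustration. It ignores arbitration, elasticity buffers, and clocking.

```python
def route_word(ports, source, destination):
    """Return the ports that see a word travelling from source to destination.

    `ports` lists port names in loop order: each port's transmitter feeds
    the next port's receiver, and the last port feeds the first.
    """
    n = len(ports)
    ports.index(destination)      # raises ValueError for an unknown port
    i = ports.index(source)
    seen = []
    while True:
        i = (i + 1) % n           # the word arrives at the next receiver
        seen.append(ports[i])
        if ports[i] == destination:
            break                 # the destination consumes the word
    return seen


loop = ["HBA", "drive0", "drive1", "drive2"]
print(route_word(loop, "HBA", "drive2"))
```

In this model every port between the source and the destination appears in the returned list, which mirrors the observation that each intermediate port 'sees' each word.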
FC-AL architecture may be in the form of a single loop. Often two independent loops are used to connect the same components in the form of dual loops. The aim of these loops is that a single fault should not cause both loops to fail simultaneously. However, some faults, for example in a protocol chip or microprocessor in a disk drive, can cause both loops to fail. More than two loops can also be used.
Data is passed around an FC-AL in units of frames. Frames are made up of words, where one word is 4 bytes. A byte in Fibre Channel architecture is logically 10 bits.
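As a small worked example of these units, the sketch below converts a number of words into the number of bits on the link, using only the figures stated above (4 bytes per word, 10 bits per byte). The constant and function names are hypothetical.

```python
BYTES_PER_WORD = 4    # one word is 4 bytes
BITS_PER_BYTE = 10    # a Fibre Channel byte is logically 10 bits


def encoded_bits(words):
    """Number of bits occupied on the link by `words` transmission words."""
    return words * BYTES_PER_WORD * BITS_PER_BYTE


print(encoded_bits(1))   # 40 bits for a single word
```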
FC-AL defines an Extended Link Service (ELS) command that can return a set of optionally implemented counters. These counters detail the number and type of errors detected by that port. They are referred to as the Link Error Status Block (LESB). This data is very useful, because it can be used to determine the ‘area’ of the actual error, or even of multiple errors. The customer can then be given specific information as to which cables or enclosures require replacing.
Unfortunately, although support for the ELS LESB is required, implementation of the counters is not. That is to say, every port must respond to a request for ELS LESB data, but the data returned in the LESB counters need not be valid. As the ELS LESB counters are optional, they cannot be relied upon to locate loop faults. It is therefore possible that vital error detection information is not available.
FC-AL requires ports that receive an invalid or corrupt word (known as an Invalid Transmission Word), to replace that word with a Current Fill Word (CFW). By replacing the corrupt word with this valid word (the CFW), no other port will see the corrupt word. The port replacing the corrupt word may optionally increment a LESB counter called the Invalid Transmission Word Count (ITWC). If it does so, then it will be relatively simple for a host bus adapter (HBA) to detect where errors are occurring by looking at each port's ITWC.
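The substitution and counting behaviour described above can be sketched as follows. This is a simplified model, not an implementation of the standard: the Port class, the send_around function, the bad_link_before parameter, and the string and None stand-ins for the CFW and for an Invalid Transmission Word are all assumptions made for illustration.

```python
CFW = "CFW"        # stand-in for the Current Fill Word
INVALID = None     # stand-in for an Invalid Transmission Word


class Port:
    def __init__(self, name, counts_itw=True):
        self.name = name
        self.counts_itw = counts_itw   # incrementing the ITWC is optional
        self.itwc = 0

    def receive(self, word):
        """Return the word this port retransmits."""
        if word is INVALID:
            if self.counts_itw:
                self.itwc += 1
            return CFW                 # downstream ports never see the bad word
        return word


def send_around(ports, word, bad_link_before=None):
    """Pass one word through `ports` in order, corrupting it on one link."""
    for port in ports:
        if port.name == bad_link_before:
            word = INVALID             # corruption on this port's inbound link
        word = port.receive(word)
    return word


ports = [Port("drive0"), Port("drive1"), Port("drive2")]
for _ in range(5):
    send_around(ports, "DATA", bad_link_before="drive1")

suspects = [p.name for p in ports if p.itwc > 0]
print(suspects)
```

In this model only the port immediately downstream of the faulty link accumulates a count, which is why a host reading each port's ITWC could narrow down the location. If that port were created with counts_itw=False, no counter would move and the fault would not be visible this way.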
The teaching of this disclosure does not relate to using the ITWC to detect the area of the error, as this is the whole purpose of the counter. However, if a port does not increment the ITWC, then locating the fault becomes more difficult.
There is another counter returned in the LESB, called the Invalid CRC (Cyclic Redundancy Check) Count. Several bytes of cyclic redundancy check (CRC) information are transmitted along with each packet of user data. The receiving device then uses the CRC information to check data received and to request a resend if an error is detected.
When a frame of data is sent by a source port such as an HBA, the intermediate ports between the host port and the destination port check the number of bits and, if there is an error, substitute a CFW in the frame. The destination port checks the CRC to confirm that the frame is correct. If the frame contains a CFW, the CRC check fails and the receiving port discards the frame.
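The effect of a substituted word on the destination's CRC check can be sketched as below. Python's zlib.crc32 is used here purely as a stand-in 32-bit CRC; the frame layout (payload followed by a 4-byte CRC), the all-zero fill word, and the function names are assumptions for illustration and do not reproduce the actual Fibre Channel frame format.

```python
import zlib

WORD = 4
FILL = b"\x00" * WORD    # hypothetical stand-in for a CFW


def build_frame(payload):
    """Append a 4-byte CRC computed over the payload."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def crc_ok(frame):
    """Destination-side check: recompute the CRC and compare."""
    payload, crc = frame[:-4], frame[-4:]
    return zlib.crc32(payload).to_bytes(4, "big") == crc


def substitute_fill_word(frame, word_index):
    """Model an intermediate port replacing one word with the fill word."""
    start = word_index * WORD
    return frame[:start] + FILL + frame[start + WORD:]


frame = build_frame(b"ABCDEFGHIJKL")
damaged = substitute_fill_word(frame, 1)
print(crc_ok(frame), crc_ok(damaged))
```

The intact frame passes the check, while the frame with a substituted word fails it and would be discarded by the destination.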
The Invalid CRC Count is incremented by any port whenever it detects an invalid CRC. This indicates that there has been some corruption of an inbound data frame, including corruption that does not generate Invalid Transmission Words. However, ports do not check a data frame's CRC when the data frame is not destined for that port. This means that, to determine the source of errors in this case, averaging of CRC error paths is required. This highlights the area where errors are occurring and, over time and with a random distribution of connections and data frames, allows the source of the error to be determined more accurately.
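One plausible reading of averaging CRC error paths is sketched below: for each frame that failed its CRC at the destination, every link between the source and the destination is tallied, and the link crossed by the most failed frames becomes the leading suspect. This is an illustrative interpretation under that assumption, not necessarily the method of the present disclosure; the function names, port names, and error list are hypothetical.

```python
from collections import Counter


def links_on_path(ports, source, destination):
    """Links (upstream, downstream) crossed from source to destination."""
    n = len(ports)
    i = ports.index(source)
    end = ports.index(destination)
    links = []
    while i != end:
        nxt = (i + 1) % n
        links.append((ports[i], ports[nxt]))
        i = nxt
    return links


def rank_links(ports, crc_errors):
    """Tally each link once per CRC error whose path crossed it."""
    tally = Counter()
    for source, destination in crc_errors:
        tally.update(links_on_path(ports, source, destination))
    return tally.most_common()


loop = ["HBA", "d0", "d1", "d2", "d3"]
# (source, destination) of each frame that failed its CRC at the destination
errors = [("HBA", "d2"), ("HBA", "d3"), ("d1", "d2"), ("d1", "d3")]
ranking = rank_links(loop, errors)
print(ranking[0])
```

With these four errors the link from d1 to d2 is the only one common to every failed path, so it is ranked first. With fewer or less varied paths several links would tie, which is why the accuracy improves over time.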
As the frame containing the CFW is discarded, the destination port receives fewer valid bytes than it expected. This is a “data under run”. The destination port checks the CRC in a low-level operation and, in a higher-level operation, counts the bytes received and determines whether the count in an address register is lower than it should be. For example, a sequence number in each frame will reveal whether a frame is missing.
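The higher-level check described above can be sketched as a comparison of the bytes and sequence numbers actually received against what was expected. The function name, the representation of a frame as a (sequence number, payload length) pair, and the fixed payload size are assumptions made for illustration.

```python
def check_transfer(received, expected_frames, frame_payload_bytes):
    """Detect a data under run from the frames that survived the CRC check.

    `received` is a list of (sequence_number, payload_length) pairs.
    """
    seen = {seq for seq, _ in received}
    missing = [seq for seq in range(expected_frames) if seq not in seen]
    got = sum(length for _, length in received)
    expected = expected_frames * frame_payload_bytes
    return {
        "missing": missing,            # sequence numbers never received
        "under_run": got < expected,   # fewer valid bytes than expected
        "short_by": expected - got,
    }


# Four frames of 2048 bytes were expected; frame 2 was discarded.
result = check_transfer([(0, 2048), (1, 2048), (3, 2048)], 4, 2048)
print(result)
```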
The aim of this invention is to be able to locate the faulty component or connection, so that it can be replaced and maximum bandwidth achieved and maintained.
In FC-AL architectures, the ITWC may be implemented by all the ports and this provides a determination of fault location. However, drives do not need to implement any of the counters of the LESB. In the case where some or all of the ports do not implement the ITWC, then further analysis is required as provided by the present invention. Similarly, it is an aim of the invention to provide a means of fault determination in other unidirectional loop architectures.