Referring to FIG. 1, a typical data storage system 10 includes at least one rack 12 of storage devices or enclosures 14, 14′ (generally, enclosure 14) having a plurality of disk modules 18. The data storage system 10 can have fewer or more enclosures than those shown (internal or external to the rack 12). Examples of enclosures include disk-array enclosures (DAE) and disk-array processor enclosures (DPE). A typical DAE includes a plurality of disk modules 18 (e.g., fifteen), one or two link control cards (LCCs), and one or two power supplies. A typical DPE includes a plurality of disk modules 18 (e.g., fifteen), one or two storage processors, one or two LCCs, and one or two power supplies. Each disk module 18 includes a carrier assembly that holds a disk drive and slides into the enclosure 14.
The enclosures 14, 14′ implement redundancy with an “A” side and a “B” side. In enclosure 14, for example, each side has a link control card (LCC) 22, 22′ and a power supply (not shown). Reference numerals for the B-side components are the same as those for the corresponding components on the A side, with the addition of a prime (′) designation. Each LCC 22, 22′ includes a primary communications port 26, 26′ and an expansion communications port 30, 30′. The enclosures 14, 14′ are connected to each other by cables 34, 34′ in a loop topology. Communication signals traverse the loop in one direction and pass from enclosure 14 to enclosure 14′, in a daisy-chain fashion, and then return from enclosure 14′ to enclosure 14. An enclosure receiving communication signals targeted for a different enclosure forwards those signals along the loop.
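The one-directional forwarding described above can be illustrated with a minimal sketch. It is an assumption-laden model, not the system's implementation: the function name `route_on_loop` and the enclosure labels are hypothetical, and the loop is represented simply as a list ordered in the direction of signal travel.

```python
def route_on_loop(enclosures, source, target):
    """Return the enclosures a signal visits, in order, after leaving `source`.

    `enclosures` is a list ordered in the single direction of travel
    (hypothetical representation). Each enclosure that is not the target
    forwards the signal to the next one on the loop.
    """
    if target not in enclosures:
        raise ValueError("target is not on the loop")
    n = len(enclosures)
    start = enclosures.index(source)
    visited = []
    for step in range(1, n + 1):
        node = enclosures[(start + step) % n]  # wrap around to close the loop
        visited.append(node)
        if node == target:
            break  # the target consumes the signal instead of forwarding it
    return visited


# Hypothetical three-enclosure loop, listed in the direction of travel.
loop = ["enc_a", "enc_b", "enc_c"]
print(route_on_loop(loop, "enc_a", "enc_c"))
print(route_on_loop(loop, "enc_c", "enc_a"))
```

In this model a signal from the last enclosure reaches the first one in a single hop, which corresponds to the return path that closes the loop.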
A common implementation of the loop is a Fibre Channel arbitrated loop. Fibre Channel is a serial communications protocol commonly used to connect computers to storage devices. In general, the Fibre Channel protocol provides an interface by which host processors 20, 20′ (and servers) communicate with the enclosures 14 and with the disk modules 18 installed within the enclosures 14.
Each LCC 22 of the data storage system 10 typically has port bypass circuitry (PBC) 38 for detecting the presence of valid Fibre Channel encoded serial data on the loop and for asserting a “signal-detect” signal when such valid data are detected. When the PBC 38 does not detect valid encoded data, the LCC 22 de-asserts the signal-detect signal. The de-asserted signal-detect signal is, in effect, an asserted “loss-of-sync” signal, which indicates failed equipment on the loop, such as a broken or disconnected cable.
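The complementary relationship between the two signals can be sketched as follows. The names `PortStatus` and `read_port_status` are hypothetical and do not correspond to any actual LCC firmware interface; the sketch only models the logic stated above, in which loss-of-sync is asserted exactly when signal-detect is de-asserted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PortStatus:
    """Hypothetical snapshot of the status signals for one loop port."""

    signal_detect: bool  # asserted while valid encoded data is detected

    @property
    def loss_of_sync(self) -> bool:
        # Loss-of-sync is the logical complement of signal-detect.
        return not self.signal_detect


def read_port_status(valid_data_present: bool) -> PortStatus:
    """Model the port bypass circuitry: assert signal-detect only for valid data."""
    return PortStatus(signal_detect=valid_data_present)


healthy = read_port_status(valid_data_present=True)
broken_cable = read_port_status(valid_data_present=False)
print(healthy.signal_detect, healthy.loss_of_sync)
print(broken_cable.signal_detect, broken_cable.loss_of_sync)
```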
To detect failures on the loop, a processor 42 of the LCC 22 executes software that periodically polls the status of the signal-detect signal (or, conversely, the status of the loss-of-sync signal). In general, the polling frequency is sufficient to detect hard equipment failures. However, some failures are intermittent, and an asserted loss-of-sync signal can become de-asserted before the next poll occurs. The data storage system 10 then appears to the processor 42 to be operating properly even though it is exhibiting early indications of a failure that go undetected. Therefore, there remains a need for a system and method that can detect intermittent loop failures and, consequently, early indications of a storage system malfunction.
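The limitation of periodic polling can be illustrated with a small simulation. This is a sketch under stated assumptions: the function names, the 100 ms polling period, and the failure timings are all hypothetical values chosen for illustration and do not come from the system described above. Times are integer milliseconds.

```python
def loss_of_sync_at(t_ms, loss_intervals):
    """Return True if loss-of-sync is asserted at time t_ms.

    `loss_intervals` is a list of (start_ms, end_ms) pairs during which the
    signal is asserted; each interval includes start_ms and excludes end_ms.
    """
    return any(start <= t_ms < end for start, end in loss_intervals)


def polled_detections(loss_intervals, poll_period_ms, duration_ms):
    """Return the poll times at which the poller sees loss-of-sync asserted."""
    return [
        t
        for t in range(0, duration_ms, poll_period_ms)
        if loss_of_sync_at(t, loss_intervals)
    ]


# Hypothetical hard failure: the signal is lost at 250 ms and never recovers.
hard_failure = [(250, 1000)]
# Hypothetical intermittent failure: a 20 ms dropout between two polls.
intermittent_failure = [(120, 140)]

print(polled_detections(hard_failure, poll_period_ms=100, duration_ms=1000))
print(polled_detections(intermittent_failure, poll_period_ms=100, duration_ms=1000))
```

With these assumed values, the hard failure is seen at every poll from 300 ms onward, while the 20 ms dropout begins and ends between the polls at 100 ms and 200 ms and is never observed.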