Many computer-related systems now include redundant components for high reliability and availability. Nonetheless, the failure or impending failure of a component may still affect the performance of other components or of the system as a whole. For example, in a RAID storage system, an enclosure includes an array of hard disk drives (HDDs) which are each coupled through independent ports to both of a pair of redundant disk array switches. One of a pair of redundant sub-processors is coupled to one of the switches while the other of the pair is coupled to the other switch. Alternatively, a single sub-processor is coupled to both switches and logically partitioned into two images, each logically coupled to one of the switches. Each switch is also coupled through a fabric or network to both of a pair of redundant RAID adapters external to the enclosure. The system may include additional enclosures, each coupled in daisy-chain fashion in the network to the disk array switches of the previous enclosure.
If the system uses a Fibre Channel Arbitrated Loop (FC-AL) architecture, then when the system is initialized, either or both RAID adapters (collectively referred to as the “adapter”) perform a discovery operation using a “pseudo-loop” through the switches. During discovery, the addresses of all of the devices on the network are determined. The system then enters its normal switched mode. However, if a drive becomes faulty during normal system operations, it may repeatedly enter and exit the network, each time causing the adapter to re-enter the discovery mode, resulting in system-wide disruption.
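The disruption described above can be illustrated with a minimal sketch. It is a toy model, not an implementation of FC-AL: the class `Loop`, its methods, and the drive names are hypothetical, and the only behavior assumed is that every entry or exit of a device forces a full rediscovery of all addresses.

```python
class Loop:
    """Toy model of a network whose adapter rediscovers on every membership change."""

    def __init__(self, devices):
        self.present = set(devices)
        self.address_map = {}
        self.discoveries = 0
        self.discover()  # initial discovery at system initialization

    def discover(self):
        # Assumption: discovery reassigns an address to every device present.
        self.discoveries += 1
        self.address_map = {
            dev: addr for addr, dev in enumerate(sorted(self.present))
        }

    def device_exits(self, dev):
        self.present.discard(dev)
        self.discover()

    def device_enters(self, dev):
        self.present.add(dev)
        self.discover()


loop = Loop(["hdd0", "hdd1", "hdd2"])

# A faulty drive drops off and rejoins the network five times.
for _ in range(5):
    loop.device_exits("hdd1")
    loop.device_enters("hdd1")

print(loop.discoveries)  # 11: one initial discovery plus ten forced ones
```

In this sketch a single unstable drive causes ten additional discovery passes, each of which involves every device on the network and not only the faulty one.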
If diagnostics are performed on the suspected faulty drive, the system is further disrupted. While it is possible to isolate the suspected faulty drive by bypassing the ports through which it is coupled to the switches, effectively removing the drive from the network, the drive is then inaccessible and diagnostic operations cannot be performed on it.
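The trade-off of port bypassing can be sketched as follows. This is an illustrative model only: `Switch`, `isolate`, and `can_diagnose` are hypothetical names, and the sketch assumes that a drive can be diagnosed only if at least one of its switch ports is not bypassed.

```python
class Switch:
    """Toy disk array switch that tracks a bypass flag per drive port."""

    def __init__(self, drives):
        self.bypassed = {drive: False for drive in drives}

    def bypass(self, drive):
        self.bypassed[drive] = True

    def reachable(self, drive):
        return not self.bypassed[drive]


def isolate(switches, drive):
    # Bypass the drive's port on every switch, removing it from the network.
    for switch in switches:
        switch.bypass(drive)


def can_diagnose(switches, drive):
    # Assumption: diagnostics need at least one non-bypassed path to the drive.
    return any(switch.reachable(drive) for switch in switches)


drives = ["hdd0", "hdd1", "hdd2"]
switches = [Switch(drives), Switch(drives)]  # the redundant switch pair

isolate(switches, "hdd1")
print(can_diagnose(switches, "hdd1"))  # False: isolated, but also undiagnosable
print(can_diagnose(switches, "hdd0"))  # True: healthy drives are unaffected
```

Isolating the drive protects the rest of the array, but in this model it also removes every path over which diagnostics could reach the drive.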
Consequently, a need remains for a way to perform diagnostic operations on a drive without disrupting access to the rest of the disk array or to the network.