As distributed systems continue to increase in size and capacity through the introduction of additional equipment or additional interconnections between equipment or components, managing the distributed system and determining its status become increasingly difficult and complex.
In distributed systems, such as storage area networks (SANs), a plurality of data providers/users are typically connected through a FibreChannel switch fabric to a plurality of storage devices. The FibreChannel switch fabric typically includes a plurality of switches that allow any data provider to supply data to one or more storage devices and similarly enable a storage device to provide stored data to each of the data users. The providers/users have no explicit knowledge of the storage device on which the data is stored or from which it is provided. As additional storage devices are added to the plurality of storage devices or new switches are added to the fabric, the interconnections within the switch fabric are updated to accommodate the new equipment. FIG. 1 illustrates a conventional storage area network in which a plurality of hosts 110 are in communication with a plurality of switches 120, which provide access to a network 130. The network represents an internal or external communication network (e.g., private or public internet) or may represent a switch fabric. In essence, the specific communication paths among the elements of the network are unknown to a user (host). Accordingly, information provided to the network 130 from host 110 may be stored on selected storage devices 150 via switches 140. The data may, for example, include header information that identifies the specific storage device on which data is stored or from which it is retrieved.
Because of the number of interconnections (paths, pathlinks) between the switches, the external data providers/users, and the storage area network devices, there is an inherent redundancy in the SAN, as data may be provided to any one of the storage devices that are indicated to be in an operational state. That is, if one particular storage device fails through loss of power or component breakdown, a SAN controller may receive an indication of the non-operational status, declare the device failed, and direct subsequent requests for storing data to another storage device. Similarly, when a switch fails, the SAN controller may declare the switch failed and redirect subsequent requests destined for that switch to another switch.
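The failover behavior described above may be illustrated by a minimal sketch. The class `SanController`, its methods, and the device names are hypothetical and serve only to illustrate the idea; they do not represent any particular SAN controller interface.

```python
class SanController:
    """Hypothetical controller that tracks device status and redirects requests."""

    def __init__(self, devices):
        # Every device is assumed to start in the operational state.
        self.operational = {name: True for name in devices}

    def report_failure(self, name):
        # Called when an indication of non-operational status is received.
        self.operational[name] = False

    def route(self, preferred):
        # Use the preferred device if it is operational; otherwise fall back
        # to the first operational alternative, in registration order.
        if self.operational.get(preferred, False):
            return preferred
        for name, ok in self.operational.items():
            if ok:
                return name
        raise RuntimeError("no operational device available")


controller = SanController(["storage-1", "storage-2", "storage-3"])
controller.report_failure("storage-1")
redirected = controller.route("storage-1")  # request is redirected elsewhere
```

The sketch relies on the failed device or switch being reported explicitly; it has no way to detect a failure that produces no such indication.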
However, when a connection between a host and a storage device fails due to a link or switch failure in the SAN, there is no explicit indication of such a failure. One way to determine whether connections have failed, and to provide subsequent re-routing information, is to test each connection by transmitting a test signal and checking for a reply. This method is time-consuming and expensive, as the test signals must be transmitted without interfering with the underlying operation of the SAN.
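The cost of such exhaustive testing may be seen in a minimal sketch. The function `probe_all` and the simulated `fake_probe` are hypothetical stand-ins for a real test signal; the host and device names are illustrative only.

```python
from itertools import product


def probe_all(hosts, storage_devices, send_probe):
    """Probe every host/storage pair; return (failed pairs, probes sent)."""
    failed = []
    probes_sent = 0
    for host, device in product(hosts, storage_devices):
        probes_sent += 1
        # send_probe returns True when a reply is received.
        if not send_probe(host, device):
            failed.append((host, device))
    return failed, probes_sent


# Simulated stand-in for a real test signal: pairs listed here do not reply.
broken = {("host-2", "storage-1")}


def fake_probe(host, device):
    return (host, device) not in broken


hosts = ["host-1", "host-2", "host-3"]
devices = ["storage-1", "storage-2"]
failed, probes_sent = probe_all(hosts, devices, fake_probe)
```

Under these assumptions the number of probes equals the number of hosts multiplied by the number of storage devices, so the test traffic grows quickly as equipment is added, even before multiple paths between the same pair are considered.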
Hence, there is a need in the industry for a method and apparatus for determining the interconnections among a plurality of pieces of equipment and for utilizing this information to determine connectivity between equipment or components in the distributed system.