1. Field of the Invention
This invention is related to the field of fault isolation in networks of devices.
2. Description of the Related Art
Networks generally connect a set of devices (or nodes). For example, various computer networks (e.g. local area networks (LANs), wide area networks (WANs), metropolitan area networks (MANs), wireless networks, intranets, extranets, the Internet, etc.) connect various computer systems. Storage area networks (SANs) connect various storage devices to a set of hosts.
The devices in a network are prone to faults of various kinds. Furthermore, a given fault in one device may cause other devices that are related to that device to also experience a fault. Still further, an environmental event or other factor (e.g. temperature in a room in which a device or devices are located, or power supply to the devices) may cause faults to occur. Isolating the many faults, correlating the faults, and determining the root cause of the faults is often a complicated task.
Some approaches to the problem of fault isolation rely on the devices to report faults to a monitor. However, there may be faults which a device cannot report (e.g. the fault may be severe enough to prevent the device from communicating, such as a loss of power to the device).
Other approaches to the problem of fault isolation actively contact the devices in the network to attempt to detect devices that have experienced a fault. However, such approaches do not scale well to large numbers of devices. As the number of devices increases, the interval between successive contacts of a given device generally increases as well. For example, if a device experiences a fault soon after being contacted, the fault will not be detected until the next time that device is contacted.
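The scaling concern with active contact can be illustrated with a rough calculation. The sketch below is illustrative only and is not drawn from any particular system: it assumes a single monitor that contacts devices one at a time in a fixed order, and the function name and the per-contact time are hypothetical.

```python
def worst_case_detection_delay(num_devices, seconds_per_contact):
    """Approximate worst-case delay before a polling monitor detects a fault.

    Assumptions (hypothetical, for illustration only):
    - one monitor contacts devices sequentially in a fixed order;
    - each contact takes the same amount of time;
    - the device faults immediately after it has been contacted, so the
      fault goes unnoticed until the monitor completes a full pass and
      reaches that device again.
    """
    return num_devices * seconds_per_contact


# The delay grows linearly with the number of devices being polled.
for n in (10, 1000, 100000):
    delay = worst_case_detection_delay(n, 0.5)
    print(f"{n} devices: up to {delay} seconds before the fault is detected")
```

Under these assumptions, the time a fault may go undetected is proportional to the number of devices, which is one way to see why active contact may become impractical in large networks.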