1. Field of the Invention
The present invention pertains to computer networks. More particularly, this invention relates to improving the ability of a network to route around faulty components.
2. Background
Modern computer technology is advancing at a very fast rate and has resulted in high-performance computing components being made available in smaller and smaller packages. These small, high-performance components are finding expanded uses in a wide range of personal, business and academic fields.
One use of these high-performance components is in network systems. In a network system, multiple processing units are coupled together to perform various programmed tasks. For example, the processing units may be networked together as a local area network (LAN) in an office building to allow individuals with personal computer systems in the building to communicate with one another. Such network systems are beneficial to users because they allow the users to communicate with each other, such as by electronic mail or transferring data files between one another. Or, by way of another example, a "supercomputer" may contain multiple processing units which are coupled together via a high-performance network and which operate together to perform various programmed tasks. These supercomputers are beneficial to users because they provide an extremely fast, powerful and cost-effective system to carry out users' requests.
However, one disadvantage of network systems is that the greater the number of components in the system, the greater the chances that a component will become faulty during system operation. A network with thousands of components has a relatively low mean time between failures for the system as a whole. That is, there is a relatively high probability that at least one component within the network will fail within a given period of time (for example, one failure per week). In order to be useful to the user(s), the network should be able to resolve these component failures. A system which shuts itself down upon detecting a faulty component and cannot re-start until the component is repaired or replaced reduces the availability of the system and increases the inconvenience to the users. Thus, it would be beneficial to provide a system which is able to automatically bypass faulty network components.
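The relationship between component count and system-level failure frequency can be illustrated with a short calculation. The following is a minimal sketch, assuming that components fail independently and at a constant rate (an exponential failure model); neither assumption, nor the figures used, comes from the present description, and the function names are hypothetical.

```python
import math


def system_mtbf(component_mtbf_hours: float, n_components: int) -> float:
    """Mean time between failures for the whole system.

    Assumption: independent components with identical, constant failure
    rates, so the individual failure rates simply add.
    """
    return component_mtbf_hours / n_components


def prob_any_failure(component_mtbf_hours: float, n_components: int,
                     window_hours: float) -> float:
    """Probability that at least one component fails within the window."""
    system_rate = n_components / component_mtbf_hours  # failures per hour
    return 1.0 - math.exp(-system_rate * window_hours)


# Illustrative figures only: 5,000 components, each with an MTBF of
# 1,000,000 hours, observed over a one-week (168 hour) window.
print(system_mtbf(1_000_000, 5000))
print(prob_any_failure(1_000_000, 5000, 168))
```

Under these illustrative figures, a component that individually fails about once per century still yields a system-level mean time between failures of 200 hours, which is on the order of the one-failure-per-week example given above.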
Furthermore, many users have neither the technical expertise nor the desire to resolve a component failure in the network by indicating to the network how to route around the faulty component. Performing such a correction could also be very time-consuming, and would distract the user from his or her other responsibilities. Thus, it would be beneficial to provide a system which resolves the failure of a component in a manner which is transparent to the system user(s).
In addition, depending on the layout of a network, a faulty component could cut off multiple good components from the remainder of the network. Depending on the type of network, this could mean that some personal computers could not communicate with others, or that certain processing units would not be available to the system user, even though they are in good working condition. Thus, it would be beneficial to provide a system which reduces the number of good components which are disconnected from the remainder of the system by a faulty component.
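The effect of network layout on the number of good components cut off by a single fault can be sketched as a reachability check. The example below is illustrative only: the two four-node layouts (a chain and a ring), the node names, and the functions `reachable` and `stranded` are assumptions made for the sketch and are not taken from the present description.

```python
from collections import deque


def reachable(links, start, faulty):
    """Return the set of good nodes reachable from `start`.

    `links` maps each node to its neighbors; nodes in `faulty` are
    treated as unable to forward traffic.
    """
    if start in faulty:
        return set()
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in links.get(node, ()):
            if neighbor not in seen and neighbor not in faulty:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


def stranded(links, start, faulty):
    """Good nodes that can no longer be reached from `start`."""
    good = set(links) - set(faulty)
    return good - reachable(links, start, faulty)


# Hypothetical layouts: a chain A-B-C-D and a ring A-B-C-D-A.
chain = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
ring = {"A": ["B", "D"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C", "A"]}

print(stranded(chain, "A", {"B"}))  # C and D are cut off from A
print(stranded(ring, "A", {"B"}))   # nothing is cut off; traffic goes via D
```

In the chain, the failure of node B isolates two working nodes from A, whereas the ring offers an alternate path and strands none, which is the kind of difference in layout the preceding paragraph refers to.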
Additionally, network systems should effectively resolve "deadlock" situations. A deadlock situation occurs when one or more components within the network cannot advance in their operation because resources within the system which the component(s) require are unavailable. The occurrence of a deadlock situation is dependent on the routing technique utilized in the system. In one routing technique, referred to as "circuit switching," a source node sends control information for a packet through its intended path to a destination node in the network to reserve each link in the path. Once the entire path is reserved, the source node transfers the data along the reserved path to the destination node. In another routing technique, referred to as "wormhole routing," the source node sends the necessary control information through its intended path to the destination node, followed immediately by the data. That is, the source node does not wait for the entire path to be reserved prior to beginning transfer of the data. In both of these routing techniques, the data packet maintains its reservation of the portions of the path already reserved while waiting for subsequent portions to be reserved. Thus, a deadlock situation may arise when, for example, two or more source nodes are attempting to transfer data to one or more destination nodes and none can advance because each is blocking a portion of the data path required by another. Accordingly, in order to provide continued performance of a network system, such deadlock issues need to be resolved.
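The hold-and-wait condition described above can be sketched as a cycle in a "wait-for" relation among packets. The following is a minimal illustration, assuming each packet holds zero or more links and waits for at most one further link; the function `find_deadlock`, the packet names, and the link names are hypothetical and do not represent the method of the present invention.

```python
def find_deadlock(holds, wants):
    """Return a list of packets forming a deadlock cycle, or [] if none.

    `holds` maps each packet to the set of links it has reserved.
    `wants` maps each blocked packet to the single link it is waiting for.
    """
    # Which packet currently owns each reserved link.
    owner = {link: pkt for pkt, links in holds.items() for link in links}
    # Packet P waits for packet Q if Q owns the link P wants
    # (None means the wanted link is free).
    waits_for = {pkt: owner.get(link) for pkt, link in wants.items()}

    for start in waits_for:
        chain = []
        pkt = start
        # Follow the wait-for chain until it ends or repeats a packet.
        while pkt is not None and pkt not in chain:
            chain.append(pkt)
            pkt = waits_for.get(pkt)
        if pkt is not None:
            # The chain looped back on itself: these packets are deadlocked.
            return chain[chain.index(pkt):]
    return []


# Packet p1 holds link X and waits for Y; p2 holds Y and waits for X.
holds = {"p1": {"X"}, "p2": {"Y"}}
print(find_deadlock(holds, {"p1": "Y", "p2": "X"}))  # both packets are stuck
print(find_deadlock(holds, {"p1": "Y"}))             # p2 can still advance
```

In the first call neither packet can advance, because each retains a link the other requires. In the second call, p2 is not blocked and can eventually release link Y, so no deadlock is reported.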
The present invention provides for these and other advantageous results.