1. Field of the Invention
The present invention pertains to computer networks and particularly to a method and apparatus for deadlock-free routing around a failed routing component.
2. Background
Throughout the evolution of computer technology the focus has been on increasing the power and speed of single autonomous processors. This has led to today's stand-alone processors which have respectable computational and data processing power. However, in recent years, parallel processing computer architectures have demonstrated substantial performance advantages over more traditional sequential computer architectures. In particular, as computational needs have grown and the performance advantages offered by parallel processing have come to be recognized, multiple processing units have been coupled together to form computer system networks. For example, multiple processing units may be networked together as a local area network (LAN) that allows individuals with personal computer systems to communicate with one another. Alternatively, a "supercomputer" may be formed by coupling a number of processing units together via a high performance network which enables the processing units to operate together to perform various tasks. These supercomputers are beneficial to users because they provide an extremely fast, powerful, and cost-effective system to carry out users' requests.
Unfortunately, as the number of components in the computer system network increases, the probability that a component will become faulty during system operation also increases. A network with thousands of components has a relatively low mean time between failures for the system as a whole. That is, there is a relatively high probability that at least one component within the network will fail within a given period of time (for example, one failure per week). In order to be useful to the user(s), the network should be able to resolve these component failures. A system that shuts itself down upon detecting a faulty component and cannot restart until the component is repaired or replaced reduces the availability of the system and increases the inconvenience to the users. Thus, it would be beneficial to provide a system which is able to automatically bypass faulty network components.
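The relationship between component count and system-level failure frequency can be illustrated with a short calculation. The following is a minimal sketch only, assuming that components fail independently at constant rates, so that the failure rates add and the system mean time between failures is the per-component figure divided by the number of components. The function name and the figures used (5,000 components, each with a 100-year mean time between failures) are hypothetical values chosen for illustration and do not come from any measured system.

```python
HOURS_PER_YEAR = 8766  # 365.25 days * 24 hours


def system_mtbf_hours(component_mtbf_hours, component_count):
    """Estimate the mean time between failures of a whole system.

    Assumes identical components that fail independently at a constant
    rate, so the system failure rate is the sum of the component rates.
    """
    if component_count <= 0:
        raise ValueError("component_count must be positive")
    return component_mtbf_hours / component_count


# Hypothetical figures: 5,000 components, each failing once per 100 years.
component_mtbf = 100 * HOURS_PER_YEAR
mtbf = system_mtbf_hours(component_mtbf, 5000)
print(f"system MTBF: {mtbf:.1f} hours ({mtbf / 24:.1f} days)")
```

Under these assumed figures the system as a whole would be expected to experience a component failure roughly once a week, even though each individual component is highly reliable.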
Furthermore, many users have neither the technical expertise nor the desire to resolve a component failure in the network by indicating to the network how to route around the faulty component. In addition, performing such a correction could be very time-consuming and would distract the user from his or her other responsibilities. Thus, it would be beneficial to provide a system which resolves the failure of a component in a manner which is transparent to the system user(s).
In addition, depending on the layout of a network, a faulty component could cut off multiple good components from the remainder of the network. Depending on the type of network, this could mean that some computers could not communicate with others, or that certain processing units would not be available to the system user, even though they are in good working condition. Thus, it would be beneficial to provide a system which reduces the number of good components which are disconnected from the remainder of the system by a faulty component.
Additionally, network systems typically have a significant number of processing and routing components, and corresponding hardware complexity. Adding further components and complexity to handle faults further decreases the mean time between failures of the system. Thus, it would be beneficial to provide a system which effectively routes around faulty components while at the same time adding little hardware complexity and causing little decrease in system performance.
Furthermore, network systems should effectively resolve "deadlock" situations. A deadlock situation occurs when one or more components within the network cannot advance in their operation because resources within the system which the component(s) require are unavailable. Whether a deadlock situation can occur depends in part on the routing technique utilized in the system. In one routing technique, referred to as "circuit switching," a source node sends control information for a packet through its intended path to a destination node in the network to reserve each link in the path. Once the entire path is reserved, the source node transfers the data along the reserved path to the destination node.
In another routing technique, referred to as "wormhole routing," the source node sends the necessary control information through its intended path to the destination node, followed immediately by the data. That is, the source node does not wait for the entire path to be reserved prior to beginning transfer of the data. In both of these routing techniques, the data packet maintains reservation of portions of the path already reserved while waiting for subsequent portions to be reserved. Thus, a deadlock situation may arise when, for example, two or more source nodes are attempting to transfer data to one or more destination nodes and none can advance because each is waiting for a portion of the data path that is held by another. Accordingly, in order to provide continued performance of a network system, such deadlock issues need to be resolved.
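The hold-and-wait condition described above can be illustrated with a small model. The following is a minimal sketch only, assuming that each packet holds a set of reserved links and waits for at most one further link; it is not the routing method of the present invention. The function name `find_deadlock`, the packet names, and the link names are hypothetical and introduced solely for illustration. The sketch reports a deadlock when the "is blocked by" relation among packets forms a cycle.

```python
def find_deadlock(holds, waits_for):
    """Return a list of packets forming a circular wait, or None.

    holds:     maps each packet to the set of links it has reserved.
    waits_for: maps each packet to the one link it needs next
               (None if the packet is not waiting).
    """
    # Map every reserved link to the packet currently holding it.
    owner = {link: pkt for pkt, links in holds.items() for link in links}

    # A packet is blocked by whichever other packet holds the link it needs.
    blocked_by = {}
    for pkt, link in waits_for.items():
        holder = owner.get(link)
        if holder is not None and holder != pkt:
            blocked_by[pkt] = holder

    # Follow the chain of blockers from each packet; revisiting a packet
    # already on the chain means no packet on that cycle can ever advance.
    for start in blocked_by:
        chain = []
        pkt = start
        while pkt in blocked_by and pkt not in chain:
            chain.append(pkt)
            pkt = blocked_by[pkt]
        if pkt in chain:
            return chain[chain.index(pkt):]
    return None


# Two packets, each holding the link the other needs next.
holds = {"A": {"link1"}, "B": {"link2"}}
waits_for = {"A": "link2", "B": "link1"}
print(find_deadlock(holds, waits_for))
```

In this example neither packet releases its reserved link while waiting, so the function reports both packets as deadlocked. If packet B were not waiting on any link, packet A would merely be delayed until B completed, and no cycle would be reported.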
As will be described in more detail below, the present invention provides a method and apparatus that achieve these and other desired results which will be apparent to those skilled in the art from the description to follow.