Parallel computing systems consist of a plurality of processors that communicate via an interconnection network. One popular network for interconnecting a plurality of processors is the circuit-switched network comprised of multiple circuit switches. The state-of-the-art unbuffered circuit switch is the ALLNODE Switch (Asynchronous, Low Latency, inter-NODE switch), which is disclosed in U.S. Ser. No. 07/677,543 and continued as U.S. Ser. No. 08/457,789 filed Jun. 2, 1995. The Allnode Switch as disclosed therein provides excellent low latency characteristics because it implements a minimum amount of circuitry at each switch stage of a multi-stage interconnection network. The latency across the switch is extremely low because the equivalent of a straight wire connection is provided across each switch stage. The Allnode Switch supports totally asynchronous transmission that does not require relatching or buffering at the individual switch elements. Therefore, the Allnode Switch delivers data messages through the switch as quickly as possible, avoiding the delays of any buffering.
As the field of parallel processing advances, the need for better performing interconnection networks comprised of multiple stages becomes of prime importance. To date, one of the highest performing circuit switch networks has been described in U.S. Ser. No. 07/799,497, filed Nov. 27, 1991 and continued as U.S. Ser. No. 08/216,789 filed Mar. 23, 1994, and continued as U.S. Ser. No. 08/606,232 filed Feb. 23, 1996, "Multi-Function Network" by H. T. Olnowich et al. The said network uses multiple paths through the network, called alternate paths, and searches for an open path to make a network connection. The said network uses the "Dual Priority Switching Apparatus for Simplex Networks" described by H. T. Olnowich et al. in U.S. Ser. No. 07/799,262 and continued as U.S. Ser. No. 08/318,578 filed Oct. 5, 1994, which is a two-mode switch capable of performing two different switching modes based on the presence of different types of traffic patterns in the network. The first mode causes connections in the network to be broken if "cold" or random traffic encounters blockage in the network, and then path establishment is retried over a different alternate path in the network as controlled by the node trying to establish the connection. The second mode gives traffic which has been classified as "hot" a different network capability, camp-on, in which previously won connections in the network are not broken when hot spot congestion is experienced in the network. In the camp-on mode, the request for a connection is placed into a priority queue at the switch experiencing blockage and is serviced, on a fairness basis, as soon as the blockage dissipates, to prevent the starvation of any node encountering a hot spot.
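The two switching modes described above can be illustrated with a minimal sketch of a single switch output port. This is not the circuitry of the cited applications; the class name `OutputPort`, its methods, and the use of a first-come, first-served queue as the fairness policy are all assumptions made for illustration.

```python
from collections import deque


class OutputPort:
    """Hypothetical model of one switch output port (illustration only)."""

    def __init__(self):
        self.owner = None      # node currently holding the connection
        self.camped = deque()  # hot requests waiting for the port (FIFO assumed)

    def request(self, node, hot):
        """Try to win the port for `node`; `hot` selects camp-on mode."""
        if self.owner is None:
            self.owner = node
            return "connected"
        if hot:
            # Camp-on mode: queue at the blocked switch instead of breaking
            # the partially established path.
            self.camped.append(node)
            return "camped"
        # Cold mode: reject so the sender can retry over an alternate path.
        return "rejected"

    def release(self):
        """Free the port and hand it to the longest-waiting camped request."""
        self.owner = self.camped.popleft() if self.camped else None
        return self.owner
```

In this sketch a rejected cold request leaves no state behind at the switch, whereas a hot request stays queued and is guaranteed eventual service in arrival order.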
The existing methods have some disadvantages. The alternate paths are chosen randomly and blindly by the sending nodes and their network adapters before entering the network. This approach leads to choosing a fixed path to be tried. If the fixed path is blocked for random traffic, the network adapter blindly picks another path and tries again to establish a connection. The blind selection problem is solved by the co-pending application, U.S. Ser. No. 07/947,023 now issued as U.S. Pat. No. 5,345,229, "Adaptive Switching Apparatus" by H. T. Olnowich et al. The adaptive switch permits each switching element to determine for itself which of several optional alternate paths to try at each stage of the network based on availability. This is a better approach because it brings the decision directly to the switching apparatus involved, which has the data required to make an intelligent decision, as opposed to being commanded blindly from a distance.
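The difference between blind and adaptive selection can be sketched in a few lines. The function below is a hypothetical illustration, not the logic of the cited patent: it assumes the switching element knows which of its output ports are busy and simply takes the first idle port among the alternates that lead toward the destination.

```python
def choose_output(candidate_ports, busy_ports):
    """Return the first idle alternate port, or None if all are blocked.

    candidate_ports: alternate output ports that all reach the destination.
    busy_ports: set of ports currently in use at this switching element.
    Both names are assumptions made for this sketch.
    """
    for port in candidate_ports:
        if port not in busy_ports:
            return port
    return None
```

A sending node choosing blindly has no access to `busy_ports` and can only discover a busy port by attempting the connection and failing; the switching element can consult that local state before committing.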
A further problem exists in multi-stage networks, especially in circuit switched multi-stage networks, where a path must be won simultaneously at every stage through the entire network. This is necessary in order to establish a direct circuit connection through the network for sending a data message from a sending node to a receiving node. In traditional circuit switched networks experiencing heavy loading, it becomes difficult to win resources across the entire network. IBM DOCUMENT NO. AAA92A000597, "Experimental Comparison of Multistage Interconnection Networks", published in November 1991 by S. Konstantinidou and E. Upfal, presents extensive simulation results comparing the routing performance of various multistage communication networks. These simulations of traditional networks have shown that such networks tend to clog at about a 20% loading factor. The clogging has been shown to occur mainly at the later stages of the network.
The clogging at later stages presents a problem because this is the worst place for networks to clog. As a transfer starts into the network, it must win each network stage in succession. It wins the first stage and then tries to win the second stage while holding and tying up the resources at the first stage. After winning the second stage, it tries to win the third stage while tying up the resources at the first and second stages. Thus, the establishment of a connection works its way through the network, forming a path to the desired receiving node. If other connections want to use a resource being held, they cannot, and this effect adds to the clogging of the network. As the path being established works its way deeper into the network, it ties up more resources and causes more clogging. Worse yet, if the path progresses to a later network stage and finds that it is blocked there, the attempt is classified as unsuccessful, the whole path is torn down, and another path is tried. The result is that all of the resources tied up by the unsuccessful attempt caused clogging to no avail. The whole effort was not only wasted, but it also caused additional clogging. The additional clogging is proportional to the time that the resources are tied up by the unsuccessful attempt. The further into the network that the blocking is encountered, the more clogging it generates. Thus, the problem is that existing networks with equal or higher blocking characteristics in the later stages provide just the opposite effect of what is needed for better performance. What is needed is a network with an increasing probability of success in the later stages.
If blocking in the network is to occur, it would be better to have it occur in the early stages. Then a path could start to be established, encounter immediately its highest blocking probabilities, and back off quickly if it encounters blocking. This way resources that cannot be used are not reserved for very long.
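The argument for early blocking can be made concrete with a small calculation. The sketch below is an illustrative model, not taken from the cited simulations: it assumes each stage blocks independently with a given probability and counts one wasted "hold" for every earlier stage an attempt had already won when it is blocked. The function name and the example probabilities are hypothetical.

```python
def expected_wasted_holds(block_probs):
    """Expected number of stage resources held in vain per connection attempt.

    block_probs[i] is the assumed probability of being blocked at stage i,
    given that all earlier stages were won.
    """
    wasted = 0.0
    reach = 1.0  # probability the attempt has won every earlier stage
    for stages_held, p_block in enumerate(block_probs):
        # Blocked here: the stages_held earlier stages were tied up for nothing.
        wasted += reach * p_block * stages_held
        reach *= 1.0 - p_block
    return wasted


# Two three-stage profiles with the same end-to-end success probability.
early_blocking = [0.30, 0.10, 0.05]  # blocking concentrated in the first stage
late_blocking = [0.05, 0.10, 0.30]   # blocking concentrated in the last stage

print(round(expected_wasted_holds(early_blocking), 3))  # 0.133
print(round(expected_wasted_holds(late_blocking), 3))   # 0.608
```

Under these assumed numbers both profiles succeed equally often, yet the late-blocking profile ties up several times more resources on failed attempts, which is the effect the preceding paragraphs describe.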
Other interesting work towards solving the blocking problem is reported in IBM DOCUMENT NO. AAA91A005191, "The Multibutterfly Communication Scheme", by S. Konstantinidou and E. Upfal, published in the "Proceedings of the IBM Interdivisional Parallel Processing Symposium", held in Gaithersburg, Md. in August 1991, p. 93. Upfal et al. present a novel communication scheme, based on an original network topology. The Multibutterfly network is sparse, symmetric and scalable. Its topology combines the useful properties of the Butterfly network, commonly used today in parallel computers, with the high connectivity property of a special type of sparse expander graphs. The Multibutterfly communication scheme uses a simple deterministic routing algorithm. The routing algorithm requires only small buffers in each node, and it is easy to implement. The present invention builds on the multibutterfly concept by introducing the use of a new switching element, changing the routing algorithm, and eliminating the buffers to provide an even better network.
Often systems require multiple paths through the switching networks to perform different functions. An earlier work at IBM by Peter Franaszek, entitled "Multipath Hierarchies in Interconnection Networks", described two hierarchical paths for a network, one providing low-latency message transfer and the other providing guaranteed delivery of a message at a longer latency. A message is attempted over the low-latency path first. If the transmission fails due to blocking or contention, it is retransmitted over the guaranteed-delivery path. This usually allows about 90% of the messages to be sent successfully over the low-latency path, and guarantees, by retransmission, the delivery of a message that gets blocked on the low-latency path.
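The two-path hierarchy described above amounts to a try-then-fall-back protocol, sketched below. The function and its two callable parameters are hypothetical stand-ins for the two paths; the sketch does not model how either path is implemented in the cited work.

```python
def send_with_fallback(message, try_low_latency, send_guaranteed):
    """Send over the low-latency path first, falling back on failure.

    try_low_latency(message) is assumed to return True on success and False
    when the transfer is blocked; send_guaranteed(message) is assumed to
    always deliver, at a longer latency. Returns the name of the path used.
    """
    if try_low_latency(message):
        return "low-latency"
    send_guaranteed(message)
    return "guaranteed"
```

Only messages that fail on the fast path pay the longer latency, so the average latency stays close to that of the fast path when most first attempts succeed.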
U.S. Pat. No. 4,952,930 to P. A. Franaszek et al., issued Aug. 28, 1990, describes an approach using a second buffered path, which is in some ways similar to the current approach. However, it suffers from requiring a plurality of switches to implement it. While there would be no impediment to our adopting the teachings of this patent, there remained a need for a simpler and yet more flexible approach to create a multi-stage network.
Multi-stage networks have become an accepted means for interconnecting multiple devices within a computer system. They are a replacement for the traditional crossbar interconnection. The crossbar is still a most efficient method of network interconnection, but it tends to be impractical for large systems. An N×M crossbar permits total simultaneous interconnection, where all the N devices can be communicating simultaneously with different members of the set of M devices. The crossbar is "non-blocking" because there is nothing internal to the crossbar which prevents any given N device from connecting to an M device which is IDLE (is not connected to some other N device). If an N device desires to connect to an M device which is BUSY (previously connected to some other N device), no connection can be made until the previous connection is broken--however, this is referred to as "contention" and is not called "blocking".
When N and M become large (usually greater than 32 or 64), it becomes very unwieldy to build crossbars, since their complexity increases at an N×M rate and their pin count increases at an (N×M)×W rate, where W is the number of pins per port. Thus large networks are usually built as multi-stage networks, constructed by cascading several stages of smaller crossbars together to provide an expanded network. The disadvantage of multi-stage networks is that they are "blocking", i.e., a connection might not be able to be made to an IDLE M device because there is no path available in the network to provide the necessary connection to the IDLE device.
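A short calculation illustrates why cascading small crossbars is attractive. The sketch below compares the crosspoint count of one large crossbar with that of a multi-stage network under a simplifying assumption not stated in the text: the network has a single path per connection, uses k×k switches, and needs log_k(ports) stages of ports/k switches each. Networks with alternate paths would use more switches. The function names are hypothetical.

```python
def crossbar_crosspoints(n, m):
    """Crosspoints in a single N x M crossbar (the N x M growth rate)."""
    return n * m


def multistage_crosspoints(ports, k):
    """Crosspoints in a single-path multi-stage network of k x k switches.

    Assumes `ports` is an exact power of k (illustrative simplification).
    """
    stages, reach = 0, 1
    while reach < ports:
        reach *= k
        stages += 1
    if reach != ports:
        raise ValueError("ports must be an exact power of k")
    switches_per_stage = ports // k
    return stages * switches_per_stage * k * k


print(crossbar_crosspoints(64, 64))   # 4096
print(multistage_crosspoints(64, 8))  # 1024
```

Under these assumptions a 64-port network of 8×8 switches needs a quarter of the crosspoints of a 64×64 crossbar, at the price of the internal blocking described above.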
Among other patents which might be reviewed are: U.S. Pat. No. 4,914,571 to A. E. Baratz et al. issued Apr. 3, 1990 which describes a method of addressing and thus how to find resources attached to a network, but does not deal with the hardware for the actual network itself.
U.S. Pat. No. 4,455,605 to R. L. Cormier et al., issued Jun. 19, 1984, is for a bus oriented system; it is not a multi-stage network. Similarly, U.S. Pat. No. 4,396,984 to E. R. Videki, II, issued Aug. 2, 1983, is for an I/O bus channel, not a multi-stage network. U.S. Pat. No. 4,570,261 to J. W. Maher, issued Feb. 11, 1986, is for fault recovery over a bus oriented system, not a multi-stage network.
U.S. Pat. No. 4,207,609 to F. A. Luiz et al. issued Jun. 10, 1980 illustrates an I/O bus channel so that those in the art will understand the differences between the subject matter. It is not a multi-stage network.
U.S. Pat. No. 4,873,517 to A. E. Baratz et al., issued Oct. 10, 1989, is for a totally different type of network, not an equi-distant multi-stage network like that which we will describe. U.S. Pat. No. 4,932,021 to T. S. Moody, issued Jun. 5, 1990, is for bus wiring paths inside a computer box; it is not a multi-stage network. U.S. Pat. No. 4,733,391 to R. J. Godbold et al., issued Mar. 22, 1988, illustrates a ring interconnection network, which is unlike a multi-stage network. U.S. Pat. No. 4,811,201 to B. R. Rau et al., issued Mar. 7, 1989, is not applicable to a multi-stage network. U.S. Pat. No. 4,754,395 to B. P. Weisshaar et al., issued Jun. 28, 1988, is for a ring interconnection network.
The present invention is a modification and adaptation of the multibutterfly network as disclosed in IBM DOCUMENT NO. AAA91A005191, which uses the circuit switch disclosed in the co-pending application, U.S. Ser. No. 07/947,023 now issued as U.S. Pat. No. 5,345,229, "Adaptive Switching Apparatus" by H. T. Olnowich et al. The adaptive switch is an improvement of the Allnode switch concept as disclosed in the parent application, U.S. Ser. No. 07/677,543 continued as U.S. Ser. No. 08/457,789 filed Jun. 2, 1995. We have solved some of the problems encountered in the prior art and will describe a way whereby some of the traditional blocking problems in multi-stage networks are circumvented.