1. Field of the Invention
The present invention generally relates to multiplex interconnection networks for communication systems, and especially digital communication systems for parallel computers, and, more particularly, to a network which provides a combination of low latency and low probability of blockage.
2. Description of the Prior Art
High performance, multi-processor computer systems are characterized by multiple central processor units (CPUs) operating independently, but occasionally communicating with one another or with memory devices when data needs to be exchanged. The CPUs and the memory devices have input/output (I/O) ports which must be selectively connected to exchange data. The data exchanges occur frequently but at random times and occur between random combinations of CPUs and memory devices. Therefore, some kind of switching network is required to connect the ports for the relatively short period of the data exchange. This switching network must provide a high bandwidth so that the processing is not unduly delayed while the data is being exchanged. Furthermore, the connections are frequently made and broken, and delays that occur while waiting for a connection or delays incurred while the connection is being made can also impact the total capability of the parallel CPUs.
FIG. 1 is an illustration of one type of computer system to which the subject invention is directed. There are a large number of CPUs 10, each operating independently and in parallel with each other. In the past, it has been common for the number N of parallel CPUs to be in the neighborhood of four. However, newer designs involve greater numbers N of CPUs, from 256 (2^8) to 1,024 (2^10) or even greater. Each of the CPUs 10 occasionally requires access to one of the several memory devices 12. For the sake of illustration, the memory devices will be assumed to be equivalent and also of number N. Each CPU 10 has an I/O path 14 and each memory device 12 has an I/O path 16. The paths 14 and 16 can be buses and may be duplicated to provide full-duplex communication. The important consideration, however, is that a CPU 10, requiring access to a particular memory device 12, have its I/O path 14 connected to the I/O path 16 of the required memory device 12. This selective connection is performed by a switching network 18, which is central to the design for the distributed processing of the computer system illustrated in FIG. 1.
The use of a cross-point switch for the switching network 18 provides the required high bandwidth. The important feature of a cross-point switch is that it can simultaneously provide N connections from one side to the other, each selectively made. Although the complexity of a cross-point switch increases in proportion to N^2, the relative simplicity of the actual N^2 cross-points allows its fabrication in a currently available technology.
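By way of illustration only, the quadratic growth noted above can be evaluated for the port counts mentioned in connection with FIG. 1. The function name `crosspoint_count` is hypothetical and is introduced solely for this sketch, which assumes a square switch with N ports on each side.

```python
def crosspoint_count(n_ports: int) -> int:
    """Number of cross-points in a square N x N cross-point switch."""
    return n_ports ** 2


# Port counts taken from the discussion of FIG. 1.
for n in (4, 256, 1024):
    print(f"N = {n:5d}: {crosspoint_count(n):9d} cross-points")
```

Going from N = 4 to N = 1,024 multiplies the port count by 256 and the number of cross-points by 65,536.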
Christos J. Georgiou has described in U.S. Pat. No. 4,605,928 a cross-point switch composed of an array of smaller cross-point switches, each on a separate integrated circuit (IC). Although Georgiou describes a single-sided switch, as opposed to the double-sided switch of FIG. 1, Georgiou's switch can be used in the configuration of FIG. 1, or easily adapted thereto. With the cross-point switch of Georgiou, it is easily conceivable that the number N of ports to the switch can be increased to 1,024. Thus, the total bandwidth of the switch 18 would be 1,024 times the bandwidth of the transmission paths 14 and 16. The cross-point switch of Georgiou has the further advantage of being non-blocking. By non-blocking it is meant that if a CPU 10 requires that its I/O path 14 be connected to the I/O path 16 of a memory device 12 not currently connected, the switch 18 can provide that connection. Thus, a CPU 10 is not blocked by the switch 18 when it requires a connection to a memory device 12.
Georgiou has also described, in U.S. Pat. No. 4,630,045, a controller for his cross-point switch. Georgiou's controller is designed to be very fast but it suffers from the deficiency of most cross-point switches that one controller is used for all N input ports. As a result, the controller must sequentially service multiple ports requesting connection through the cross-point switch. Therefore, once the demanded connection rate exceeds the speed of the controller, the controller becomes a bottleneck. This is because the controller is a shared resource. Even if the controller of Georgiou were redesigned to provide parallel subcontrollers, perhaps attached to each port, then a mechanism or network would have to be introduced to transmit to each such controller the collection of appropriate connection requests. These controllers and the associated networks introduce substantial complexity and delay as described in "Path Hierarchies in Interconnection Networks" by Peter A. Franaszek, IBM J. Res. and Develop., vol. 31, no. 1, January 1987, pp. 120-131.
An alternative to the cross-point switch is the Delta network. Delta networks are defined, with several examples, by Dias et al. in an article entitled "Analysis and Simulation of Buffered Delta Networks", IEEE Transactions on Computers, vol. C-30, no. 4, April 1981, pp. 273-282. Patel also defines a Delta network in "Performance of Processor-Memory Interconnections for Multiprocessors", IEEE Transactions on Computers, vol. C-30, no. 10, October 1981, pp. 771-780. An example of a Delta network for packet switching is described by Szurkowski in an article entitled "The Use of Multi-Stage Switching Networks in the Design of Local Network Packet Switching", 1981 International Conference on Communications, Denver, Colo. (June 14-18, 1981). The Delta network will be described here with reference to the Omega switching network, described by Gottlieb et al. in an article entitled "The NYU Ultracomputer--Designing an MIMD Shared Memory Parallel Computer", IEEE Transactions on Computers, vol. C-32, no. 2, February 1983, pp. 175-189. This example is illustrated in FIG. 2.
In FIG. 2, there are eight ports on the left, identified by binary numbers, and eight ports on the right, similarly identified by binary numbers. Connecting the right-hand and the left-hand ports are three stages of switches 20. Each switch 20 is a 2×2 switch that can selectively connect one of the two inputs on one side to one of the two outputs on the other side. The illustrated Delta network can provide a connection from any port on the right-hand side to any port on the left-hand side. Data is transmitted from one side to another in relatively small packets containing, in addition to the data, control information, including the address of the desired destination. By use of buffers within the switches 20, it is possible to decouple the switches of the different sections so that the control and transmission are pipelined between the stages of the 2×2 switches 20. Thus, the control function of the Delta network is potentially very fast and the delay introduced by the stages rises as a function of log N.
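By way of illustration only, the use of the destination address to steer a packet through the stages can be sketched as follows. The sketch assumes the conventional Omega wiring, in which a perfect shuffle of the port addresses precedes each stage of 2×2 switches and each switch selects its output from one bit of the destination address, most significant bit first. The wiring of FIG. 2 may differ in detail, and the function name `omega_route` is hypothetical.

```python
def omega_route(src: int, dst: int, n_bits: int) -> list[int]:
    """Return the link position occupied after each stage of an Omega network.

    The network has 2**n_bits ports and n_bits stages of 2x2 switches.
    Assumption: a perfect shuffle (rotate-left of the address) precedes
    every stage, and stage k consumes bit k of dst, counted from the MSB.
    """
    mask = (1 << n_bits) - 1
    pos = src
    path = []
    for stage in range(n_bits):
        # Perfect shuffle: rotate the n_bits-wide address left by one bit.
        pos = ((pos << 1) | (pos >> (n_bits - 1))) & mask
        # 2x2 switch: the destination bit picks the upper (0) or lower (1) output.
        dst_bit = (dst >> (n_bits - 1 - stage)) & 1
        pos = (pos & ~1) | dst_bit
        path.append(pos)
    return path


# Eight ports, three stages, as in FIG. 2: route from port 010 to port 101.
print([format(p, "03b") for p in omega_route(0b010, 0b101, 3)])
```

Each stage replaces one address bit, so after log2 N stages the packet stands at the destination port regardless of where it entered. No central controller is consulted, which is why the control function can be pipelined stage by stage.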
One of the principal goals in the design of an interconnection network for parallel computers is to minimize latency; that is, the time required to traverse the network. In a multi-stage network such as the Delta network discussed above, the actual delay is in general a function of the operations to be performed at each stage. Thus, one approach to reducing latency in such networks is to minimize the complexity of the operations to be performed at each stage or intermediate node. For example, if the network has no buffers, then the circuitry associated with buffers and buffer handling at each node is eliminated. This reduces the number of gate delays per stage. Likewise, since the number of logic elements per stage is reduced, more nodes or stages can be placed on a single chip, thus reducing the number of chip-crossing delays.
Networks without buffers may be combined with more complex networks into a path hierarchy; that is, the highest level is a bufferless network, followed by a buffered network, and so on. The basic idea of such a hierarchy is that the highest speed connection is made through the bufferless network, but when a collision occurs, the buffered network may be used to increase the chances of the connection being made.
The usual design of bufferless networks is such that there is a single backward link between any two nodes in the network. The effect is that if a message M_ij has traversed a sequence of nodes N_1, N_2, ..., N_r, then no other message can use the links between any two of these nodes until message M_ij releases this path, either as a result of reaching its destination or due to the message being blocked. A consequence is that the probability of blocking can be high. That is, a desired path through the network may often be unavailable because the backward link between two nodes is not available, even though the forward link is.
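By way of illustration only, the blocking behavior described above can be sketched with a simplified model. The model assumes the conventional Omega wiring (a perfect shuffle before each stage of 2×2 switches), treats each link between stages as a single resource held for the life of a connection, and does not distinguish forward from backward links. A connection is granted only if its whole path is free, whereas a real network may hold a partial path until the blocked message releases it. The names `omega_links` and `BufferlessNetwork` are hypothetical.

```python
def omega_links(src: int, dst: int, n_bits: int) -> list[tuple[int, int]]:
    """Links (stage, position) used by a message in an assumed Omega wiring."""
    mask = (1 << n_bits) - 1
    pos, links = src, []
    for stage in range(n_bits):
        pos = ((pos << 1) | (pos >> (n_bits - 1))) & mask       # perfect shuffle
        pos = (pos & ~1) | ((dst >> (n_bits - 1 - stage)) & 1)   # 2x2 switch
        links.append((stage, pos))
    return links


class BufferlessNetwork:
    """Simplified model: a message holds every link on its path until released."""

    def __init__(self, n_bits: int) -> None:
        self.n_bits = n_bits
        self.held: dict[tuple[int, int], str] = {}  # link -> id of holding message

    def try_connect(self, msg_id: str, src: int, dst: int) -> bool:
        links = omega_links(src, dst, self.n_bits)
        if any(link in self.held for link in links):
            return False  # some link on the path is held: the message is blocked
        for link in links:
            self.held[link] = msg_id
        return True

    def release(self, msg_id: str) -> None:
        self.held = {l: m for l, m in self.held.items() if m != msg_id}


net = BufferlessNetwork(n_bits=3)
print(net.try_connect("A", src=0, dst=0))  # True: the network is idle
print(net.try_connect("B", src=4, dst=1))  # False: shares a first-stage link with A
print(net.try_connect("C", src=1, dst=7))  # True: path is disjoint from A's
net.release("A")
print(net.try_connect("B", src=4, dst=1))  # True: A has released its path
```

Message B is blocked even though its source and destination ports are both idle, because it needs a link that message A is holding. This is the sense in which a bufferless multi-stage network, unlike a cross-point switch, is blocking.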