A switching communications network serves to correctly route messages from input ports to output ports. The input and output ports are interconnected by an array of routers, or switches, which direct the messages over interconnections which join the routers. Choices of routing direction at the routers of successive stages determine the overall path of a message through the network. The interconnections may, for example, be wires, optical fibers, multiplexed channels over single wires or fibers, or free space radio or optical communication paths.
A switching network may route any kind of digital or analog data including voice or video signals. A significant application of switching networks is in massively parallel data processing systems where the switching network may interconnect hundreds or even thousands of individual processing elements.
Switching networks can be classified in terms of their routing algorithms as well as in terms of physical layout. Known routing algorithms are generally based on one of four basic styles: packet-switching, circuit-switching, virtual cut-through and wormhole.
In a circuit-switched network, a path is first established from sender to target by a header which indicates the final destination, and reserves the path for the current message. Any adaptive routing decisions are made during this phase. The message is then sent along the path with the tail releasing the reserved channels as it crosses them. There are two approaches to establishing the path. The traditional approach is to determine the path completely before sending any of the data. If congestion is encountered, then either the path is stalled until it can make progress or back-tracking is used to try a different route. The problem with this approach is that all of the reserved channels remain idle until the path is completely established and transmission begins. The second approach optimistically assumes that a path can be established without blocking or back-tracking. Essentially, the path determination and transmission are pipelined: the head determines a path as it goes and the data follows immediately behind. There is little storage in each router, so if the head is unable to make progress the transmission must be aborted.
For packet-switched routing, also called store-and-forward routing, the entire message moves from node to node as a unit. The advantages are that each message uses exactly one channel at a time and that the contents can be verified (and fixed) on each hop. There are two serious disadvantages. First, a message remains in a node until all of the message has entered the node, even if the desired output channel is available. Second, there must be sufficient buffer space on the node for an entire message for every input.
Virtual cut-through routing addresses the unnecessary delays imposed by packet switching. In particular, the head of a message never waits for the tail to reach the current node; it simply keeps moving until it encounters congestion or reaches the target. Although this greatly improves throughput and latency, the storage requirements remain the same as for packet switching: there must be sufficient storage in each output buffer for the longest message.
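The latency difference between the two styles can be illustrated with a simple uncontended model. The sketch below is an assumption made for illustration and is not taken from the description above: it measures a message in fixed-size transfer units (called "flits" here) and assumes that one unit crosses one channel per time step, with no congestion.

```python
def store_and_forward_latency(hops, message_flits):
    """Packet switching: each hop receives the whole message before forwarding it."""
    return hops * message_flits


def cut_through_latency(hops, message_flits):
    """Virtual cut-through with no contention: the head advances one hop per
    step, and the remaining message_flits - 1 units follow in a pipeline."""
    return hops + message_flits - 1


if __name__ == "__main__":
    # Hypothetical values: a 5-hop path and a 20-unit message.
    print(store_and_forward_latency(5, 20))  # 100
    print(cut_through_latency(5, 20))        # 24
```

Under this model the store-and-forward latency grows with the product of path length and message length, whereas the cut-through latency grows with their sum.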
Wormhole routing can be viewed as either a variation of virtual cut-through or a variation of circuit switching. It is similar to virtual cut-through except that each node contains a small amount of buffering, not enough for an entire message. This means that messages cover many channels simultaneously. Under zero contention, wormhole routing behaves identically to virtual cut-through. Unlike virtual cut-through, a blocked message ties up many channels (all of the channels it occupies), which causes them to remain idle. The solution to this problem is virtual channels. Each physical channel is multiplexed into several virtual channels, each with its own small buffer. The key point is that messages on different virtual channels cannot block each other, even though they use the same wire. Thus, a blocked message only ties up one virtual channel; messages on other virtual channels may pass it, thus making use of the wire and increasing the achievable throughput.
Switching networks using the various routing algorithms may also be classified according to the layout of interconnections between routers. A splitter network, such as a butterfly network illustrated in FIG. 1, is composed of multiple stages of routers 30 organized into splitters. In the illustration of FIG. 1, five stages 22, 23, 24, 25, and 26 route each of 32 inputs 21 to any one of 32 outputs 27. Each router receives one or more interconnections 32 from a prior stage and routes messages received on those input interconnections through alternative output interconnections directed toward subsequent stages. In the example of FIG. 1, each router is a 2×2 router. That is, it receives two input messages on two interconnections and routes each message to either of two output interconnections.
It is helpful to view routing of a message through a splitter network as a sorting function through equivalence classes of fewer and fewer routers. Thus, from one equivalence class of a number of routers, each message may be routed to a subsequent router within each of two or more equivalence classes in a subsequent stage. Thus, in FIG. 1, the equivalence class of all 32 routers in stage 22 routes to upper and lower equivalence classes of 16 routers each in stage 23. Each of those equivalence classes then routes to equivalence classes of eight routers each in stage 24. The number of directions among which a router selects, and thus the number of equivalence classes in the subsequent stage, is the radix r of the routers. Where s is the number of stages in a network, for the i-th stage there are r^i equivalence classes, each with r^(s-i) routers. An individual splitter consists of an equivalence class of routers and its r associated equivalence classes of routers in the next stage.
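The class counts can be tabulated directly from the formula. The following is a minimal sketch; the function name is hypothetical, and it assumes that stages are numbered from i = 0, which is the numbering under which the formula reproduces the FIG. 1 figures quoted above (32, then 16, then 8 routers per class).

```python
def equivalence_classes(radix, stages):
    """Return (stage index, number of classes, routers per class) per stage.

    Assumes stage indices start at 0, so stage i has radix**i classes
    of radix**(stages - i) routers each.
    """
    return [(i, radix**i, radix ** (stages - i)) for i in range(stages)]


if __name__ == "__main__":
    # Radix-2, five-stage network as in the FIG. 1 example.
    for stage, classes, size in equivalence_classes(2, 5):
        print(stage, classes, size)
    # 0 1 32
    # 1 2 16
    # 2 4 8
    # 3 8 4
    # 4 16 2
```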
The butterfly of FIG. 1 can be seen to be a single path network. That is, from any one input to any one output, there is only one path through the routers and interconnections. A disadvantage of single path splitter networks such as the butterfly is that router faults and congestion significantly affect the performance of the network. To overcome this problem, switching networks with multiplicity have been studied. A network has multiplicity if it has some routers with redundant interconnections in some routing directions. The result is a multipath network in which multiple paths are available between specific inputs and outputs. A particularly useful class of networks with multiplicity is that of expander-based networks, such as multibutterflies.
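The single-path property can be seen in a small model. The sketch below assumes one common butterfly labeling, which may differ from the exact wiring of FIG. 1: rows are numbered 0 to 2^s - 1, and at each stage the router either keeps or replaces one bit of the row number, most significant bit first, so that the row equals the destination after s stages. Because every choice is forced by the destination, only one path exists for a given input and output.

```python
def butterfly_path(source, dest, stages):
    """Rows visited from source to dest in a radix-2 butterfly.

    Assumed labeling: stage i fixes bit (stages - 1 - i) of the row
    number to the corresponding bit of dest.
    """
    row = source
    path = [row]
    for i in range(stages):
        mask = 1 << (stages - 1 - i)
        row = (row & ~mask) | (dest & mask)  # forced choice: copy one bit of dest
        path.append(row)
    return path


if __name__ == "__main__":
    print(butterfly_path(0, 22, 5))  # [0, 16, 16, 20, 22, 22]
```

A fault at any router on that list therefore disconnects the input from the output, which is the weakness that multiplicity is meant to address.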
A bipartite graph with M inputs and N outputs is an (α, β, M, N)-expander if every set of m ≤ αM inputs reaches at least βm outputs, where β > 1 and α < 1/(rβ). For a radix-r splitter network to have expansion, each splitter must achieve expansion in each of the r directions. To achieve expansion, a splitter network must have routers with redundant connections in each of its r directions. We refer to this redundancy, d, as the multiplicity. The degree of any node in the splitter is then dr. Further, within the equivalence classes to which the interconnections are directed, the interconnections are preferably connected at random. Although a true expander must have β > 1 in each splitter, it is generally sufficient in practice that β be greater than one across multiple adjacent splitters. In that case, every set of m inputs, m ≤ αM, reaches at least βm outputs in each of r^i equivalence classes, where i is the number of stages spanned by the multiple adjacent splitters, α < 1/(r^i β) and β > 1. Such pseudo-expanders shall be included within the term expanders in the following description and claims.
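For small graphs the expansion condition can be checked by exhaustive enumeration. The sketch below is illustrative only: the function name and the example graph are hypothetical, it tests only the "every set of m ≤ αM inputs reaches at least βm outputs" condition, and its cost grows exponentially with M, so it is not a practical test for real networks.

```python
from itertools import combinations
from math import floor


def is_expander(neighbors, alpha, beta):
    """Brute-force check of the expansion condition.

    neighbors[u] is the set of outputs wired to input u.
    Note: floor(alpha * M) is computed in floating point.
    """
    M = len(neighbors)
    for m in range(1, floor(alpha * M) + 1):
        for subset in combinations(range(M), m):
            reached = set().union(*(neighbors[u] for u in subset))
            if len(reached) < beta * m:
                return False
    return True


if __name__ == "__main__":
    # Hypothetical graph: 8 inputs, 8 outputs, input u wired to outputs u and u+1 (mod 8).
    ring = [{u, (u + 1) % 8} for u in range(8)]
    print(is_expander(ring, alpha=0.25, beta=1.5))  # True
    print(is_expander(ring, alpha=0.25, beta=2.0))  # False
```

In the example, any two inputs reach at least three outputs, which satisfies β = 1.5 but not β = 2.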
A multibutterfly is an example of a splitter network with expansion. (See Arora et al., U.S. patent application Ser. No. 08/218,318 filed Mar. 25, 1994, which is a continuation of Ser. No. 07/732,031 filed Jul. 18, 1991 now abandoned.) In particular, each M-input splitter of a multibutterfly is an (α, β, M, M/r)-expander in each of the r directions. A 32 input example of a multibutterfly network is presented in FIG. 2. Note that the number of interconnections 32' directed to each next equivalence class is doubled to provide two interconnections from each router to each next equivalence class, thus requiring at least 4×4 switches. Also, within equivalence classes the interconnections are made at random.
Recently, numerous results have been discovered that indicate that multibutterflies are ideally suited for message-routing applications. Among other things, multibutterflies can solve any one-to-one packet routing, circuit-switching, or non-blocking routing problem in optimal time, even if many of the routers in the network are faulty. No other networks are known to be as powerful.
The reason behind the power of multibutterflies is that expansion roughly implies that βp outputs must be blocked or faulty for p inputs to be blocked, and thus it takes β^j faults to block one input j levels back. In contrast, one fault in a radix-2 butterfly blocks 2^j inputs j levels back. As a consequence, problems with faults and congestion that destroy the performance of traditional networks can be easily overcome in multibutterflies. For a survey of the research on multibutterflies see Pippenger, "Self-routing superconcentrators," in 25th Annual ACM Symposium on the Theory of Computing, pages 355-361, ACM, May 1993, and Leighton and Maggs, "Fast algorithms for routing around faults in multibutterflies and randomly-wired splitter networks," IEEE Transactions on Computers, 41(5):1-10, May 1992.
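The two growth rates can be compared numerically. The sketch below simply tabulates the two expressions from the paragraph above; the function name and the sample expansion factor of 1.5 are hypothetical, and the figures are the rough bounds stated in the text rather than exact fault counts for any particular network.

```python
def fault_comparison(beta, max_levels):
    """For each j, return (j, faults needed to block one multibutterfly input
    j levels back, inputs blocked j levels back by one radix-2 butterfly fault)."""
    return [(j, beta**j, 2**j) for j in range(max_levels + 1)]


if __name__ == "__main__":
    for j, faults_needed, inputs_blocked in fault_comparison(1.5, 3):
        print(j, faults_needed, inputs_blocked)
    # 0 1.0 1
    # 1 1.5 2
    # 2 2.25 4
    # 3 3.375 8
```

The asymmetry is the point: in the multibutterfly the number of faults required grows geometrically with distance, whereas in the butterfly the damage caused by a single fault does.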
As noted, multibutterflies are generally constructed by randomly wiring redundant connections between equivalence classes of each splitter. Although deterministic constructions are known and may be used, none are known to produce expansion comparable to random wiring.
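One way to picture random wiring of a single splitter is sketched below. This is an assumed procedure for illustration, not the construction of the cited application: for each of the r directions, every router in the target equivalence class offers r·d input ports, the ports are shuffled, and each input router takes d of them. The function and parameter names are hypothetical, the number of input routers is assumed to be divisible by the radix, and an input may by chance receive two wires to the same output router.

```python
import random


def random_splitter(num_inputs, radix, multiplicity, seed=0):
    """Return wiring[direction][input] = list of output routers (by index
    within the target class) reached by that input in that direction."""
    rng = random.Random(seed)
    class_size = num_inputs // radix  # routers per next-stage equivalence class
    wiring = []
    for _ in range(radix):
        # Each output router accepts radix * multiplicity incoming wires.
        ports = [out for out in range(class_size)
                 for _ in range(radix * multiplicity)]
        rng.shuffle(ports)
        wiring.append([ports[u * multiplicity:(u + 1) * multiplicity]
                       for u in range(num_inputs)])
    return wiring


if __name__ == "__main__":
    w = random_splitter(num_inputs=8, radix=2, multiplicity=2)
    for direction, table in enumerate(w):
        print(direction, table)
```

Each input router ends up with d wires in each direction, for a total degree of dr, and the load on the output routers is balanced by construction.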
Unfortunately, random wiring and the known deterministic constructions of good expanders scale poorly in practice. For example, a 4K-endpoint machine with multiplicity d=2 has 8K wires in the first stage, almost all of which would be long cables with distinct logical endpoints. For comparison, a fat-tree, another multipath network, might have a similar number of cables for the root node, but there are few logical endpoints, so huge groups of wires can be routed together. The groups connect to many boards, but the boards are located together and the connection of cables to boards is arbitrary and thus requires little labor. In the multibutterfly, the cables cannot be grouped and the connection of cables to boards is constrained.
Indeed, given a splitter with M boards of input routers, M boards of output routers, and b routers per board, we can expect each board to be connected to about min(M,dbr) other boards when using random wiring. For typical values of M, d, b, and r, this means that we would need to connect every input board to every output board in a randomly wired splitter. Clearly, this becomes infeasible as M gets large, and thus the randomly wired multibutterfly does not scale well in the practical setting where the network consists of boards of chips. A similar problem arises at the level of cabinets of boards for very large machines.
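The min(M, dbr) estimate is easy to evaluate for sample sizes. The values in the sketch below are hypothetical and chosen only to show the two regimes; the function name is likewise an assumption.

```python
def boards_reached(M, d, b, r):
    """Approximate number of output boards each input board connects to
    under random wiring: a board carries d * b * r outgoing wires and
    cannot reach more than M boards."""
    return min(M, d * b * r)


if __name__ == "__main__":
    # Hypothetical parameters: d = 2, b = 8 routers per board, radix r = 4,
    # so each board carries 64 outgoing wires.
    print(boards_reached(M=32, d=2, b=8, r=4))    # 32: every input board reaches every output board
    print(boards_reached(M=1024, d=2, b=8, r=4))  # 64: one distinct destination board per wire
```

In the first case the wiring is all-to-all between boards; in the second, essentially every wire leaving a board goes to a different board, so cables cannot be bundled.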