Multiprocessor computer systems comprise a number of processing element nodes connected together by an interconnect network. Each processing element node includes at least one processing element. The interconnect network transmits packets of information or messages between processing element nodes. Multiprocessor computer systems having up to hundreds or thousands of processing element nodes are typically referred to as massively parallel processing (MPP) systems. In a typical multiprocessor MPP system, every processing element can directly address all of memory, including the memory of another (remote) processing element, without involving the processor at that processing element. Instead of treating processing element-to-remote-memory communications as an I/O operation, reads or writes to another processing element's memory are accomplished in the same manner as reads or writes to the local memory.
In such multiprocessor MPP systems, the infrastructure that supports communications among the various processors greatly affects the performance of the MPP system because of the level of communications required among processors.
Several different topologies have been proposed to interconnect the various processors in such MPP systems, such as rings, stars, meshes, hypercubes, and torus topologies. Regardless of the topology chosen, design goals include a high communication bandwidth, a low inter-node distance, a high network bisection bandwidth and a high degree of fault tolerance.
Inter-node distance is defined as the number of communications links required to connect one node to another node in the network. Topologies are typically specified in terms of the maximum inter-node distance or network diameter: the shortest distance between two nodes that are farthest apart on the network.
Bisection bandwidth is defined as the number of links that would be severed if the network were to be bisected by a plane at a place where the number of links between the two halves is a minimum. In other words, bisection bandwidth is the number of links connecting two halves of the network where the halves are chosen as the two halves connected by the fewest number of links. It is this worst-case bandwidth which can potentially limit system throughput and cause bottlenecks. Therefore, it is a goal of network topologies to maximize bisection bandwidth.
One way to describe bisection bandwidth is in terms of the number of nodes in the network to enable comparison of the relative bisection bandwidths of networks of various sizes. For a network having k nodes, bisection bandwidth is defined in terms of the number of nodes as x*k. For example, as described below, a conventional hypercube has a bisection bandwidth of (1/2)k. This bisection bandwidth remains constant (relative to the number of nodes) at k/2 regardless of the dimension of the conventional hypercube.
Note that it may be more appropriate to define bisection bandwidth as the number of communications links times the bandwidth of each link. However, assuming a constant bandwidth/link regardless of the topology, relative bisection bandwidth comparisons among the topologies can simply be addressed in terms of the number of links. Therefore, as a matter of convention, bisection bandwidth is defined in this document in terms of the number of communications links.
In a torus topology, a ring is formed in each dimension where information can transfer from one node, through all of the nodes in the same dimension and back to the original node. An n-dimensional torus, when connected, creates a n-dimensional matrix of processing elements. A bi-directional n-dimensional torus topology permits travel in both directions of each dimension of the torus. For example, each processing element node in the 3-dimensional torus has communication links in both the + and - directions of the x, y, and z dimensions. Torus networks offer several advantages for network communication, such as increasing the speed of transferring information. Another advantage of the torus network is the ability to avoid bad communication links by sending information the long way around the network. Furthermore, a toroidal interconnect network is also scalable in all n dimensions, and some or all of the dimensions can be scaled by equal or unequal amounts.
In a conventional hypercube network, a plurality of microprocessors are arranged in an n-dimensional cube where the number of nodes k in the network is equal to 2.sup.n. In this network, each node is connected to each other node via a plurality of communications paths. The network diameter, the longest communications path from any one node on the network to any other node, is n-links.
One feature of the conventional hypercube is that the maximum distance between any two nodes (i.e., the diameter) in a hypercube having k nodes is given by log.sub.2 (k). Thus, even as the number of nodes increases, the maximum distance between any two nodes only increases as log.sub.2. As a result, the number of nodes, and hence the number of processors or I/O ports, can be doubled while only requiring a unitary increase in the network diameter between any two nodes. Thus, for each increase in the dimension of the topology (i.e., to expand from an n-dimensional to an (n+1)-dimensional hypercube), an additional edge must be connected to each node. Thus, it is axiomatic that to increase the dimension of a hypercube, each node must have an additional port to support the additional connection. As a result, as the dimension of the hypercube increases the number of ports in each node increases as well.
Another advantage of the conventional hypercube is a high bisection bandwidth. With a conventional hypercube, the bisection bandwidth always remains constant at k/2, for a hypercube having k nodes.
Several extensions or variations of the conventional hypercube have been proposed and/or implemented in multiprocessor systems. One such variation, presented in Louri, et al., Scalable optical hypercube-based interconnection network for massively parallel computing, Applied Optics, Vol. 33, No. 32, Nov. 10, 1994, pp 7588-7598, is the multi-mesh hypercube. One disadvantage of the multi-mesh hypercube over the conventional hypercube is a large network diameter. For a multi-mesh hypercube made up of an l.times.m array of n-dimensional hypercubes, the maximum distance is ((l-1)+(m-1)+n). A second disadvantage of the multi-mesh hypercube topology is a low bisection bandwidth relative to the number of nodes in the network. For a symmetrical mesh (where l=m), the bisection bandwidth is k/4, where k is the number of nodes.
A second variation on the conventional hypercube is presented in Malluhi, et al., The Hierarchical Hypercube: A New Interconnection Topology for Massively Parallel Systems, IEEE Transactions on Parallel and Distributed Systems, vol. 5, No. 1, January 1994, pp. 17-30. According to Malluhi, each of the nodes of an n-dimensional hypercube is itself an n'-dimensional hypercube. This topology has two disadvantages over the conventional hypercube: a lower bisection bandwidth and a greater maximum distance for the same number of nodes.
Consider, for example, a three-dimensional hierarchical hypercube according to Malluhi, where each node is itself a three-dimensional hypercube (i.e., n=3, n'=3). In such a network, there are 64 nodes, and the bisection bandwidth is 4 edges, or k/16. Contrast this to the conventional hypercube having 64 nodes with a bisection bandwidth of 32 edges, or k/2. Also, Malluhi's hierarchical hypercube has a maximum diameter of (n+2n') which, for the n=3 and n'=3 network of the current example yields a maximum diameter of nine edges. For a conventional 64-node hypercube, the maximum diameter is log.sub.2 (64)=six edges.
A third variation of the conventional hypercube is presented in Kumar et al., Extended Hypercube: A Interconnection Network of Hypercubes, IEEE Transactions on Parallel and Distributed Systems, vol. 3, no. 1, Jan. 1, 1992, pp. 45-57. According to this extended hypercube topology, network commanders (NC's) are used to connect a plurality of n-dimensional hypercubes. There are 2.sup.n nodes at each n-dimensional hypercube at the first level, one n-dimensional cube of NC's at the second level, and one NC at the third level.
Thus, according to the extended hypercube topology, each node of an n-dimensional hypercube at one level of hierarchy is connected to a single communication processor at the next level of hierarchy. A plurality of n-dimensional hypercubes, each having an NC are connected via the NC's. One disadvantage of this is that each NC provides a single point of failure for its respective hypercube. If the NC fails, the entire hypercube is severed from the network. Another disadvantage is that apparently additional processing capabilities are needed at the NC nodes. It does not appear that conventional routers can be used to implement the NC. Furthermore, each NC requires a large number of edge connections.
Conventional hypercube topology is a very powerful topology that meets many of the system design criteria. However, when used in large systems, the conventional hypercube has some practical limitations. One such limitation is the degree of fanout required for large numbers of processors. As the degree of the hypercube increases, the fanout required for each node increases. As a result, each node becomes costly and requires larger amounts of silicon to implement.
A hierarchical fat hypercube architecture for parallel processing systems is described in U.S. Pat. No. 5,669,008 entitled "Hierarchical Fat Hypercube Architecture for Parallel Processing Systems," issued on Sep. 16, 1997, and which is herein incorporated by reference. In the '008 patent, a scalable hierarchical topology of two or more levels is used. The first level utilizes conventional n-dimensional hypercube topologies for implementing a multiprocessor infrastructure. Each of the n-dimensional hypercubes are interconnected at a second level which can be one of p dimensions, where p does not have to equal n.
The variations on the basic hypercube topology, as noted above, each have their own drawbacks, depending on the size of the network. Some of these topologies suffer from a large network diameter, while others suffer from a low bisection bandwidth. What is needed is a topology that is well suited to applications requiring a large number of processors; is scalable; and provides a high bisection bandwidth, a wide communications bandwidth, and a low network diameter.
Moreover, as systems increase the number of processors, the number of physical connections required to support the hypercube topology increases significantly, resulting in higher system costs and manufacturing complexities. Therefore, it is desired that systems could be scaled to take advantage of more than one type of topology so that smaller systems and larger systems having divergent design goals related to topology architecture could be accommodated in one system design. Such design goals include a desire to optimize system performance while attempting to minimize overall system costs and to minimize manufacturing complexities.