The design of multiprocessor computers with a large number of parallel processing elements requires fast and efficient communication throughout the multiprocessor network. The largest bottleneck in the processing ability of large multiprocessor systems is the interprocessor communication. There exists in the prior art a wide variety of interprocessor communication network topologies which yield various levels of performance. The trend is to build large system configurations of multiprocessors operating in parallel in the range of thousands of processors.
In a multiprocessor or multicomputer design (the present discussion centers on topology structure, so no differentiation is made between multiprocessors and multicomputers) using thousands of parallel processing elements, the physical packaging of the hardware becomes a major concern for interprocessor communication. For example, the preferred cost-effective method of implementing a large parallel processor is to use microprocessors packaged in individual VLSI packages with a small number of memory and other support chips for each processor. The number of other microprocessor packages with which a single package can communicate depends on the number of available input/output (I/O) ports. For the fastest direct communication between two microprocessors, dedicated pins on the LSI package are used to form the communication interconnect. Thus, the number of communication ports that a microprocessor can support is directly dependent on the pin limitations of the packages. LSI packages of the prior art have been built with more than 300 pins; however, in multiprocessor designs using tens of thousands of processors, pin limitations are still a factor limiting the size of the design, even if the I/O ports are multiplexed on shared pins. This is an important reason why a single processor element in a multiprocessor design cannot have a large number of ports.
Interprocessor communication can be divided into two main categories of topologies, commonly entitled direct and indirect. The direct network connection topology has been widely researched and considered for telephone switching connections. These connections are typically found to be tightly coupled, slow-speed communications networks designed either as single-stage or multi-stage designs. The most commonly used direct interconnect topology is the crossbar interconnect widely used in older telephone exchange central offices. When the number of nodes increases in a direct interconnection topology, the number of possible interconnections grows very quickly. These types of prior art direct interconnection techniques are very expensive for loosely-coupled or distributed multiprocessor designs with a very large number of processors.
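The growth in interconnections noted above can be illustrated with a short sketch. It assumes, for illustration only, that an N-input, N-output crossbar needs one crosspoint per input/output pair, and that a fully connected network needs one link per pair of nodes; the function names are hypothetical and are not taken from the prior art discussed here.

```python
def crossbar_crosspoints(n: int) -> int:
    """Crosspoints in an n-by-n crossbar, assuming one per input/output pair."""
    return n * n


def full_mesh_links(n: int) -> int:
    """Point-to-point links needed to connect every pair of n nodes directly."""
    return n * (n - 1) // 2


for n in (16, 1024):
    print(n, crossbar_crosspoints(n), full_mesh_links(n))
# 16 256 120
# 1024 1048576 523776
```

Under these assumptions both counts grow with the square of the number of nodes, which is why such schemes become costly for very large systems.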
The indirect interconnection topology is typically described in terms of the physical layout of the interconnected nodes of the system. These interconnect schemes are usually described in terms of the degree required for implementation, i.e., degree one, degree two, degree three, hypercube, etc. The most promise for very large multiprocessor system interconnect topologies has been found in the hypercube topology.
Several prior art multiprocessors have been designed based upon the hypercube topology. Exemplary of these types of multiprocessors is the NCUBE/10 parallel processing computer manufactured by NCUBE Corporation of Beaverton, Oregon. This multiprocessor system uses a 10-dimensional hypercube, or 10-cube, when fully configured. The NCUBE/10 machine can operate using 1,024 microprocessors operating in parallel, providing an overall performance of upwards of 500 million floating point operations per second (MFLOPS) or 2,000 million integer instructions per second (MIPS). However, a multiprocessor of this capability is still limited in its ability to grow to a larger size due to I/O limitations between the microprocessors.
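The relationship between hypercube dimension, processor count, and ports per processor can be sketched as follows. The sketch assumes the conventional binary labeling of hypercube nodes, in which two nodes are linked when their labels differ in exactly one bit; the function names are hypothetical and do not describe the NCUBE/10 implementation.

```python
def hypercube_size(dim: int) -> int:
    """Number of nodes in a dim-dimensional hypercube (2 ** dim)."""
    return 1 << dim


def hypercube_neighbors(node: int, dim: int) -> list[int]:
    """Nodes directly linked to `node`: flip each of the dim label bits in turn."""
    return [node ^ (1 << bit) for bit in range(dim)]


def hop_distance(a: int, b: int) -> int:
    """Minimum number of links between a and b (count of differing label bits)."""
    return bin(a ^ b).count("1")


dim = 10
print(hypercube_size(dim))                   # 1024
print(len(hypercube_neighbors(0, dim)))      # 10
print(hop_distance(0, (1 << dim) - 1))       # 10
```

For a 10-cube this gives 1,024 nodes, each needing 10 communication ports, with at most 10 hops between any two nodes. Each added dimension doubles the node count but also adds one port to every processor, which is where the pin limitation described above applies.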
Several variations on multi-degree topologies for multiprocessors have been proposed in the prior art to reduce the longest path between microprocessors and the number of interconnections between microprocessors in order to improve performance and to allow the number of microprocessors to grow larger. These proposals for multiprocessor interconnection topologies are typically shown in the form of a graph in which nodes represent switching points or processing elements and edges represent communication links. Since the topologies tend to be regular, the descriptions lend themselves to graphical displays representing systems such as the types shown in the Figures attached to the present patent application. Those skilled in the art readily recognize the conversion of graphical representations of system topologies into hardware. Hence, this shorthand notation is a convenient method of representing larger and more complex hardware multiprocessor systems without the associated complexity of unnecessary details.
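The graph view of a topology, with its two figures of merit (longest path and number of interconnections), can be sketched in a few lines. This is a minimal illustration assuming an undirected, connected graph stored as an adjacency list; the ring and hypercube builders and all function names are hypothetical examples, not topologies taken from the Figures.

```python
from collections import deque


def ring(n: int) -> dict[int, list[int]]:
    """Degree-two ring: each node links to its two nearest neighbors."""
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}


def hypercube(dim: int) -> dict[int, list[int]]:
    """Hypercube: nodes are linked when their labels differ in one bit."""
    return {i: [i ^ (1 << b) for b in range(dim)] for i in range(1 << dim)}


def link_count(graph: dict[int, list[int]]) -> int:
    """Number of edges; each undirected link appears in two adjacency lists."""
    return sum(len(nbrs) for nbrs in graph.values()) // 2


def eccentricity(graph: dict[int, list[int]], start: int) -> int:
    """Longest shortest-path distance from `start`, found by breadth-first search."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return max(dist.values())


def diameter(graph: dict[int, list[int]]) -> int:
    """Longest path between any two nodes, assuming the graph is connected."""
    return max(eccentricity(graph, s) for s in graph)


for name, g in (("ring", ring(8)), ("3-cube", hypercube(3))):
    print(name, link_count(g), diameter(g))
# ring 8 4
# 3-cube 12 3
```

With eight nodes, the ring uses fewer links but has a longer worst-case path than the 3-cube, which is the trade-off between interconnection count and longest path that the proposed topologies try to balance.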
To best understand the prior art of interconnection topologies for multiprocessor systems, the present patent application includes a detailed discussion of the prior art to more carefully place the present invention in light of its advancements over the prior art.