Various high-speed computer processing systems, sometimes referred to as supercomputers, have been developed to handle a variety of computationally intensive applications, such as weather modeling, structural analysis, fluid dynamics, computational physics, nuclear engineering, realtime simulation, signal processing, etc. The overall designs or architectures of present supercomputers can be generally classified into one of two broad categories: minimally parallel processing systems and massively parallel processing systems.
The minimally parallel class of supercomputers includes both uniprocessors and shared-memory multiprocessors. A uniprocessor is a very high-speed processor that utilizes multiple functional elements, vector processing, pipelining and look-ahead techniques to increase the computational speed of the single processor. Shared-memory multiprocessors comprise a small number of high-speed processors (typically two, four or eight) that are tightly-coupled to each other and to a common shared memory using either a bus-connected or direct-connected architecture.
At the opposite end of the spectrum, the massively parallel class of supercomputers includes both array processors and distributed-memory multicomputers. Array processors generally consist of a very large array of single-bit or small processors that operate in a single-instruction-multiple-data (SIMD) mode, as used for example in signal or image processing. Distributed-memory multicomputers also have a very large number of computers (typically 1024 or more) that are loosely-coupled together using a variety of connection topologies such as hypercube, ring, butterfly switch and hypertree to pass messages and data between the computers in a multiple-instruction-multiple-data (MIMD) mode.
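As an illustration of one of the connection topologies named above, the following is a minimal sketch of hypercube addressing. It assumes the conventional binary hypercube, in which each computer has an integer address and is linked to every computer whose address differs in exactly one bit. The function names `hypercube_neighbors` and `hop_count` are hypothetical and chosen only for this example; they do not come from any particular machine.

```python
def hypercube_neighbors(node: int, dimensions: int) -> list[int]:
    """Return the nodes directly linked to `node` in a binary hypercube."""
    if not 0 <= node < 2 ** dimensions:
        raise ValueError("node id out of range for this hypercube")
    # Flipping one address bit at a time yields one neighbor per dimension.
    return [node ^ (1 << bit) for bit in range(dimensions)]


def hop_count(src: int, dst: int) -> int:
    """Minimum number of links a message crosses (the Hamming distance)."""
    return bin(src ^ dst).count("1")


# A 1024-computer system corresponds to a 10-dimensional hypercube (2**10 = 1024).
DIMENSIONS = 10
neighbors_of_zero = hypercube_neighbors(0, DIMENSIONS)
worst_case_hops = hop_count(0, 2 ** DIMENSIONS - 1)
```

Under these assumptions each of the 1024 computers has ten direct links, and a message between the two most distant computers crosses ten links.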
Because of the inherent limitations of the present architectures for minimally parallel and massively parallel supercomputers, such computer processing systems are unable to achieve significantly increased processing speeds and problem-solving spaces over current systems. The related application identified above sets forth a new cluster architecture for interconnecting parallel processors and associated resources that allows the speed and coordination associated with the current design of minimally parallel multiprocessor systems to be extended to larger numbers of processors, while also resolving some of the synchronization problems associated with massively parallel multicomputer systems. This range between minimally parallel and massively parallel systems will be referred to as highly parallel computer processing systems and can include multiprocessor systems having sixteen to 1024 processors. The cluster architecture described in the related application provides for one or more clusters of tightly-coupled, high-speed processors capable of both vector and scalar parallel processing that can symmetrically access shared resources associated with the cluster, as well as shared resources associated with other clusters.
Just as the traditional system architectures were ill-suited for solving the problems associated with highly parallel multiprocessor systems, so too are the traditional packaging architectures. As used within the present invention, the term packaging refers to the physical organization of the various components of a computer processing system. There are four basic functions that packaging performs: power distribution, signal distribution, heat dissipation and component protection. An overview of the various considerations involved in microelectronic packaging and a summary of the present state of the art are presented in R. Tummala and E. Rymaszewski, Microelectronics Packaging Handbook, pp. 1-63 and pp. 1087-1121 (specifically discussing packaging for large general-purpose computers and supercomputers) (1989).
Regardless of the system architecture that is chosen for a computer processing system, there are certain physical and operational constraints that have effectively limited the types of packaging architectures used for physically packaging supercomputers. Perhaps the most important of these limitations is the speed at which signals can travel between circuitry elements or components of the system. The limitation that signals cannot travel faster than the speed of light (and usually travel at some reduced percentage of the speed of light) limits the physical distance that a signal can travel in a finite amount of time. In supercomputers operating at clock periods on the order of 1 to 10 nanoseconds, this distance is between 1 and 20 feet. In an attempt to place most of the physical components within this physical limit, prior art supercomputer packaging architectures organized the components of the system in unique arrangements. The most notable of these packaging architectures is the Cray hexagonal format in which the circuit elements extend radially outward from a central backplane structure as shown, for example, in U.S. Pat. No. 4,466,255.
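The relationship between clock period and signal travel distance can be sketched as a short calculation. This is a rough order-of-magnitude aid only: the `velocity_factor` parameter (the fraction of the speed of light at which a signal actually propagates) and the function name `max_signal_distance_ft` are assumptions introduced for this example, and the sample velocity factor is illustrative rather than a figure from any particular system.

```python
SPEED_OF_LIGHT_M_PER_NS = 0.299792458  # metres per nanosecond in vacuum
METERS_PER_FOOT = 0.3048


def max_signal_distance_ft(clock_period_ns: float,
                           velocity_factor: float = 1.0) -> float:
    """Distance in feet a signal can cover within one clock period.

    velocity_factor is an assumed fraction of the speed of light
    (1.0 is the vacuum upper bound; real interconnect is slower).
    """
    if clock_period_ns <= 0:
        raise ValueError("clock period must be positive")
    if not 0 < velocity_factor <= 1:
        raise ValueError("velocity factor must be in (0, 1]")
    meters = SPEED_OF_LIGHT_M_PER_NS * velocity_factor * clock_period_ns
    return meters / METERS_PER_FOOT


# Upper bound for a 1 ns clock period, and an assumed half-speed
# interconnect for a 10 ns clock period.
bound_1ns = max_signal_distance_ft(1.0)
half_speed_10ns = max_signal_distance_ft(10.0, velocity_factor=0.5)
```

In vacuum a signal covers just under one foot per nanosecond, so shortening the clock period directly shrinks the volume within which components must be placed.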
One of the other important physical limitations in supercomputer packaging architectures is heat dissipation. In general, the faster the electronic components in a computer system are operated, the more energy they require and the more power they dissipate. In a typical supercomputer, the power dissipated ranges anywhere between 10 and 100 watts/cm², depending upon the type of circuitry used (e.g., bipolar, CMOS, GaAs), the physical arrangement of the circuitry and the clock speed at which it is operated. To handle the power dissipated by the very large number of very fast electrical components, prior art packaging architectures employed a number of cooling techniques. In the Cray Y-MP supercomputers, forced convection flow cooling is used over the entire system, as shown, for example, in U.S. Pat. Nos. 4,120,021, 4,466,255, 4,590,538, and 4,628,407. In the now abandoned ETA supercomputers, a portion of the electronic components was immersed in a liquid nitrogen bath.
Another packaging consideration relates to maintenance and the replacement of failed components. Generally, most present supercomputer architectures incorporate traditional packaging schemes which utilize pluggable circuit boards and a backplane format. For example, the Cray packaging scheme uses a generally circular backplane arrangement for holding stacks of larger circuit boards, as shown, for example, in U.S. Pat. Nos. 4,700,996 and 4,514,784. Digital Equipment Corp. and IBM Corp. have packaging schemes which utilize smaller circuit board modules in planar modular packaging techniques in a frame structure. Similar types of small circuit boards and planar modular packaging techniques are used in the Hitachi and Fujitsu supercomputers.
While the present packaging architectures for supercomputers have allowed such systems to achieve peak performances in the range of 0.2 to 2.4 GFLOPS (billion floating point operations per second), it would be advantageous to provide a method and apparatus for creating a packaging architecture for a highly parallel multiprocessor system that is capable of providing a distribution of power, cooling and interconnections at all levels of components in a highly parallel multiprocessor system, while increasing the number of circuits per unit time of such a multiprocessor system. More importantly, it would be advantageous to provide for a packaging architecture that is capable of effectively connecting between sixteen and 1024 processors together in a highly parallel cluster architecture to achieve peak performance speeds in the range of 10 to 1,000 GFLOPS.
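To put the stated performance targets in per-processor terms, the following sketch divides a system-level peak rate by the processor count. Pairing the low target with sixteen processors and the high target with 1024 processors is an assumption made only for illustration, and the function name `per_processor_gflops` is hypothetical.

```python
def per_processor_gflops(system_peak_gflops: float, processors: int) -> float:
    """Average peak rate each processor must sustain to reach the system peak."""
    if processors <= 0:
        raise ValueError("processor count must be positive")
    return system_peak_gflops / processors


# Assumed pairings of the range endpoints (illustrative only).
small_system = per_processor_gflops(10.0, 16)      # 16 processors, 10 GFLOPS
large_system = per_processor_gflops(1000.0, 1024)  # 1024 processors, 1,000 GFLOPS
```

Under these assumed pairings each processor would need to deliver somewhat under 1 GFLOPS of peak performance, so the system-level gain would come mainly from connecting many such processors rather than from a much faster single processor.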