Computer systems are becoming larger and faster every day. Where just a few years ago machines which would execute a million instructions per second (MIPS) were on the leading edge of technology, even desktop machines today are measured in multiples of this execution benchmark. Supercomputers are hundreds of times faster than this and have complex architectures and specialized hardware to attain this speed.
In conventional machines of the von Neumann architecture, where instructions are executed in sequence, faster circuitry, higher clock speeds, and specialized memory architectures are used to increase program throughput. In these machines, cache memories are used in preference to slower memories, and longer instructions are handled with increased parallelism on faster buses.
Another approach is parallel processing including pipeline processing where the execution of an instruction is broken into several tasks. While one task is performed on one instruction, other tasks in the pipeline can be performed on the instructions which precede it and which follow it, thereby essentially executing several instructions at once. The concept of pipelining can be taken a step further by separating the execution of a program into parcels which can be executed by separate hardware in one machine. This type of parallel processing requires coordination between both the hardware and the operating system software to assign tasks and to communicate the results between the parallel parts.
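The benefit of overlapping instruction tasks can be illustrated with a rough cycle count. The following is a minimal sketch under idealized assumptions that are not stated in the text above: every pipeline stage takes exactly one clock cycle, and the pipeline never stalls. The function names are hypothetical.

```python
def sequential_cycles(num_instructions: int, num_stages: int) -> int:
    """Cycles needed when each instruction finishes all stages before the next starts."""
    return num_instructions * num_stages


def pipelined_cycles(num_instructions: int, num_stages: int) -> int:
    """Cycles needed in an ideal pipeline (one cycle per stage, no stalls).

    The first instruction needs num_stages cycles to drain through the
    pipeline; each later instruction completes one cycle after its predecessor.
    """
    if num_instructions == 0:
        return 0
    return num_stages + num_instructions - 1


if __name__ == "__main__":
    instructions, stages = 100, 5
    print("sequential:", sequential_cycles(instructions, stages))
    print("pipelined: ", pipelined_cycles(instructions, stages))
```

Under these assumptions the speedup approaches the number of stages as the instruction count grows; real pipelines fall short of this because of stalls and dependencies between instructions.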
An even more sophisticated parallel processing engine can be developed in which the operating system partitions a program not only across different portions of the hardware in one machine but also across several different processors (multiprocessing), each of which can itself be of a highly complex architecture. However, when an operating system partitions a program for execution across multiple machines, the architecture and the method of connecting the machines in the communications network are extremely important to the success of the overall system. The instruction execution speed gained in parallel execution can be lost to inefficient partitioning and communication between machines. Even the most efficient communication and connection schemes add execution overhead that grows with increasing complexity.
Thus, supercomputers built with multiprocessing engines can become extremely expensive very quickly. However, the most straightforward and least expensive way to build a supercomputer capable of hundreds or thousands of MIPS is to interconnect a large number of very fast microprocessors. These single chip computers are becoming extremely fast and flexible for their size and cost, and generally contain their own memory, communications ports, timing and bus structures. Many are even equipped with executive instructions and modes of operation which are specifically designed for multiprocessing and networking. Such individual components which incorporate this degree of power and communication ability are ideally suited for interconnection into a supercomputer, but the problem remains as to which network connection scheme will be most efficient for maintaining overall control of program execution and communications.
The easiest network connection scheme for connecting N processors is to connect every processor to every other processor by a communication link. This technique has its drawbacks because, for every processor that is added, an extra communications port and link must be added to every processor already in the system. In addition, the system overhead for maintaining and controlling communication links between all machines becomes exorbitant. It has been found that architectures which provide fewer connections than a link from every processor to every other processor are still advantageous in parallel processing applications. What has evolved in the parallel processing context can be characterized as a series of problems or general program algorithms which must be solved efficiently by the parallel processing engine. If an architecture can solve the majority of these problems, then it has the necessary power to be a generalized parallel processing architecture.
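The cost of full interconnection can be made concrete with a short calculation. This is an illustrative sketch only; the function name is hypothetical, and it assumes one bidirectional link per pair of processors.

```python
def full_connection_cost(num_processors: int) -> tuple[int, int]:
    """Return (ports per processor, total links) for a fully connected network.

    Assumes a single bidirectional link between every pair of processors.
    """
    ports_per_processor = num_processors - 1
    total_links = num_processors * (num_processors - 1) // 2
    return ports_per_processor, total_links


if __name__ == "__main__":
    for n in (4, 64, 1024):
        ports, links = full_connection_cost(n)
        print(f"N={n}: {ports} ports per processor, {links} links in total")
```

The link count grows with the square of N, which is why full interconnection becomes impractical for large systems.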
An architecture which has had some success with such problems is one termed the hypercube. An n-dimensional hypercube computer, also known as a binary n-cube computer, is a multiprocessor engine characterized by the presence of N = 2^n processors interconnected as an n-dimensional binary cube. Each processor forms a node, or vertex, of the cube and has its own CPU and local memory. Any processor has direct communication paths to n other processors. These connected processors are its neighbors, and the communication links correspond to the edges of the cube. There are 2^n distinct n-bit binary addresses or labels that may be assigned to the processors. The addresses are assigned so that each processor address differs from that of each of its n neighbors in exactly one bit position.
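The addressing rule just described, in which neighbors differ in exactly one bit position, can be sketched in a few lines. The function name is hypothetical and node addresses are assumed to be represented as plain integers in the range 0 to 2^n - 1.

```python
def hypercube_neighbors(address: int, n: int) -> list[int]:
    """Return the n neighbors of a node in an n-dimensional hypercube.

    Each neighbor's address is obtained by flipping one bit of the
    node's own n-bit address.
    """
    if not 0 <= address < (1 << n):
        raise ValueError("address must fit in n bits")
    return [address ^ (1 << bit) for bit in range(n)]


if __name__ == "__main__":
    n = 3
    for node in range(1 << n):
        neighbors = [format(x, f"0{n}b") for x in hypercube_neighbors(node, n)]
        print(format(node, f"0{n}b"), "->", neighbors)
```

For a 3-dimensional cube, node 000 is linked to 001, 010, and 100, matching the three edges that meet at one corner of an ordinary cube.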
It is known that the hypercube structure has a number of features that make it useful for parallel computation. For example, meshes of all dimensions and trees can be embedded in a hypercube so that neighboring nodes are mapped to neighbors in the hypercube. The communication structures used for the fast Fourier transform (FFT) and for sorting algorithms can also be embedded in the hypercube. Since a great many scientific applications use the mesh, tree, FFT, or sorting interconnection structures, the hypercube is a good general-purpose parallel architecture. Even for problems with less regular communication patterns, the hypercube's maximum internode distance (graph diameter) of n = log_2 N means that any two nodes can communicate fairly rapidly. This diameter is larger than the unit diameter of a complete graph K_N, but is achieved with nodes having a degree of only n, or a fanout of log_2 N, as opposed to the N-1 degree of nodes in K_N.
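One common way to obtain such an embedding for a ring (a one-dimensional mesh with wraparound) is the binary reflected Gray code. The text above does not name a particular embedding technique, so the sketch below is an assumption offered for illustration, with hypothetical function names.

```python
def gray_code(i: int) -> int:
    """Binary reflected Gray code of i."""
    return i ^ (i >> 1)


def embed_ring(n: int) -> list[int]:
    """Map ring positions 0 .. 2^n - 1 to hypercube node addresses."""
    return [gray_code(i) for i in range(1 << n)]


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two addresses differ."""
    return bin(a ^ b).count("1")


if __name__ == "__main__":
    n = 3
    ring = embed_ring(n)
    print([format(x, f"0{n}b") for x in ring])
    # Every pair of ring neighbors, including the wraparound pair,
    # should land on directly linked hypercube nodes.
    for i, node in enumerate(ring):
        successor = ring[(i + 1) % len(ring)]
        assert hamming_distance(node, successor) == 1
```

A multidimensional mesh can be handled in the same spirit by applying a Gray code to each mesh coordinate separately and concatenating the resulting bit fields, provided each mesh side is a power of two.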
Other standard architectures with a small degree, such as meshes, trees, or bus systems, have either a large diameter (on the order of √N for a two-dimensional mesh) or a resource that becomes a bottleneck in many applications because too much of the total communications must pass through it (such as occurs at the apex of a tree or on a shared bus network). Thus, from a general topological viewpoint, the hypercube architecture balances node connectivity, communication diameter, algorithm embeddability, and programming ease. This balance makes the hypercube suitable for a broad class of computational problems. See Hayes, et al., "A Microprocessor-based Hypercube Supercomputer", IEEE Micro, pp. 6-17, Oct. 1986.
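The trade-off between node degree and diameter can be tabulated for a given N. The sketch below is illustrative only: the function names are hypothetical, the mesh is assumed to be square without wraparound links, and N is assumed to be both a power of two and a perfect square so that all three topologies apply.

```python
import math


def complete_graph(num_nodes: int) -> dict:
    """Degree and diameter of the complete graph on num_nodes nodes."""
    return {"degree": num_nodes - 1, "diameter": 1 if num_nodes > 1 else 0}


def hypercube(num_nodes: int) -> dict:
    """Degree and diameter of a hypercube; num_nodes must be a power of two."""
    if num_nodes < 1 or num_nodes & (num_nodes - 1):
        raise ValueError("num_nodes must be a power of two")
    n = num_nodes.bit_length() - 1  # n = log_2 N
    return {"degree": n, "diameter": n}


def square_mesh(num_nodes: int) -> dict:
    """Maximum degree and diameter of a square 2-D mesh without wraparound."""
    side = math.isqrt(num_nodes)
    if side * side != num_nodes:
        raise ValueError("num_nodes must be a perfect square")
    per_axis_links = min(side - 1, 2)  # an interior node has 2 links per axis
    return {"degree": 2 * per_axis_links, "diameter": 2 * (side - 1)}


if __name__ == "__main__":
    N = 1024
    for name, topology in (("complete graph", complete_graph),
                           ("hypercube", hypercube),
                           ("2-D mesh", square_mesh)):
        print(f"{name:15s} {topology(N)}")
```

For N = 1024 the complete graph needs 1023 ports per node, the mesh needs at most 4 ports but has a diameter of 62, and the hypercube sits between the two with a degree and diameter of 10.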
A major drawback of the hypercube, however, is that it has a degree of log_2 N, where N is the total number of nodes, which grows with the size of the network. This causes the design of the node processors to change with the size of the network because the number of ports needed to connect to other processors varies. Also, the complexity of each processor increases with the size of the network because of the non-constant degree of the network. Presently, the log_2 N degree of the hypercube has not proven to be an impediment because of the relatively low number of processors, e.g., N < 2^10, used in current machines. However, in the future, when a concurrent processor (supercomputer) with a massive number of processors is needed, the degree of log_2 N may prove to be a critical and limiting factor. Therefore, there is a need for a high performance concurrent computer with easy node connectivity, similar to the hypercube, but with a constant degree. It is evident that it would be highly advantageous to be able to configure such a constant degree network supercomputer in substantially all the standard communications topologies used in solving multiprocessor problems.