In the never-ending quest for faster computers, engineers are linking hundreds, and even thousands, of low-cost microprocessors together in parallel to create super supercomputers that divide in order to conquer complex problems that stump today's machines. Such machines are called massively parallel. We have created a new way to create massively parallel systems. The many improvements which we have made should be considered against the background of many works of others. A summary of the field has been made in other applications which are referenced. See in this connection the related application for our Parallel Associative Processor System, U.S. Ser. No. 601,594, and our Advanced Parallel Array Processor (APAP). System tradeoffs are required to pick the architecture which best suits a particular application, but no single solution has been satisfactory. Our ideas make it easier to provide a solution.
The interrelationship of the processors in an array of processors and the methods used to communicate among the processors have been the focus of considerable study, as documented in the literature related to such arrays. Some studies have focused on minimizing the number of steps to move a message between any two elements of the array. Others have focused on nearby communication to support image processing and other such very regular problems. In short, a parallel processor array of the SIMD or MIMD type requires a highly organized and efficient connection network for communication among processing elements (PEs).
A communication network can be required to communicate synchronously, where all pickets transfer data in the same direction at the same time, or it can be required to communicate randomly, where each picket sends out a message at random times to random places. This latter approach we call a routed transfer.
Synchronous transfers and routed transfers may need to be addressed on either a MIMD or a SIMD array control architecture while attempting to keep the communication complexity low.
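The two transfer styles described above can be contrasted with a minimal sketch. It models a small linear array of pickets as a Python list. The function names, the wraparound at the array ends, and the message tuple layout are illustrative assumptions, not part of any system described here.

```python
def synchronous_shift(values, direction):
    """Synchronous transfer: every picket passes its value one step in the
    same direction at the same time. direction=+1 shifts right, -1 shifts
    left. Wraparound at the ends is an assumption made for this sketch."""
    n = len(values)
    return [values[(i - direction) % n] for i in range(n)]


def routed_transfer(messages, n):
    """Routed transfer: each message names its own destination, so sources
    and destinations need not follow any common pattern. Each message is a
    hypothetical (source, destination, payload) tuple."""
    inbox = [[] for _ in range(n)]
    for src, dst, payload in messages:
        inbox[dst].append((src, payload))
    return inbox
```

In the synchronous case a single global direction is enough to describe the whole transfer, whereas in the routed case every message carries its own addressing, which is where the extra communication complexity comes from.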
Several communication topologies are described in the literature and implemented in various ways in array machines. A basic communication topology is the simple left-right connectivity of a linear array. In a linear array, each of the two-port PEs communicates with the PE on either the left or the right via a point-to-point network. In a more extensive conventional mesh topology of two or more dimensions, the communication network is implemented by using direct links between a source element and the elements in each of the implemented dimensions. Thus, each element has two links for each dimension of the array: in a conventional two-dimensional array with a NEWS (north, east, west, south) network, each element will have four links to other elements, and if another dimension is added, two more links must be added to each element of the mesh. Within each element of a conventional mesh there may be a router function that receives and transmits messages or data packets over the appropriate links. The hypercube in several of its multibased multidimensional embodiments represents something near the ultimate in processor array communication networks. With the binary hypercube, for example, the number of ports quickly grows to significant proportions.
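As an illustration of these topologies, the following sketch computes the directly linked neighbors of an element in a two-dimensional NEWS mesh and in a binary hypercube. The function names and the toroidal wraparound at the mesh edges are assumptions made for the example. The hypercube rule used here, that neighbors differ in exactly one address bit, is the standard definition of a binary hypercube.

```python
def news_neighbors(row, col, rows, cols):
    """North, east, west and south neighbors of element (row, col) in a
    rows x cols mesh. Edges wrap around (an assumption for this sketch)."""
    return {
        "N": ((row - 1) % rows, col),
        "E": (row, (col + 1) % cols),
        "W": (row, (col - 1) % cols),
        "S": ((row + 1) % rows, col),
    }


def hypercube_neighbors(node, dimensions):
    """Neighbors of a node in a binary hypercube: flip one address bit
    per dimension."""
    return [node ^ (1 << d) for d in range(dimensions)]
```

For example, `hypercube_neighbors(5, 3)` flips each of the three address bits of node 5 in turn, so the neighbor list grows by one entry for every added dimension.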
The processing element (PE) in a conventional array requires enough ports to reach the necessary elements with point-to-point network links. Some PEs need four ports, some 6, some 8, and some 30 (a 15-dimensional binary hypercube with 32k elements), depending on the topology and extent of the implemented network. Also, each link can contain from one up to possibly 50 parallel lines to accommodate the ever-increasing data transfer rate demands.
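The arithmetic behind these figures can be sketched as follows. The helper names are hypothetical. The mesh rule of two links per dimension comes from the description of the conventional mesh. The 30-port hypercube figure is taken from the text as an input rather than derived, since the counting convention behind it is not stated.

```python
def mesh_ports(dimensions, links_per_dimension=2):
    """Ports per PE in a conventional mesh, at two links per dimension."""
    return links_per_dimension * dimensions


def hypercube_elements(dimensions):
    """Number of elements in a binary hypercube of the given dimension."""
    return 2 ** dimensions


def signal_lines_per_pe(ports, lines_per_link):
    """Signal lines one PE needs if every port carries a full-width link."""
    return ports * lines_per_link
```

Under these assumptions, meshes of two, three, and four dimensions need 4, 6, and 8 ports per PE, a 15-dimensional binary hypercube has 32,768 elements, and a 30-port PE with 50-line links would need 1,500 signal lines.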
Now, as we are putting these topologies into hardware, the packaging of these arrays into chips, cards, drawers, racks, and rooms causes us to quickly focus on the number of links and the number of signal pins in each link. As wafer technology allows more circuits per chip, arrays of parallel processors are becoming more affordable, and more dense arrays are desired.
This application focuses on using DOTable networks to interconnect two-port PEs to realize many of the topologies of today while significantly reducing packaging pin counts. The packaging of an array of pickets with any mesh configuration poses several problems, most of which relate to the limited number of available package pins or the desire to minimize the number of pins required.
In the patent art there are some patents which generally talk about SIMD and other networks. Among them is U.S. Pat. No. 4,270,170 of Reddaway, entitled "Array Processor", which discusses a NEWS network connected SIMD array wherein a dotted three-branch network is used to interconnect chips containing 4 PEs each. Simultaneous transfer in one of the directions is accommodated. By using the dotted networks, the ports on one chip are reduced from 8 to 6, thereby achieving a 25% reduction in pin count. Only a 2D network is mentioned. A global routing scheme is directed by two lines which reach all elements in the array and encode the 4 directions common to a NEWS network. While this patent is representative of a prior dotted network to reduce pin and port count throughout the network, it cannot address four branches on each network but only three, and it provides more than three ports per processing element (3.5 on the average), leading to higher pin and port counts than we have found could be achieved. This patent only addresses a 2D NEWS network. We show it is desirable to accommodate extension into other dimensions and configurations, and these problems are not addressed by this patent. Apparently, the array of this patent requires simultaneous transfer of data in a specified direction.
U.S. Pat. No. 4,468,727 of Carrison, entitled "Integrated Cellular Array Parallel Processor", discusses an array processor that is integrated with an array of radiation sensors such that image processing is performed on the same monolithic substrate as the sensors. Interconnection between the processing elements is accomplished with charge-coupled gates on the NEWS edges of each PE, and as such it represents just one of many NEWS arrays. It has no dotted communication network.
U.S. Pat. No. 4,805,091 of Thiel, entitled "Method and Apparatus for Interconnecting Processors in a Hyper-Dimensional Array", on the other hand, is a good example of the hypercube interconnection network employed by the machines made by Thinking Machines, Inc., e.g. a "Connection Machine", and it discusses the application of the binary hypercube to the packaging of PEs with chips, cards, boards, and frames where each level of package is accomplished with a higher (or lower) dimension of the hypercube. While U.S. Pat. No. 4,805,091 does not mention any form of DOTing mechanism, it describes the binary hypercube. This is another example of the applicability of our invention of a dotted communication network for array processors, as our invention can be applied to implement a binary hypercube. The patent, however, makes no mention of dotted buses such as those we will illustrate.
U.S. Pat. No. 4,985,832 of Grondalski, entitled "SIMD Array Processing System with Routing Networks Having a Plurality of Switching Stages to Transfer Messages Among Processors", is another example of SIMD array processing systems having routing networks. It addresses small groups of PEs that communicate via memory sharing; a NEWS mesh providing for regular array processing; a mechanism whereby PEs can share a large broadcast communication task; and a random routing network composed of some "butterfly" stages followed by a 16×16 crossbar switch. However, this patent focuses on random routing, including a crossbar switch chip and its fault-tolerant aspects. While this patent covers a number of communication schemes, it does not describe any dotted mechanism.
U.S. Pat. No. 4,910,665 of Mattheyses, entitled "Distributed Processing System Including Reconfigurable Elements", discusses a two-dimensional SIMD array processor interconnection scheme whereby each PE has direct access to 8 of its neighbors. The communication medium is a dotted network that interconnects four neighbors at the corners. Each PE enjoys four such dotted networks. The X-DOT suggested by U.S. Pat. No. 4,910,665 and the dotted connection that we call an H-DOT both permit four PEs to be joined together by a dotted network. However, U.S. Pat. No. 4,910,665 discusses only the mesh topology and extensions into toroids, and focuses on the circuit within a PE. We believe there need to be improvements in the focus on connectivity and routing.