Concurrent computer architectures are configurations of processors under a common control, interconnected to achieve parallel processing of information. Processors arrayed in linear strings, sometimes termed "systolic" architectures, are an increasingly important example of concurrent architectures. Another such architecture is the binary tree, in which the nodes are arranged in levels beginning with a single root and extending to two, four, eight, etc. computing nodes at successive levels.
Pattern recognition is one class of problem to which parallel processing is especially applicable. Pattern recognition is the comparison of an unknown signal pattern to a set of reference patterns to find a best match. Applications include speech recognition, speaker recognition, shape recognition of imaged objects, and identification of sonar or radar sources.
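The best-match comparison described above can be sketched as follows. This is a minimal illustration only: the Euclidean distance measure, the function name `best_match`, and the three-element feature vectors are assumptions made for the example and are not taken from the foregoing description.

```python
import math


def best_match(unknown, references):
    """Return (label, distance) for the reference pattern closest to `unknown`.

    `references` maps a label to a feature vector of the same length as
    `unknown`. Euclidean distance is an assumed similarity measure.
    """
    best_label, best_dist = None, math.inf
    for label, ref in references.items():
        # Each comparison is independent of the others.
        dist = math.dist(unknown, ref)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label, best_dist


# Hypothetical reference patterns and an unknown pattern to classify.
references = {
    "yes": [1.0, 0.0, 0.5],
    "no": [0.0, 1.0, 0.2],
    "stop": [0.8, 0.8, 0.9],
}
label, dist = best_match([0.9, 0.1, 0.4], references)
print(label, round(dist, 3))
```

Because each reference comparison is independent of the others, the loop body is the kind of work that could in principle be divided among many processors, with only the final minimum requiring the results to be combined.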
One requirement of multiprocessor architectures that is important to the solution of pattern recognition and other problems is scalability of the hardware and the programming environment. Scalability refers to use of the same individual processing elements (PEs), board-level modules, operating system and programming methodology even as machine sizes grow to tens of thousands of nodes.
Although scalability has been achieved in machines adapted to pattern recognition, its practical realization, especially in larger machines, has been limited by the lack of fault tolerance of the relatively fixed PE lattice structures heretofore used. Prior art schemes that supply fault tolerance by adding redundant processing elements and elaborate switching details to disconnect a failed PE and substitute a spare are expensive and take up space.
If fault tolerance and scalability can be achieved, however, parallel processing offers real-time execution speed even as the problem size increases. For example, a GigaFLOP (one billion floating point operations per second) or more of processing can be required to achieve real-time execution in large-vocabulary speech recognition apparatus. Future speech recognition algorithms will easily require 100 to 1000 times greater throughput. In general, pattern recognition for higher bandwidth signals, such as imagery, will require a TeraFLOP (one trillion floating point operations per second). Fault-tolerant, scalable, parallel computing machines having hundreds or thousands of PEs offer a potentially attractive choice of solution.
A property related to scale is fast execution of communications between a Host computer and the PE array. PE configurations assembled as a binary tree, for example, have the advantageous property that if the number of PEs in the tree array is doubled, the number of layers through which communications must pass increases by only one. This property, known as logarithmic communications radius, is desirable for large-scale PE arrays, since it adds the least additional process time for initiating and synchronizing communications between the Host and the PEs. Scalability is served by devising a single, basic PE port configuration, as well as a basic module of board-mounted PEs, from which an array of any arbitrary number of PEs can be realized. This feature is critical to controlling the manufacturing cost and to systematically increasing the capacity of small parallel processing machines. Prior art arrangements of high-count PE configurations have not met this need, however, and further have tended to increase the installation size and the pin-out count for backplane connections.
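The logarithmic communications radius can be checked with a short calculation. The sketch below assumes a binary tree filled level by level, with the Host attached at the root and each level counted as one layer; the function name `tree_levels` is hypothetical and chosen for the example.

```python
def tree_levels(num_pes):
    """Number of levels in a binary tree of `num_pes` nodes filled level by level.

    Equals floor(log2(num_pes)) + 1, which for a positive integer is its
    bit length.
    """
    if num_pes < 1:
        raise ValueError("num_pes must be at least 1")
    return num_pes.bit_length()


# Doubling the PE count adds exactly one level under these assumptions.
for n in (1023, 2046, 4092):
    print(n, "PEs ->", tree_levels(n), "levels")
```

By contrast, in a linear string of PEs a message from one end to the other passes through every PE, so doubling the PE count doubles the worst-case path length.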
TeraFLOP capacities, which require many thousands of PEs in a single system, are also currently prohibitively expensive if realized in the inflexible and permanent hard-wired topologies of the current art. Additionally, fault tolerance in conventional hard-wired PE arrays has been limited heretofore, because the PE interconnection relationships are largely determined by the wiring. For this same reason, hard-wired PE arrays are not generally reconfigurable.