Parallel computing involves the concurrent execution of several computations in separate processing elements under central control. For certain classes of problems, parallel processing not only increases computation speed markedly, but also provides a means of processing problems for which only real-time solutions are useful.
The communications protocols in parallel computing must facilitate the transmission of data and computation instructions from a Host to each of a large number of processor elements or cells, and the reporting back to the Host by each cell of the computation results. Typically, the Host allocates different computational tasks to different individual cells. Therefore, the Host requires a way to associate a given cell report with a particular assigned computational task.
In many parallel processing architectures of the prior art, however, such as binary tree architectures, the communications protocols that enable the Host to know the identity of each cell must involve many other cells. Thus, if a given cell in the tree architecture were to fail, communications to and through the failed cell would be interrupted, and the functionality of the overall architecture could be impaired.
Pattern recognition is one type of computational problem for which parallel processing has important applications. Pattern recognition is a collection of techniques for sensing data representative of an unknown pattern, and determining which pattern within a large reference set of different known patterns constitutes the closest match to the unknown pattern.
Pattern recognition techniques apply, for example, to speech and speaker recognition; robotic vision; optical character recognition; the classification of acoustic, optical or other electromagnetic emissions from submarines, ships, land vehicles, aircraft, and spacecraft in order to uniquely identify the emitter and platform; and the identification of specific data or images transmitted over a particular communications channel or channels.
In parallel pattern recognition processes, a known set of signal patterns is digitally expressed and then individually stored at respective ones of the processor elements as a library of reference patterns. An incoming signal to be identified is converted to a digital signal pattern, which is then transmitted to the processor elements in a predetermined fashion for comparison to each of the known patterns of the reference library. The value of each comparison constitutes a measure of the similarity, or correlation, of the unknown pattern to each of the stored, known patterns.
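The comparison step described above can be illustrated with a minimal sketch. It assumes that the measure of similarity is a normalized correlation (cosine similarity) between equal-length numeric patterns; the source does not specify the measure, and the names `correlation`, `compare_to_library`, and the sample patterns are hypothetical. The loop over the library stands in for work that each processor element would perform concurrently on its own stored pattern.

```python
import math

def correlation(a, b):
    """Normalized correlation (cosine similarity) of two equal-length patterns.
    Assumed measure; the source only says 'similarity, or correlation'."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def compare_to_library(unknown, library):
    """Return one similarity score per stored reference pattern.
    Each entry stands in for the comparison done at one processor element."""
    return {cell_id: correlation(unknown, ref) for cell_id, ref in library.items()}

# Hypothetical reference library: one known pattern stored per cell.
library = {
    0: [1.0, 0.0, 0.0],
    1: [0.0, 1.0, 0.0],
    2: [1.0, 1.0, 0.0],
}
scores = compare_to_library([2.0, 0.0, 0.0], library)
```

In this sketch the unknown pattern correlates perfectly with the pattern stored at cell 0, not at all with cell 1, and partially with cell 2.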
In complex problems, the pattern set is constituted as a collection of primitive pattern elements, with rules for "gluing" the elements together into structural patterns. This technique is used in speech recognition, where the reference library consists of sentences or sentence segments defined individually as grammatically correct sequences of words. A sequence of unknown incoming speech patterns is matched against the library reference patterns of permissible sentence segments. The measures of similarity are calculated at each processor node in the architecture. Then, under the control of the Host, the closest match is identified and displayed for a viewer or used in further processing.
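The Host's role in identifying the closest match might be sketched as follows. This is an illustration only: the cell identifiers, the sentence segments, and the toy word-agreement score in `cell_score` are assumptions introduced here, not the similarity measure of any particular system. The point is that each report carries a cell identifier, which lets the Host associate the reported score with the sentence segment it assigned to that cell.

```python
def cell_score(unknown, segment):
    """Toy similarity computed at one cell: fraction of word positions that agree.
    A real system would compare acoustic patterns, not word labels."""
    matches = sum(1 for u, s in zip(unknown, segment) if u == s)
    return matches / max(len(unknown), len(segment), 1)

def host_closest_match(unknown, assignments):
    """Collect (cell_id, score) reports and return the best-matching assignment."""
    reports = [(cell_id, cell_score(unknown, segment))
               for cell_id, segment in assignments.items()]
    best_cell, best_score = max(reports, key=lambda report: report[1])
    return best_cell, assignments[best_cell], best_score

# Hypothetical Host allocation: one permissible sentence segment per cell.
assignments = {
    10: ("open", "the", "door"),
    11: ("close", "the", "door"),
    12: ("open", "the", "window"),
}
best = host_closest_match(("open", "the", "door"), assignments)
```

Here `best` holds the identifier of the reporting cell, the segment assigned to it, and its score, which is the association between cell report and assigned task that the Host requires.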
In the field of speech pattern recognition, the number of grammatically permissible speech segments is so large that the necessary computations cannot be executed in real time even using high-performance computers of conventional design such as a Cray. Parallel processing is a possible solution; but the number of processor elements required to achieve the needed pattern-matching must be correspondingly large in order to provide the necessary system processing rate on the order of 100 GigaFLOPS. In other fields, where the input signal has yet higher bandwidth, as with optical or other electromagnetic signals, computation on the order of TeraFLOPS can be required.
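The relationship between the required system rate and the number of processor elements can be made concrete with a rough sizing calculation. The per-cell rate of 50 MFLOPS and the utilization figure used below are purely hypothetical values chosen for illustration; only the 100 GigaFLOPS system rate comes from the text above.

```python
import math

def cells_required(system_flops, per_cell_flops, utilization=1.0):
    """Smallest number of cells whose combined useful rate meets system_flops.
    utilization is the assumed fraction of each cell's peak rate actually achieved."""
    return math.ceil(system_flops / (per_cell_flops * utilization))

SYSTEM_RATE = 100 * 10**9      # 100 GigaFLOPS, from the text
PER_CELL_RATE = 50 * 10**6     # hypothetical 50 MFLOPS per processor chip

ideal_count = cells_required(SYSTEM_RATE, PER_CELL_RATE)
derated_count = cells_required(SYSTEM_RATE, PER_CELL_RATE, utilization=0.5)
```

Under these assumed figures, thousands of cells are needed, which is the scale at which individual element failures become a practical concern.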
The advent of relatively low-cost digital processor chips makes it economically feasible to construct a parallel processor which can achieve the necessary pattern comparisons in real time. The problem, however, as with any machine relying on large numbers of critical performance elements, is that failure of one element can render the machine incapable of performing to its designed level. In a typical hard-wired binary tree parallel processing arrangement, for example, if one processing element fails, its descendant subtree structure is disconnected from the rest of the tree. The result will be decreased reliability and increased operating cost.
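The effect of a single failure in a hard-wired binary tree can be illustrated with a short sketch. It assumes a complete binary tree whose cells are numbered heap style (root 1, children of cell i at 2i and 2i + 1); this numbering and the function name `subtree_cells` are conventions adopted here for illustration, not taken from any particular machine.

```python
def subtree_cells(failed, n_cells):
    """Cells cut off from the Host when `failed` stops working: the failed cell
    plus all of its descendants, in a tree of cells numbered 1..n_cells."""
    lost, frontier = [], [failed]
    while frontier:
        cell = frontier.pop()
        if cell <= n_cells:
            lost.append(cell)
            # Children of `cell` under heap-style numbering.
            frontier.extend((2 * cell, 2 * cell + 1))
    return sorted(lost)

# A 15-cell tree: failure of cell 2 removes nearly half of the machine.
lost = subtree_cells(2, 15)
```

In this 15-cell example, the failure of cell 2 disconnects seven cells, while the failure of a leaf such as cell 8 costs only that one cell; the closer a failed element sits to the root, the larger the share of the machine that is lost.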
Current techniques for achieving tolerance to chip or node failures in large multiprocessor machines add unduly to the machine's physical volume, heat generation and expense. The lack of an effective solution to fault occurrences becomes more critical as the number of processing elements grows; and this in turn limits the system designer's ability to meet the requirements of computationally intensive pattern recognition applications.
A second problem for multiprocessor computer architectures, however, and one that is not directly related to fault tolerance, is that the sheer volume of the aggregate processor configuration in computers with TeraFLOP power makes it difficult to distribute control signals to the individual processor elements. This deficiency, in fact, is endemic to many high bandwidth systems that are both fully electronic and physically large.