1. Field of the Invention
This invention relates to multiprocessor data processing systems in general.
2. Description of the Prior Art
The ever greater computational throughput required of data processing systems for applications such as image processing or scientific computation has led computer designers to introduce new processor architectures: parallel architectures. Three basic principles are used to introduce this parallelism into the new architectures. A distinction is made between:
segmented (or pipeline) architectures: a task is broken down into plural steps, and these steps are performed independently by different processors. Every time an intermediate result is obtained after performance of a step, it is transmitted to the next processor, and so on. When a step is completed, the processor in charge of performing it is freed and thus becomes available to process new data. Presupposing the respective durations of performance of the different steps to be substantially equal, the period required to obtain the final results is then the duration of performance of one step, and not the duration of performance of the task;
array processor architectures or SIMD (Single Instruction, Multiple Data Stream) architectures. In this type of architecture, the increase in computational throughput is obtained by having the same instruction performed by a large number of identical processing units. This type of architecture is particularly well suited to vector processing; and
multiprocessor architectures or MIMD (Multiple Instruction, Multiple Data Stream) architectures. In such an architecture, several processors perform respective streams of instructions independently of one another. Communication between the processors is ensured by a common memory and/or by a network interconnecting the processors.
a data processing unit connected to other data processing units in immediately adjacent downstream and upstream modules by way of a communication network. Each of the cascaded modules further comprises:
a first memory,
an additional processing unit,
a second memory,
a programmable logic cell array.
The programmable logic cell array is configurable into first, second, third and fourth input/output interfaces for temporarily storing data as memorized data, and into a central processing and switching circuit for processing the memorized data into processed data and switching the processed data towards one of the input/output interfaces. Each cascaded module further comprises:
a first module bus for interconnecting the data processing unit, the first memory and the first input/output interface, and
a second module bus for interconnecting the additional processing unit, the second memory and the fourth input/output interface.
a first step further consisting in loading a respective set of weights into the second memory of each of the cascaded modules via the communication network, and the input data into the first memory of the first module, and
at least one set of second and third steps,
the second step consisting in carrying out partial processing operations on the input data in the additional processing unit of each cascaded module as a function of the respective set of matrix multiplication weights in order to determine partial data, and
the third step consisting in downloading the partial data to any one of the programmable logic cell arrays or any one of the first and second memories in the cascaded modules via the intermodular buses and the feedback bus.
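By way of illustration only, the first, second and third steps recited above may be modelled in software as a matrix-vector product whose weight rows are distributed over the cascaded modules. The following is a minimal sketch under stated assumptions: the row-wise partitioning of the weights, the number of modules, the use of one shared list as the destination of the partial data, and all function and variable names are hypothetical and are not taken from the present text.

```python
# Illustrative sketch only: the module count, the row-wise weight partitioning
# and every name below are assumptions, not details taken from the text.

def load_weights(weights, num_modules):
    """First step: give each module its own set of weight rows (its 'second memory')."""
    rows_per_module = -(-len(weights) // num_modules)  # ceiling division
    return [weights[i * rows_per_module:(i + 1) * rows_per_module]
            for i in range(num_modules)]

def partial_processing(module_weights, input_data):
    """Second step: one module multiplies its weight rows by the input vector."""
    return [sum(w * x for w, x in zip(row, input_data)) for row in module_weights]

def run_cascade(weights, input_data, num_modules):
    """Run the three steps once and gather the partial data in module order."""
    second_memories = load_weights(weights, num_modules)
    result = []
    for module_weights in second_memories:
        partial = partial_processing(module_weights, input_data)
        # Third step: the partial data is forwarded to a destination memory;
        # in this sketch the destination is simply one shared list.
        result.extend(partial)
    return result

weights = [[1, 0, 2], [0, 1, 0], [3, 1, 1], [2, 2, 2]]
input_data = [1, 2, 3]
print(run_cascade(weights, input_data, num_modules=2))
```

In this model each module works only on its own set of weights, so the partial processing operations are independent of one another; the modules are visited in a sequential loop here purely for simplicity.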
Pending European Patent Application No. 433,142 filed Dec. 6, 1990 discloses an architecture of a multiprocessor data processing system in which the bus is shared between plural processor stages and is interfaced in each stage by a programmable Logic Cell Array (LCA) configured into plural input/output means and a switching means. The main advantage of such an architecture is that it relieves each processor of bus request and management tasks, which are carried out in the logic cell array associated with the processor. Nonetheless, this architecture is not optimal for the multiprocessor approach to scientific computation applications. Each processor is in fact entrusted with all the tasks to be performed (except management of the bus). Numerous multiprocessor applications require considerable computational means, and a single unspecialized processor per stage restricts performance.