1. Field of the Invention
The present invention relates to an array processor which processes data in parallel by means of a plurality of data processing elements disposed in a matrix.
2. Description of the Prior Art
FIG. 1 is a simplified block diagram of a conventional array processor, in which nine data processing elements (PE) 11a through 11i are disposed in a matrix of 3 rows and 3 columns, and adjoining data processing elements are connected to each other by means of input/output lines 12a through 12l.
FIG. 2 is a schematic block diagram of one of the identical data processing elements (PE) 11a through 11i constituting the conventional array processor cited above. Note that FIGS. 1 and 2 are simplified versions of the block diagrams presented in the specification of the "GEOMETRIC ARITHMETIC PARALLEL PROCESSOR", NCR 45CG72, a product of National Cash Register, Inc., U.S.A.
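The interconnection described for FIG. 1 can be sketched in a few lines of Python. This is only an illustrative model, assuming that each input/output line joins one pair of horizontally or vertically adjacent elements with no wrap-around at the edges; the function name `mesh_links` and the (row, column) indexing are hypothetical and do not come from the NCR specification. Under that assumption, a 3 x 3 matrix yields twelve links, which agrees with the twelve lines 12a through 12l.

```python
def mesh_links(rows, cols):
    """Return the links between horizontally and vertically adjacent elements.

    Each element is identified by an assumed (row, column) pair; each link is
    a pair of such identifiers. No wrap-around at the edges is assumed.
    """
    links = []
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols:  # link to the right-hand neighbour
                links.append(((r, c), (r, c + 1)))
            if r + 1 < rows:  # link to the neighbour below
                links.append(((r, c), (r + 1, c)))
    return links


links = mesh_links(3, 3)
print(len(links))  # 12
```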
Refer now to FIG. 2. In the arithmetic parallel processor, numeral 1 designates an arithmetic and logical unit (ALU). The ALU 1 receives data from the first and second registers 2a and 2b and executes an arithmetic or logical operation using these data as operands. The ALU 1 either feeds the result directly back to the first register 2a or delivers it to the local memory unit 3 for storage.
In addition to data delivered from the local memory unit 3, the registers 2a and 2b each receive data from an input/output line 6 through an interface circuit 5.
The local memory unit 3 stores data delivered from the interface circuit 5 or from the registers 2a and 2b or from the ALU 1. The local memory unit 3 then outputs data to any of these. The local memory unit 3 receives and outputs data in accordance with incoming address input 4c delivered from external sources.
The interface circuit 5 is connected to the registers 2a, 2b and the local memory unit 3, while it is also connected to external sources through the input/output line 6.
Next, functional operation of the conventional array processor cited above is described below.
On receipt of data from the registers 2a and 2b, the ALU 1 first executes an arithmetic or logical operation using these data as operands. The ALU 1 then outputs the result to the first register 2a and the local memory unit 3. The local memory unit 3 then delivers data to the registers 2a and 2b or to the interface circuit 5. The interface circuit 5 then transmits the received data to external sources through the input/output line 6.
However, the ALU 1 cannot execute the operations mentioned above simultaneously within the same system.
As shown in FIG. 1, the array processor is composed of a plurality of data processing elements, each constituted as shown in FIG. 2, disposed in an array. Consequently, since the data processing elements PE of the array processor execute the operations mentioned above in parallel with each other, the entire system can process data at a very fast speed.
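The parallel operation of the array can be illustrated with a small Python model. It is a sketch only: the function `array_step` and the nested-list representation of the 3 x 3 array are assumptions made for illustration, and ordinary Python evaluates the elements one after another, whereas the hardware elements operate at the same time. The model shows every element applying the same operation to its own pair of operands in one step.

```python
import operator


def array_step(operands_a, operands_b, op):
    """Apply the same operation in every element of the array.

    operands_a and operands_b hold, per element, the values assumed to be in
    registers 2a and 2b. The result has the same matrix shape.
    """
    return [
        [op(a, b) for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(operands_a, operands_b)
    ]


a = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
b = [[9, 8, 7], [6, 5, 4], [3, 2, 1]]
sums = array_step(a, b, operator.add)
print(sums)  # [[10, 10, 10], [10, 10, 10], [10, 10, 10]]
```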
As mentioned above, the conventional array processor can execute data processing operations at a very fast speed. However, owing to the conventional constitution of the individual data processing elements, each data processing element is provided with only a single local memory unit. As a result, in order to read the plurality of data needed for an operation of the ALU 1 from the local memory unit 3 and deliver them to the registers 2a and 2b feeding the ALU 1, the array processor must access the local memory unit 3 as many times as there are data. Because memory access must be executed so many times, the operating efficiency of the ALU 1 is lowered. This in turn prevents the array processor from accelerating the data processing operation.
When adding two numbers, the conventional array processor must execute the following operations in sequence. First, the first register 2a reads the first number from the local memory unit 3 and stores it. Next, the second register 2b reads the second number from the local memory unit 3 and stores it. Finally, the ALU 1 reads both numbers from the registers 2a and 2b and adds them.
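The addition sequence described above can be modelled as follows. This is a minimal sketch, assuming that the single local memory unit delivers one value per access; the class `ProcessingElement`, its method names, and the access counter are hypothetical and serve only to make the two separate memory reads visible.

```python
class ProcessingElement:
    """Toy model of one conventional data processing element (FIG. 2)."""

    def __init__(self, memory):
        self.memory = list(memory)  # local memory unit 3
        self.reg_a = 0              # first register 2a
        self.reg_b = 0              # second register 2b
        self.memory_accesses = 0    # assumed counter, for illustration only

    def load(self, register, address):
        """Read one value from local memory into register 'a' or 'b'."""
        if register not in ("a", "b"):
            raise ValueError("register must be 'a' or 'b'")
        self.memory_accesses += 1   # one access per value read
        value = self.memory[address]
        if register == "a":
            self.reg_a = value
        else:
            self.reg_b = value

    def add(self):
        """ALU 1 adds the two registers and feeds the result back to 2a."""
        self.reg_a = self.reg_a + self.reg_b
        return self.reg_a


pe = ProcessingElement([7, 5])
pe.load("a", 0)   # first access: first number into register 2a
pe.load("b", 1)   # second access: second number into register 2b
result = pe.add()
print(result, pe.memory_accesses)  # 12 2
```

In this model the ALU performs useful work only in the last of the three steps, which reflects the efficiency loss attributed above to the single local memory unit.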