1. Field of the Invention
The present invention relates to a technique for performing data transfer among data processing apparatuses in parallel with the data computation in each data processing apparatus, in systems whose data processing involves data communication among the data processing apparatuses.
2. Brief Description of the Prior Art
In conventional data processing in a system involving data communication among data processing apparatuses, such as data processing using an array processor, a common data storage area (memory) is used both for data transfer (transmission/reception) among the processing elements and for data computation. Therefore, data computation can be executed only after the data exchange among the processing elements is complete.
For this reason, data transfer and data computation must be performed alternately and repeatedly. In matrix multiplication involving vector data transfer in particular, the volume of data to be transferred is large, so the overhead of data transfer among the processing elements prolongs the processing time of the array processor.
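To make the cost of strict alternation concrete, the following is a minimal cost model, assuming the work splits into a number of rounds that each need one transfer phase and one computation phase of fixed duration. The function names and the idealized overlapped variant (the transfer for round k + 1 running during the computation of round k, in the spirit of the parallel transfer named in the Field of the Invention) are illustrative assumptions, not a description of any particular apparatus.

```python
# Hypothetical cost model: `steps` rounds, each needing one data transfer
# (duration t_transfer) and one data computation (duration t_compute).

def serial_time(steps: int, t_transfer: float, t_compute: float) -> float:
    """Total time when transfer and computation share one storage area and
    must alternate, so every round pays for both phases."""
    return steps * (t_transfer + t_compute)


def overlapped_time(steps: int, t_transfer: float, t_compute: float) -> float:
    """Idealized total time if the transfer for round k + 1 could run while
    round k is computed: one leading transfer, then each further round costs
    the longer of the two phases, plus the final computation."""
    if steps == 0:
        return 0.0
    return t_transfer + (steps - 1) * max(t_transfer, t_compute) + t_compute


# Transfer-heavy case (vector data): transfer takes 3 units, compute 1 unit.
print(serial_time(10, 3.0, 1.0))      # 40.0
print(overlapped_time(10, 3.0, 1.0))  # 31.0
```

In the serial model the transfer time is never hidden; every unit of transfer adds directly to the total, which is the overhead described above.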
For example, consider the case in which the multiplication of matrices A and B is executed in the array processor shown in FIG. 1.
Each processing element has a common data storage area used for receiving and transmitting component data of the matrix A and for computing on those data, as well as data storage areas for receiving component data of the matrix B, for supplying the matrix B component data to the computation unit, and for receiving computation results from the computation unit.
If the $(l, m)$ matrix A and the $(m, n)$ matrix B are respectively defined as

$$A = (\mathbf{a}_1^t, \mathbf{a}_2^t, \ldots, \mathbf{a}_i^t, \ldots, \mathbf{a}_l^t)^t \qquad (1 \le i \le l)$$

$$B = (\mathbf{b}_1, \mathbf{b}_2, \ldots, \mathbf{b}_j, \ldots, \mathbf{b}_n) \qquad (1 \le j \le n)$$
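then the jth column vector $\mathbf{c}_j$ of the matrix $C = A \times B$ is given by

$$\mathbf{c}_j = A\,\mathbf{b}_j = (\mathbf{a}_1 \cdot \mathbf{b}_j, \mathbf{a}_2 \cdot \mathbf{b}_j, \ldots, \mathbf{a}_i \cdot \mathbf{b}_j, \ldots, \mathbf{a}_l \cdot \mathbf{b}_j)^t$$

where $\mathbf{a}_i$ is a row vector and $\mathbf{b}_j$ is a column vector, given respectively by

$$\mathbf{a}_i = (a_{i1}, a_{i2}, \ldots, a_{im}), \qquad \mathbf{b}_j = (b_{1j}, b_{2j}, \ldots, b_{mj})^t$$

Here $\mathbf{a}_i^t$ denotes the transpose of $\mathbf{a}_i$, and the symbol "$\cdot$" denotes the inner product; for example,

$$\mathbf{a}_i \cdot \mathbf{b}_j = \sum_{k=1}^{m} a_{ik}\, b_{kj}$$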
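The column-by-column formulation, in which each component of a column of C is one inner product of a row vector of A with a column vector of B, can be sketched in plain Python as follows. The function names are hypothetical, indices are 0-based, and nested lists stand in for the vector data.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # list of rows


def inner(a_row: Vector, b_col: Vector) -> float:
    """Inner product a_i . b_j = sum over k of a_ik * b_kj."""
    if len(a_row) != len(b_col):
        raise ValueError("a_i and b_j must both have m components")
    return sum(x * y for x, y in zip(a_row, b_col))


def column(B: Matrix, j: int) -> Vector:
    """Column vector b_j of B (0-based j)."""
    return [row[j] for row in B]


def column_of_product(A: Matrix, B: Matrix, j: int) -> Vector:
    """c_j = (a_1 . b_j, ..., a_l . b_j)^t, one inner product per row of A."""
    b_j = column(B, j)
    return [inner(a_i, b_j) for a_i in A]


def matmul_by_columns(A: Matrix, B: Matrix) -> Matrix:
    """Build C = A x B from its columns c_1 .. c_n, then transpose to rows."""
    cols = [column_of_product(A, B, j) for j in range(len(B[0]))]
    return [list(row) for row in zip(*cols)]


A = [[1, 2], [3, 4], [5, 6]]      # (l, m) = (3, 2)
B = [[7, 8, 9], [10, 11, 12]]     # (m, n) = (2, 3)
print(column_of_product(A, B, 0))  # [27, 61, 95]
print(matmul_by_columns(A, B))     # [[27, 30, 33], [61, 68, 75], [95, 106, 117]]
```

Because every component of C is an independent inner product, the l x n inner products can be distributed among processing elements, which is what makes the array processing below possible.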
Therefore, the series of vector data $\{\mathbf{a}_i\}$ representing the matrix A and the series of vector data $\{\mathbf{b}_j\}$ representing the matrix B are input to the array processor according to the data flow shown in FIG. 1, so that the components of the matrix C can be computed by pipelined processing in each processing element.
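The following sketch simulates such a data flow under assumptions inferred from the example given for FIGS. 2A to 2C, not from FIG. 1 itself: processing element PEj holds the column vector b_j, which it receives externally when a_1 first arrives; the row vectors a_i enter PE1 one per transfer step and move one element to the right per transfer step; and, because each element's A-data area is shared between transfer and computation, transfer steps and computation steps strictly alternate. The class `PE` and the function `run_linear_array` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Vector = List[float]


@dataclass
class PE:
    """Hypothetical processing element with three storage areas."""
    b_buf: Optional[Vector] = None              # matrix B data (column b_j)
    a_buf: Optional[Tuple[int, Vector]] = None  # shared area: (i, a_i) for transfer AND compute
    results: Dict[int, float] = field(default_factory=dict)  # computed c_ij keyed by i


def run_linear_array(A: List[Vector], B_cols: List[Vector]):
    """Simulate alternating transfer/compute steps; return (C, time of last compute)."""
    n = len(B_cols)
    pes = [PE() for _ in range(n)]
    pending = list(enumerate(A))  # row vectors waiting to enter PE1
    t = 0
    last_compute = 0
    while pending or any(pe.a_buf is not None for pe in pes):
        # Transfer step: shift every a_i one element to the right
        # (from the last element backward so nothing is overwritten).
        t += 1
        for j in range(n - 1, 0, -1):
            pes[j].a_buf = pes[j - 1].a_buf
        pes[0].a_buf = pending.pop(0) if pending else None
        for j, pe in enumerate(pes):
            if pe.a_buf is not None and pe.b_buf is None:
                pe.b_buf = B_cols[j]  # b_j supplied externally
        # Compute step: no transfer can happen while the shared area is in use.
        t += 1
        for pe in pes:
            if pe.a_buf is not None:
                i, a_i = pe.a_buf
                pe.results[i] = sum(x * y for x, y in zip(a_i, pe.b_buf))
                last_compute = t
    C = [[pes[j].results[i] for j in range(n)] for i in range(len(A))]
    return C, last_compute


A = [[1, 2], [3, 4], [5, 6]]
B_cols = [[7, 10], [8, 11], [9, 12]]  # columns b_1, b_2, b_3
print(run_linear_array(A, B_cols))
# ([[27, 30, 33], [61, 68, 75], [95, 106, 117]], 10)
```

Under these assumptions, the result for l = n = 3 takes 10 time steps, half of which perform no arithmetic at all.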
FIGS. 2A, 2B, and 2C illustrate the array processing for l = 5 and n = 5.
In this case, because the same data storage areas are used for both data exchange and data computation, vector data transfer among the processing elements and inner product computation on the vector data are performed serially.
For example, at time 3, processing element PE1 receives data $\mathbf{a}_2$ and, at the same time, transfers its retained data $\mathbf{a}_1$ to processing element PE2.
At that time, processing element PE2 receives the data $\mathbf{a}_1$ from processing element PE1 and externally receives vector data $\mathbf{b}_2$.
At time 4, the processing elements compute the inner products of the data received at time 3. In this processing, no inner product can be computed while vector data are being transferred, so the time required for data transfer among the processing elements adds directly to the total processing time, and high-speed processing cannot be achieved.
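The timing in this example can be expressed in closed form under an assumed schedule that is consistent with the events described for times 3 and 4: with 1-based indices, row vector a_i arrives at PEj at odd time 2(i + j) - 3 and is multiplied with b_j at the following even time, and b_j arrives together with a_1. The function names below are hypothetical, and the schedule is an inference from the text, not a reproduction of FIGS. 2A to 2C.

```python
from typing import List

# Hypothetical schedule: a_i reaches PEj at time 2(i + j) - 3 (a transfer
# step) and the inner product a_i . b_j is computed one step later.


def arrival_time(i: int, j: int) -> int:
    """Time at which row vector a_i arrives at PEj (1-based indices)."""
    return 2 * (i + j) - 3


def events_at(t: int, l: int, n: int) -> List[str]:
    """Events at time t for an l-row A and n processing elements, ordered by PE."""
    events = []
    for j in range(1, n + 1):
        for i in range(1, l + 1):
            arrive = arrival_time(i, j)
            if t == arrive:
                events.append(f"PE{j} receives a{i}")
                if i == 1:
                    # b_j is assumed to arrive externally together with a_1
                    events.append(f"PE{j} receives b{j} externally")
            elif t == arrive + 1:
                events.append(f"PE{j} computes a{i}.b{j}")
    return events


def total_steps(l: int, n: int) -> int:
    """Time of the last inner product a_l . b_n, i.e. 2(l + n - 1)."""
    return arrival_time(l, n) + 1


def transfer_steps(l: int, n: int) -> int:
    """Number of odd (transfer-only) time steps 1, 3, ..., arrival_time(l, n)."""
    return (arrival_time(l, n) + 1) // 2


print(events_at(3, 5, 5))  # ['PE1 receives a2', 'PE2 receives a1', 'PE2 receives b2 externally']
print(events_at(4, 5, 5))  # ['PE1 computes a2.b1', 'PE2 computes a1.b2']
print(total_steps(5, 5), transfer_steps(5, 5))  # 18 9
```

Under this assumed schedule, 9 of the 18 time steps for l = n = 5 are spent on transfer alone, which quantifies the overhead noted above.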