One processor architecture designed to achieve high operation capabilities is the array processor. The array processor comprises an array of processor elements (PEs) and a plurality of memory banks disposed in a peripheral region around the array of PEs. The PEs arranged in the array execute operations in parallel while receiving data from the peripheral memory banks, thereby achieving high operation capabilities as a whole. One example of the array processor is referred to as DSA (De-coupled Systolic Architecture), proposed by Volker Strumpen, et al. (see Volker Strumpen and two others, “Stream Algorithms and Architecture”, Journal of Instruction-Level Parallelism 6, Sep. 4, 2004, pp. 1-31 (hereinafter referred to as Document 1)).
Document 1 discloses an array processor wherein PEs are arranged in an array of R×R (R refers to a natural number) and memory banks are disposed in a peripheral region around the array of PEs. FIG. 1 is a block diagram showing a configurational example of a general array processor. As shown in FIG. 1, the array processor has memory banks 20a, . . . , 20d, 21a, . . . , 21d and array processing section 22.
For the sake of brevity, four memory banks are disposed above array processing section 22 and four memory banks are disposed below it, by way of example. Memory banks 20a, . . . , 20d are disposed above array processing section 22, and memory banks 21a, . . . , 21d are disposed below array processing section 22.
For the sake of convenience, memory banks 20a, . . . , 20d, 21a, . . . , 21d are also assigned respective memory bank numbers. The memory bank numbers of eight memory banks 20a, . . . , 20d, 21a, . . . , 21d are #0, . . . , #7, respectively.
Array processing section 22 comprises a plurality of PEs 23a, . . . , 23p. Array processing section 22 is illustrated as comprising PEs that are arranged in an array of 4×4, for example.
The array processor operates to process data as follows: First, four memory banks 20a, . . . , 20d which store data to be processed enter necessary data simultaneously into ports of array processing section 22. The entered data are transferred to the PEs that are connected to the ports.
The PEs execute predetermined operations on the received data, and transfer the results of the operations to other PEs. For example, the PEs transfer the results of the operations to PEs that are positioned below those PEs.
The array processor repeats the above process and transmits the data between the PEs for thereby carrying out desired operations. The path along which the data are transmitted between the PEs differs depending on the operations to be carried out. The data of the final results of the operations are output from PEs 23m, . . . , 23p in the final stage to memory banks 21a, . . . , 21d, which store the data.
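The data flow described above can be illustrated with a minimal software sketch. It is not taken from Document 1. It assumes that every PE forwards its result only to the PE directly below it, one row per cycle, and that the upper memory banks each supply one value per cycle. The names run_array and pe_op are hypothetical and are introduced here only for illustration.

```python
from typing import Callable, List, Optional

ROWS, COLS = 4, 4  # 4x4 array of PEs, as in the FIG. 1 example


def run_array(top_banks: List[List[int]],
              pe_op: Callable[[int, int, int], int]) -> List[List[int]]:
    """Stream data from the upper banks through the PE array to the lower banks.

    Assumption: data moves strictly downward, one PE row per cycle.
    top_banks holds one equal-length stream per upper memory bank.
    pe_op(row, col, value) is the operation executed by the PE at (row, col).
    """
    steps = len(top_banks[0])
    bottom_banks: List[List[int]] = [[] for _ in range(COLS)]
    # latch[r][c] holds the result that PE (r, c) produced in the previous cycle
    latch: List[List[Optional[int]]] = [[None] * COLS for _ in range(ROWS)]

    # steps cycles to feed the data, plus ROWS cycles to drain the pipeline
    for cycle in range(steps + ROWS):
        # Final-stage PEs output last cycle's results to the lower banks
        for c in range(COLS):
            if latch[ROWS - 1][c] is not None:
                bottom_banks[c].append(latch[ROWS - 1][c])

        new_latch: List[List[Optional[int]]] = [[None] * COLS for _ in range(ROWS)]
        for r in range(ROWS):
            for c in range(COLS):
                if r == 0:
                    # All upper banks enter data into the ports simultaneously
                    inp = top_banks[c][cycle] if cycle < steps else None
                else:
                    # Other rows receive the result of the PE directly above
                    inp = latch[r - 1][c]
                new_latch[r][c] = None if inp is None else pe_op(r, c, inp)
        latch = new_latch
    return bottom_banks


# Example: every PE adds 1, so each value leaves the array increased by 4
result = run_array([[1, 2], [10, 20], [100, 200], [0, 0]],
                   lambda r, c, x: x + 1)
print(result)  # [[5, 6], [14, 24], [104, 204], [4, 4]]
```

In an actual array processor the transfer path between the PEs differs depending on the operations, so the fixed downward path used here is only the simplest case.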
If the data are to be used again in next operations, then the array processor shifts the data between the memory banks into a data placement suitable for the next operations, and thereafter enters the shifted data into array processing section 22.
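As an illustration of such a shift, the following sketch models the memory banks as a list of lists and moves the contents of each bank to a neighboring bank. The cyclic shift pattern, the function name shift_between_banks, and the offset parameter are assumptions made for this example; the placement actually required depends on the next operations.

```python
from typing import List


def shift_between_banks(banks: List[List[int]], offset: int) -> List[List[int]]:
    """Return a new placement in which bank i holds the former contents of
    bank (i - offset), wrapping around cyclically (assumed shift pattern)."""
    n = len(banks)
    return [list(banks[(i - offset) % n]) for i in range(n)]


# Example: results stored in four banks are shifted by one bank position
stored = [[5, 6], [14, 24], [104, 204], [4, 4]]
shifted = shift_between_banks(stored, 1)
print(shifted)  # [[4, 4], [5, 6], [14, 24], [104, 204]]
```

The shifted placement would then be entered into array processing section 22 in the same manner as the original data.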
The array processor of the type described has higher processing parallelism because the plural PEs execute operations simultaneously, and exhibits higher operation capabilities than a von Neumann architecture operating at the same frequency. Furthermore, since the array processor enters data from the plural memory banks simultaneously into array processing section 22, it eliminates the memory access bottleneck from which the von Neumann architecture suffers. In addition, the array processor is suitable for operations on stream data because it is capable of continuously processing data.