1. Field of the Invention
The present invention is directed to a SIMD (Single Instruction-stream, Multiple Data-stream) microprocessor capable of processing multiple data pieces in parallel with a single operating instruction.
2. Description of the Related Art
SIMD microprocessors are capable of simultaneously performing the same calculation operation on multiple data pieces with a single instruction. SIMD microprocessors have therefore been frequently used for processes involving a repetition of the same set of calculation steps over extremely large amounts of data (for example, image processing in digital copiers).
In image processing by a SIMD microprocessor, a high-speed calculation operation is achieved by aligning multiple operation units (processor elements (PEs)) in the main scanning direction of the image data and executing the same calculation steps on multiple data pieces at the same time.
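The lockstep behavior described above can be sketched in a few lines of Python. This is only an illustrative software model, not the disclosed hardware: the names `simd_step` and `brighten`, the sixteen-lane width, and the pixel values are assumptions chosen for the example.

```python
# Illustrative model only: one list entry stands in for the pixel held by one PE.
NUM_PE = 16  # assumed lane count for this example


def simd_step(op, lanes):
    """Apply the same operation to every lane, as a single SIMD instruction would."""
    return [op(pixel) for pixel in lanes]


def brighten(pixel):
    """Hypothetical per-pixel operation: add 20 and saturate at 255."""
    return min(pixel + 20, 255)


# One pixel per PE along the main scanning direction (made-up values).
scanline = [i * 16 for i in range(NUM_PE)]
result = simd_step(brighten, scanline)
```

In this model, a single call to `simd_step` plays the role of one instruction, and every lane receives the same operation on its own pixel.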
Pre-operation pixel data pieces to be input to the calculating unit of each processor element, as well as post-operation pixel data pieces, are stored in multiple register files provided in the processor element.
For example, a SIMD processor disclosed in Patent Document 1 is provided with an external data processing device capable of accessing the register files. The data processing device inputs and outputs image data pieces between the register files and an external image memory in the background of the calculation operations of the calculating units in the processor elements, thereby improving the performance of the image processing apparatus.
[Patent Document 1] Japanese Patent No. 3971535
To obtain further improvement in the performance of the SIMD processor described in Patent Document 1, the following schemes are conceivable:
(a) to increase the operating frequency;
(b) to increase the number of processor elements; and
(c) to increase the number of external data processing devices capable of accessing the register files.
Among the three schemes, implementing (b) and (c) at the same time would cause the following problems. In the case where the SIMD processor allows the external data processing devices to access register files belonging to arbitrary processor elements, as in Patent Document 1, a very large number of wiring lines are necessary to connect the external data processing devices and the register files. Furthermore, outlets need to be provided to connect the wiring lines, each of which extends from one end of the one-dimensionally arranged processor elements to the other end, to the external data processing devices. If all the outlets are disposed near the central part of the PE array (the group of one-dimensionally arranged processor elements) so that each wiring line extends equally from its outlet to the processor element on each end, the wiring lines are concentrated between the vicinity of the central part of the PE array and the data processing devices.
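The growth in wiring when schemes (b) and (c) are combined can be illustrated with a rough count. The sketch below assumes a simplified model in which each data processing device needs one connection point to the register file of every processor element; the function name `tap_count` and the array sizes are illustrative assumptions, and the width of each connection is ignored.

```python
def tap_count(num_pe, num_devices):
    """Connection points needed if each device must reach every PE's register file.

    Simplified assumption: exactly one connection point per (device, PE) pair.
    """
    return num_pe * num_devices


base = tap_count(16, 8)      # illustrative starting configuration
scaled = tap_count(32, 16)   # both counts doubled, i.e. schemes (b) and (c) together
growth = scaled / base
```

Under this assumption, doubling both the number of processor elements and the number of data processing devices quadruples the number of connection points, which is why applying the two schemes together is harder than applying either alone.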
The above-described problems are explained with reference to the example of FIG. 6. FIG. 6 illustrates sixteen processor elements (PE0 through PE15) arranged in one dimension and eight data processing devices (0 through 7) arranged one-dimensionally in the same direction as the alignment of the processor elements. Each processor element has a register file which includes eight access registers (R0 through R7). Wiring lines 101, each extending from the top processor element (PE0) of the one-dimensionally arranged processor elements to the bottom processor element (PE15), need to be connected to the data processing devices 0 through 7. In this case, outlets 102 used to draw out the wiring lines 101 toward the data processing devices 0 through 7 are all provided near the central part of the PE array so that each wiring line 101 extends equally from its outlet 102 to the top and bottom processor elements (PE0 and PE15). Accordingly, the wiring lines are concentrated between the vicinity of the central part of the PE array and the data processing devices.
This arrangement poses serious implementation problems, such as great variation in the length of the wiring lines between the external data processing devices and the outlets 102. Moreover, the arrangement may reduce the communication speed between the external data processing devices and the PE register files.
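The variation in wiring length can be made concrete with a small numerical sketch of the FIG. 6 configuration (sixteen processor elements, eight data processing devices). The geometry here is an assumption made for illustration: processor elements sit at unit pitch, the devices are spread evenly along the array, all outlets sit at the midpoint of the array, and only the distance along the array direction is counted. The function name `outlet_to_device_lengths` is hypothetical.

```python
NUM_PE = 16       # PE0 through PE15, as in FIG. 6
NUM_DEVICES = 8   # data processing devices 0 through 7, as in FIG. 6


def outlet_to_device_lengths(num_pe, num_devices):
    """Distance from a central outlet to each device, in units of PE pitch.

    Assumed layout: PE k sits at position k, and device j is centered over
    its share of the array (for 16 PEs and 8 devices, over PEs 2j and 2j+1).
    """
    pes_per_device = num_pe / num_devices
    center = (num_pe - 1) / 2  # midpoint between the first and last PE
    lengths = []
    for j in range(num_devices):
        device_pos = j * pes_per_device + (pes_per_device - 1) / 2
        lengths.append(abs(device_pos - center))
    return lengths


lengths = outlet_to_device_lengths(NUM_PE, NUM_DEVICES)
spread = max(lengths) / min(lengths)
```

Under these assumptions, the devices at the ends of the row are seven times farther from the central outlets than the devices nearest the center, which illustrates the kind of length variation described above.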
The present invention aims at solving these problems. That is, in view of the above problems, the present invention aims at providing a high-performance image processing apparatus by solving two issues: the excessive number of wiring lines associated with an increase in the number of processor elements and the number of external data processing devices, and the slowdown in communication speed caused by the increased length of the wiring lines.