1. Field
This patent specification relates to a parallel processor and an image processing system using such processor in digital duplication machines, facsimile apparatuses, and other similar image processing systems.
2. Discussion of the Background
Recent developments in digital duplication machines, facsimile apparatuses, and other imaging systems have focused on improving image quality by increasing the number of picture elements and adopting color. These trends have brought about an increase in the amount of image data to be processed.
The processing of image data is carried out, in general, by repeating a similar set of processing steps over the image data of each picture element. A single instruction multiple data stream (SIMD) type processor has therefore been found suitable, because its parallel data processing capabilities allow a plurality of image data to be processed by a single instruction.
FIG. 12 is a block diagram of a known SIMD processor. As shown in FIG. 12, the SIMD processor 1 includes at least a global processor (GP) 2, a processor element block 3 provided with a plurality of processor elements (PE) 3a, and an interface 4.
In order to process a plurality of data items simultaneously using a SIMD processor, the processor element block 3 has a plurality of processor elements (PE) 3a, as described above. Each of the processor elements 3a includes a register file 31 and an operation unit 36. The register file 31 includes a plurality of registers, and the operation unit 36 is configured to process data provided from the register file 31 and global processor 2.
Based on instructions from the global processor 2, the interface 4 is configured to provide data to be processed, which are input from, for example, an external scanner, to an input/output (I/O) register file 31 in the processor. The interface 4 is also configured to transfer processed data from the I/O register file 31 to an external unit such as, for example, a printer.
The global processor 2 operates to control both the processor element block 3 and interface 4. In addition, the global processor 2 includes at least a single instruction single data stream (SISD) type processor which operates to output various control signals.
As described above, control in a SIMD processor is configured such that a single instruction is executed in every processor element 3a. Namely, in the SIMD processor, one control signal line is connected from the global processor 2 in common to each of the plurality of processor elements 3a, and when an instruction is sent by way of the control signal line thus formed, each of the processor elements 3a executes an identical operation based on this single instruction.
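By way of illustration only, the common-control-line behavior described above may be sketched in Python as follows. The class name ProcessorElement, the function names broadcast and add, and the register count are hypothetical and do not appear in the known processor; a software loop stands in for the lock-step hardware.

```python
from typing import Callable, List


class ProcessorElement:
    """One PE: a register file plus an operation unit (hypothetical model)."""

    def __init__(self, num_registers: int = 4) -> None:
        self.registers: List[int] = [0] * num_registers

    def execute(self, op: Callable[[int, int], int],
                dst: int, src: int, operand: int) -> None:
        # The operation unit combines a register value with an operand
        # supplied by the global processor and writes the result back.
        self.registers[dst] = op(self.registers[src], operand)


def broadcast(pes: List[ProcessorElement], op: Callable[[int, int], int],
              dst: int, src: int, operand: int) -> None:
    """Issue one instruction on the shared control line to every PE."""
    for pe in pes:  # a loop stands in for simultaneous hardware execution
        pe.execute(op, dst, src, operand)


def add(a: int, b: int) -> int:
    """Example operation: add the broadcast operand to a register value."""
    return a + b
```

In this sketch, loading a different picture element value into register 0 of each processor element and then calling broadcast(pes, add, 1, 0, 10) once leaves each register 1 holding its own value plus 10, which corresponds to a plurality of data being processed by a single instruction.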
FIG. 13 is a schematic diagram illustrating the circuit interconnection including the processor elements and control signal line.
Referring to FIG. 13, in order to execute a single operation on a plurality of data, a plurality of processor elements, PE0 through PEn, are provided. In addition, each processor element 3a is provided with n registers, 31-1 through 31-n (i.e., REG1 through REGn).
Control signals are sent from an instruction signal generator in the global processor 2 to each processor element 3a by way of the control signal line (CS).
Namely, two clock signals, CP and CN, are sent to the registers 31-1 through 31-n from the global processor 2. These two signals CP and CN have opposite polarities, in that one signal is generated by inverting the other by means of an inverter. These clock signals CP and CN are then sent to each processor element 3a. Incidentally, FIG. 13 shows the case where CN signals are provided by way of even-numbered control lines, while CP signals are provided by way of odd-numbered lines.
Further, according to the CP and CN clock input signals, the registers 31-1 through 31-n operate to latch data input into the D input of the register from the internal bus, and to output data from the P and Q terminals to the internal bus.
With the increase in the number of processor elements 3a in those known processors, however, the length of the control signal line increases from a driving circuit in the instruction signal generator to the distant processor elements 3a toward the end terminal. As a result, a wiring delay of CS signals arises between the processor element PE0 nearest to the driving circuit and those distant from the circuit such as, for example, the terminal processor element PEn.
This gives rise to several drawbacks, such as degraded circuit characteristics and, in extreme cases, circuit malfunction. In addition, relatively large power is required for driving the circuit.
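The growth of the wiring delay with the number of processor elements may be estimated, purely as an illustrative assumption, with the well-known Elmore delay model of a uniform RC ladder. This model is introduced here for explanation only and is not part of the known processor; the function names and the per-segment resistance and capacitance values are hypothetical, and driver resistance is ignored.

```python
def elmore_delay(num_pes: int, r_seg: float, c_seg: float, target: int) -> float:
    """Elmore delay at PE number `target` (1-based) on an unbuffered line.

    The line is modeled as a ladder of `num_pes` identical segments, each
    with series resistance r_seg and a capacitance c_seg to ground (wire
    plus the input load of one PE). Each capacitor contributes its
    capacitance times the resistance it shares with the path to `target`.
    """
    return sum(r_seg * min(i, target) * c_seg for i in range(1, num_pes + 1))


def control_skew(num_pes: int, r_seg: float, c_seg: float) -> float:
    """Delay difference between the farthest PE and the nearest PE."""
    far = elmore_delay(num_pes, r_seg, c_seg, num_pes)
    near = elmore_delay(num_pes, r_seg, c_seg, 1)
    return far - near
```

Under these assumptions the delay at the terminal processor element is r_seg * c_seg * N * (N + 1) / 2 for N processor elements, so that doubling the number of processor elements on a single line roughly quadruples the delay at the far end.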
An array processor is disclosed in Japanese Laid-Open Patent Application No. 8-212169, in which n neighboring processor elements constitute a group, and each of the groups in the array processor is individually provided in common with one register and one control signal line.
Further, in that description, neighboring registers in the array processor are interconnected by a read bus and write bus, to thereby be able to operate a plurality of processor elements as a single processor. This may result in disadvantages in the array processor such as difficulties in shifting from one processor to another and in updating data in a specific processor, among others.
In addition, no description could be found in that document regarding the circuit wiring delay of CS signals which arises between the processor element PE0 in the vicinity of an instruction sequence control unit and those distant from the unit such as, for example, the terminal processor element PEn.
Accordingly, it is an object of the present disclosure to provide an improved parallel processor with neither the above described circuit delay nor an undesirable increase in power for circuit driving.
The following brief description is a synopsis of only selected features and attributes of the present disclosure. A more complete description thereof is found below in the section entitled “Description of Preferred Embodiments.”
A parallel processor disclosed herein includes a global processor configured to decode programs and assume overall control of the parallel processor; and a processor element block comprising a plurality of processor elements configured to process various data.
Each processor element contains a plurality of functional means including at least an operation unit and a register file provided with a plurality of registers. Each functional means is connected to an internal bus, and the operation of the functional means is controlled by a logic of global control signals generated by the global processor. The processor elements are divided into groups each including an arbitrary number thereof, and a buffer means to buffer the control signals is provided in each of the groups. The global control signals are input into the buffer means, and local control signals are generated by the buffer means and sent to each of the groups, to be subsequently terminated within each of the groups. The global control signals are provided to all of the buffer means in the groups.
In addition, the transfer and exchange of data among the functional means in the parallel processor are carried out by way of the internal bus, and the buffer means is provided in the middle of the group of an arbitrary number of the processor elements.
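The effect of placing the buffer means in the middle of a group may be pictured with the following Python sketch. It rests on the assumption, not stated in this summary, that the relevant quantity is the longest local wire run from the buffer means to any processor element in its group, counted in processor element pitches; the function name is hypothetical.

```python
def longest_local_run(group_size: int, buffer_position: int) -> int:
    """Longest distance, in PE pitches, from the buffer to a PE in its group.

    PEs in the group occupy positions 0 .. group_size - 1, and the buffer
    is assumed to sit at `buffer_position` within that range.
    """
    if not 0 <= buffer_position < group_size:
        raise ValueError("buffer_position must lie inside the group")
    return max(buffer_position, group_size - 1 - buffer_position)
```

Under this assumption, for a group of eight processor elements, a buffer at one end of the group must drive a local run of seven pitches, whereas a buffer at position 4 near the middle must drive at most four.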
Further, in the parallel processor, the circuit interconnection for the global control signals is provided in the uppermost metal layer in the IC layout process and shielded by power lines.
Still further, the global control signals are input into the buffer means included in the group of an arbitrary number of the processor elements, and the operation of the functional means in the group of an arbitrary number of the processor elements is controlled by the local control signals generated by the buffer means.
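The relationship between the global control signals, the buffer means, and the local control signals may be sketched, as a non-limiting illustration, in Python as follows. All class and function names, the dictionary representation of a control signal, and the group size are hypothetical; the sketch models only the signal distribution and not the electrical behavior.

```python
from typing import Dict, List

Signal = Dict[str, int]


class ProcessorElement:
    """A PE that records the local control signals it receives."""

    def __init__(self) -> None:
        self.received: List[Signal] = []

    def apply(self, local_signal: Signal) -> None:
        self.received.append(local_signal)


class GroupBuffer:
    """Buffer means serving one group of PEs."""

    def __init__(self, members: List[ProcessorElement]) -> None:
        self.members = members

    def drive(self, global_signal: Signal) -> None:
        # Regenerate the signal locally; the copy goes only to this group,
        # so the local signal terminates within the group.
        local_signal = dict(global_signal)
        for pe in self.members:
            pe.apply(local_signal)


def build_groups(pes: List[ProcessorElement], group_size: int) -> List[GroupBuffer]:
    """Divide the PEs into consecutive groups, one buffer per group."""
    return [GroupBuffer(pes[i:i + group_size])
            for i in range(0, len(pes), group_size)]


def issue(buffers: List[GroupBuffer], global_signal: Signal) -> int:
    """Send a global control signal to every buffer.

    Returns the number of loads seen by the global line, which is the
    number of buffers rather than the number of PEs.
    """
    for buf in buffers:
        buf.drive(global_signal)
    return len(buffers)
```

In this sketch, sixteen processor elements divided into groups of four present only four loads to the global control signal line, while every processor element still receives the same control information through its own buffer means.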
According to another aspect, an image processing system disclosed herein is configured to input image data, process the image data in parallel, and output processed data to an external unit.
The image processing system includes the parallel processor containing a plurality of processor elements provided to form an array for the image data to be input, and a first-in first-out (FIFO) memory by way of which at least one of the inputting operation of the image data and the outputting operation of the processed data to the external unit is performed.
Each of the processor elements includes a plurality of functional means including at least an operation unit and a register file provided with a plurality of registers. Each functional means is connected to an internal bus, and the transfer and exchange of data among the functional means are carried out by way of the internal bus. The operation of the functional means is controlled by a logic of global control signals generated by the global processor. The processor elements are divided into groups each including an arbitrary number thereof, and a buffer means to buffer the control signals is provided in each of the groups. The global control signals are input into the buffer means, and local control signals are generated by the buffer means and sent to each of the groups, to be subsequently terminated within each of the groups. The global control signals are provided to all buffer means in the groups, and the processed data are output to the external unit in response to the global control signals.
In addition, in the image processing system, the buffer means is provided in the middle of the group of an arbitrary number of the processor elements, and the circuit interconnection for the global control signals is provided in the uppermost metal layer in the IC layout process and shielded by power lines.
Further, the global control signals are input into the buffer means included in the group of an arbitrary number of the processor elements, and the operation of the functional means in the group of an arbitrary number of the processor elements is controlled by the local control signals generated by the buffer means.
According to still another aspect, a method disclosed herein for inputting image data, processing the image data, and outputting processed data to an external unit, for the image processing system, includes the steps of inputting the image data into a parallel processor containing a plurality of processor elements provided to form an array, and performing at least one of the inputting operation of the image data and the outputting operation of the processed data to the external unit by way of a FIFO memory.
The image processing system incorporating the parallel processor, which is utilized in the method, has the construction and capabilities described herein above.
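The flow of image data through the FIFO memory and the array of processor elements may be sketched, by way of example only, in Python as follows. The function name, the use of a deque to stand for the FIFO memory, and the inversion operation applied to each picture element are assumptions chosen for illustration and are not prescribed by the present disclosure.

```python
from collections import deque
from typing import Callable, Deque, Iterable, List


def process_image_line(pixels: Iterable[int], num_pes: int,
                       op: Callable[[int], int]) -> List[int]:
    """Pass one line of image data through FIFOs and a PE array.

    Pixels enter an input FIFO, are taken out in batches of up to
    `num_pes` (one pixel per PE), are processed by the same operation
    in every PE, and leave through an output FIFO in arrival order.
    """
    if num_pes < 1:
        raise ValueError("num_pes must be at least 1")
    input_fifo: Deque[int] = deque(pixels)
    output_fifo: Deque[int] = deque()
    while input_fifo:
        batch_size = min(num_pes, len(input_fifo))
        batch = [input_fifo.popleft() for _ in range(batch_size)]
        # A list comprehension stands in for lock-step execution in the PEs.
        output_fifo.extend(op(p) for p in batch)
    return list(output_fifo)


def invert(pixel: int) -> int:
    """Hypothetical per-pixel operation: invert an 8-bit value."""
    return 255 - pixel
```

With four processor elements, a line of five picture elements is handled in two passes, a batch of four followed by a batch of one, and the order of the processed data at the output matches the order of the input.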