1. Field of the Invention
The present invention relates to a parallel processor apparatus, and more particularly to a parallel processor apparatus capable of reducing power consumption and improving operating speed when converting serial data to parallel data or parallel data to serial data.
2. Description of the Related Art
When digitally processing image data, similar processing is in many cases applied to all of the pixel data composing an image.
In order to execute similar processing with respect to a plurality of data at a high speed, a parallel processor apparatus adopting a single instruction multiple data stream (SIMD) type architecture has been proposed. This is being utilized in a wide range of fields not limited to image data processing.
In an SIMD type architecture, exactly the required number of processor elements are arranged. Each processor element operates according to the same command.
Accordingly, when different data are simultaneously given to the processor elements, the results of processing with respect to the individual data are simultaneously obtained. As a parallel processor apparatus adopting such an SIMD type architecture applied to image data processing, there is known, for example, the parallel processor apparatus shown in SVP (SERIAL VIDEO PROCESSOR, Proceedings of the IEEE 1990 CUSTOM INTEGRATED CIRCUITS CONFERENCE, p. 17, 3.1 to 4).
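The SIMD principle described above, in which every processor element executes the same command on its own datum, can be illustrated by the following minimal sketch. The function names and the sample pixel values are hypothetical and serve only as an illustration; they do not appear in the cited reference.

```python
# Minimal SIMD sketch: one instruction, many data items.
# `simd_step`, `invert`, and the sample values are hypothetical illustrations.

def simd_step(instruction, pixel_data):
    """Apply the same instruction to the datum held by each processor element."""
    return [instruction(value) for value in pixel_data]


def invert(value):
    """Example instruction: invert an 8-bit pixel value."""
    return 255 - value


line = [0, 64, 128, 255]          # one datum per processor element
result = simd_step(invert, line)  # every element executes `invert`
```

In hardware the elements operate simultaneously; the list comprehension here only models the fact that the same operation is applied to each datum.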
This parallel processor apparatus, as shown in for example FIG. 11, is provided with a data input register 102, processor elements (PE) 3.sub.1 to 3.sub.n and a data output register 104.
Here, the data input register 102 sequentially receives as input one sweep line's worth of pixel data as serial data S.sub.IN and outputs the one sweep line's worth of pixel data to the individual processor elements 3.sub.1 to 3.sub.n. The processor elements 3.sub.1 to 3.sub.n respectively process the one sweep line's worth of pixel data in parallel. The data output register 104 receives as input the processed one sweep line's worth of pixel data in parallel from the processor elements 3.sub.1 to 3.sub.n and sequentially outputs the same as the serial data S.sub.OUT.
The routine for processing image data composed by pixel data of m.times.n number of pixels p(1,1) to p(m,n) arranged in the form of a matrix as shown in FIG. 12 in such a parallel processor apparatus 101 will be explained below by referring to FIGS. 14A to 14C.
Here, the pixel data of the pixel p(i,j) of any i,j (where 1.ltoreq.i.ltoreq.m, 1.ltoreq.j.ltoreq.n) can be expressed by using a plurality of bits.
In FIG. 12, pixels are usually swept in order from left to right and from top to bottom; the image data are therefore generally transmitted in the format shown in FIG. 13. Here, the period for sweeping one line's worth of pixel data will be referred to as a "horizontal sweep duration". Further, the period for the sweep to return from the right end of a certain line of the screen to the left end of the next line will be referred to as a "horizontal blanking duration". For example, there is a horizontal blanking duration between the pixel p(i,n) at the right end of an i-th line and the pixel p(i+1,1) at the left end of the next line.
In FIGS. 14A to 14C, for example, image data comprised of pixel data composed in turn of a plurality of bits are sequentially input to the input terminals of the processor elements in units of the pixel data. The pixel data of the first line are stored in a data input register 102 shown in FIG. 11 having a storage capacity of one line's worth of the image data in a first horizontal sweep duration (S1). Then, the pixel data of the first line stored in the data input register 102 are sequentially output to the processor elements 3.sub.1 to 3.sub.n within the next horizontal blanking duration (S2) so that one pixel's worth of the pixel data is supplied to one processor element.
Next, in the horizontal sweep duration (S3), each of the processor elements 3.sub.1 to 3.sub.n performs processing with respect to the supplied one line's worth of the pixel data. Further, simultaneously with this, pixel data of the second line are sequentially input to the data input register 102. Then, within the succeeding horizontal blanking duration (S4), the processed pixel data of the first line are supplied from the processor elements 3.sub.1 to 3.sub.n to the data output register 104 in parallel. Simultaneously with this, the pixel data of the second line are supplied from the data input register 102 to the processor elements 3.sub.1 to 3.sub.n in parallel. Then, in the next horizontal sweep duration (S5), the pixel data of the first line stored in the data output register 104 are sequentially output to the output terminals of the processor elements. Simultaneously with this, the processor elements 3.sub.1 to 3.sub.n process the pixel data of the second line, and the pixel data of a third line are sequentially input to the data input register 102.
After this, when the processor elements 3.sub.1 to 3.sub.n process the pixel data of the i-th line, the operation of the data input register 102 receiving as input the pixel data of the (i+1)th line and the data output register 104 outputting the pixel data of the (i-1)th line is repeated. In this way, the data input register 102, the processor elements 3.sub.1 to 3.sub.n, and the data output register 104 operate in synchronization, whereby image data processed for every horizontal sweep duration is output.
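The overlapped schedule described above (line i+1 being input while line i is processed and line i-1 is output) can be sketched behaviorally as follows. This is only a simplified model under the assumption that each loop iteration stands for one horizontal sweep duration followed by one horizontal blanking duration; the function and variable names are hypothetical.

```python
# Sketch of the three-stage line pipeline (data input register, processor
# elements, data output register). All names here are hypothetical.

def run_pipeline(lines, process):
    """Return processed lines; line i is processed while line i+1 is being
    input and line i-1 is being output."""
    input_reg = None    # line being loaded serially (data input register)
    pe_data = None      # line held by the processor elements
    output_reg = None   # line waiting for serial output (data output register)
    outputs = []
    # Two extra sweep durations flush the last lines through the pipeline.
    for incoming in list(lines) + [None, None]:
        # Horizontal sweep duration: three activities overlap.
        if output_reg is not None:
            outputs.append(output_reg)      # serial output of line i-1
        processed = None if pe_data is None else [process(p) for p in pe_data]
        input_reg = incoming                # serial input of line i+1
        # Horizontal blanking duration: parallel transfers between stages.
        output_reg = processed
        pe_data = input_reg
    return outputs
```

For example, `run_pipeline([[1, 2], [3, 4]], lambda p: p * 2)` passes each line through the three stages in turn, so a line input in one sweep duration is output two sweep durations later.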
Below, a detailed explanation will be given of the data input register 102.
As shown in FIG. 15, the data input register 102 is constituted by a pointer circuit 105 and a conversion circuit 6.
The pointer circuit 105 can be constituted by using a shift register widely used when performing mutual conversion of serial data and parallel data. The pointer circuit 105 is structured with the unit delay elements 26.sub.1 to 26.sub.n such as D-type flip-flops connected in series, receives as its input a pointer control signal S1 comprised by a clock signal S11 and pointer data S12, and outputs pointer data S26.sub.1 to S26.sub.n to the conversion circuit 6.
The conversion circuit 6 has arranged in parallel first switching means 30.sub.1 to 30.sub.n, memories 31.sub.1 to 31.sub.n and second switching means 32.sub.1 to 32.sub.n, receives as its inputs serial data S.sub.IN, pointer data S26.sub.1 to S26.sub.n, and a switch control signal S9, and outputs parallel data S6 comprised by data S6.sub.1 to S6.sub.n to the processor elements 3.sub.1 to 3.sub.n.
Here, the signal line of the serial data S.sub.IN and the signal lines of the parallel data S6 have widths or numbers of bits sufficient for expressing the data of one pixel.
The operation of the data input register 102 will be explained next by referring to FIGS. 16A to 16C.
In the conversion circuit 6, among the switching means 30.sub.1 to 30.sub.n, the switching means 30.sub.x for which the corresponding pointer data among S26.sub.1 to S26.sub.n has the logical value "1" becomes the ON state and stores the corresponding pixel data among the sequentially input serial data S.sub.IN in the memory 31.sub.x. Namely, when the logical value "1" is given to the pointer data S12 for only the first clock cycle in the horizontal sweep duration and pulses are given to the clock signal S11, the switching means 30.sub.1 to 30.sub.n sequentially become the ON state in synchronization with this. If the pixel data of the pixels p(i,1) to p(i,n) of, for example, the i-th line are sequentially given as the serial data S.sub.IN, one line's worth of pixel data is respectively stored in the memories 31.sub.1 to 31.sub.n.
First, when the pointer data S26.sub.1 indicates the logical value "1" as shown in FIG. 16A, the switching means 30.sub.1 becomes the ON state, and the pixel data of the pixel p(i,1) is stored in the memory 31.sub.1. At this time, the pointer data S26.sub.2 to S26.sub.n indicate the logical value "0", and the switching means 30.sub.2 to 30.sub.n become the OFF state.
Next, when the pointer data S26.sub.2 indicates the logical value "1" as shown in FIG. 16B, the switching means 30.sub.2 becomes the ON state, and the pixel data of the pixel p(i,2) is stored in the memory 31.sub.2. At this time, the pointer data S26.sub.1 and S26.sub.3 to S26.sub.n indicate the logical value "0", and the switching means 30.sub.1 and 30.sub.3 to 30.sub.n become the OFF state.
Further, the pointer data S26.sub.3 to S26.sub.n-1 sequentially indicate the logical value "1". Through similar processing, the pixel data of the pixels p(i,3) to p(i,n-1) are sequentially stored in the memories 31.sub.3 to 31.sub.n-1, respectively.
Finally, when the pointer data S26.sub.n indicates the logical value "1" as shown in FIG. 16C, the switching means 30.sub.n becomes the ON state, and the pixel data of the pixel p(i,n) is stored in the memory 31.sub.n.
Next, in the blanking duration, the switch control signal S9 shown in FIG. 15 becomes the logical value "1", the switching means 32.sub.1 to 32.sub.n simultaneously switch to the ON state, and the pixel data S6.sub.1 to S6.sub.n stored in the memories 31.sub.1 to 31.sub.n are output to the processor elements 3.sub.1 to 3.sub.n in parallel.
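The serial-to-parallel operation of the data input register 102 described above can be modeled by the following behavioral sketch. The class and function names are hypothetical, and the exact timing (the pointer is shifted first and the datum is stored in the same clock cycle) is an assumption made for illustration rather than a detail taken from FIG. 15.

```python
# Behavioral sketch of the data input register: a one-hot pointer shifted
# through n stages selects which memory stores the current serial datum.
# Class and method names are hypothetical.

class DataInputRegister:
    def __init__(self, n):
        self.pointer = [0] * n      # models pointer data S26_1 .. S26_n
        self.memories = [None] * n  # models memories 31_1 .. 31_n

    def clock(self, pointer_in, serial_in):
        """One clock pulse: shift the pointer, then store via the ON switch."""
        self.pointer = [pointer_in] + self.pointer[:-1]
        for x, bit in enumerate(self.pointer):
            if bit == 1:            # first switching means 30_x is ON
                self.memories[x] = serial_in

    def parallel_out(self):
        """Blanking duration: S9 = 1, all second switching means turn ON."""
        return list(self.memories)


def load_line(pixels):
    """Feed one line of serial pixel data and return it in parallel form."""
    reg = DataInputRegister(len(pixels))
    for k, pixel in enumerate(pixels):
        # Pointer data S12 is "1" only on the first clock cycle.
        reg.clock(1 if k == 0 else 0, pixel)
    return reg.parallel_out()
```

In this model only one memory is written per clock cycle, although the serial datum is presented to every switch, which mirrors the situation described for the serial data input line.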
The data output register 104 will be explained next in detail.
As shown in FIG. 17, the data output register 104 is constituted by a pointer circuit 109 and a conversion circuit 10.
The pointer circuit 109 is structured with the unit delay elements 37.sub.1 to 37.sub.n such as D-type flip-flops connected in series, receives as its input a pointer control signal S3 comprised by a clock signal S31 and pointer data S32, and outputs pointer data S37.sub.1 to S37.sub.n to the conversion circuit 10.
The pointer circuit 109 is constituted by using a shift register widely used at the time of mutual conversion of serial data and parallel data.
The conversion circuit 10 has arranged in parallel switching means 35.sub.1 to 35.sub.n, memories 36.sub.1 to 36.sub.n, and switching means 38.sub.1 to 38.sub.n, receives as its inputs parallel data S3 comprised of data S3.sub.1 to S3.sub.n, pointer data S37.sub.1 to S37.sub.n and the switch control signal S39, and outputs the serial data S.sub.OUT.
The operation of the data output register 104 will be explained next by referring to FIG. 17 and FIGS. 18A to 18C. In the processor elements 3.sub.1 to 3.sub.n shown in FIG. 11, when the processing of the pixel data is terminated, in the horizontal blanking duration, the switch control signal S39 becomes the logical value "1", and the switching means 38.sub.1 to 38.sub.n become the ON state. By this, the pixel data processed in the processor elements 3.sub.1 to 3.sub.n are simultaneously stored in the memories 36.sub.1 to 36.sub.n.
Among the pointer data S37.sub.1 to S37.sub.n from the pointer circuit 109 in the horizontal sweep duration, the switches 35.sub.1 to 35.sub.n receiving as their inputs the pointer data which have become the logical value "1" become the ON state, and the pixel data stored in the corresponding memories 36.sub.1 to 36.sub.n are output as the serial data S.sub.OUT.
If the logical value "1" is given to the pointer data S32 for only the first clock cycle of the horizontal sweep duration, and the pulse is given to the clock signal S31, the switching means 35.sub.1 to 35.sub.n sequentially become the ON state only for one clock cycle in synchronization with this, and the pixel data stored in the memories 36.sub.1 to 36.sub.n are output as the serial data S.sub.OUT.
Specifically, when the pulses of one line's worth of pixels are given to the clock signal S31, the pixel data of the pixels q(i,1) to q(i,n) of, for example, the i-th line are output from the memories 36.sub.1 to 36.sub.n as the serial data S.sub.OUT. Note that the pointer circuit 105 of the data input register 102 and the pointer circuit 109 of the data output register 104 have the same circuit configuration.
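The parallel-to-serial operation of the data output register 104 described above can be modeled in the same way. The sketch below is a behavioral illustration only; the class and function names are hypothetical, and the assumption that the pointer is shifted before the selected memory is read within a clock cycle is made for simplicity.

```python
# Behavioral sketch of the data output register: processed data are loaded
# in parallel, then a one-hot pointer selects one memory per clock cycle.
# Class and method names are hypothetical.

class DataOutputRegister:
    def __init__(self, n):
        self.pointer = [0] * n      # models pointer data S37_1 .. S37_n
        self.memories = [None] * n  # models memories 36_1 .. 36_n

    def parallel_load(self, processed):
        """Blanking duration: S39 = 1, switching means 38_1 .. 38_n are ON."""
        self.memories = list(processed)

    def clock(self, pointer_in):
        """One clock pulse: shift the pointer and output the selected datum."""
        self.pointer = [pointer_in] + self.pointer[:-1]
        for x, bit in enumerate(self.pointer):
            if bit == 1:            # switching means 35_x is ON
                return self.memories[x]
        return None                 # no switch is ON


def unload_line(processed):
    """Load one processed line in parallel and return it as a serial stream."""
    reg = DataOutputRegister(len(processed))
    reg.parallel_load(processed)
    # Pointer data S32 is "1" only on the first clock cycle.
    return [reg.clock(1 if k == 0 else 0) for k in range(len(processed))]
```

Here `unload_line` returns the stored values in the order q(i,1) to q(i,n), one per clock pulse.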
As explained by referring to FIG. 15 and FIGS. 16A to 16C, the memory to which the pixel data are actually input in the conversion circuit 6 of the data input register 102 is only one of the memories 31.sub.1 to 31.sub.n corresponding to one of the pointer data S26.sub.1 to S26.sub.n having the logical value "1". The serial data input line for transmitting the serial data S.sub.IN is connected to all switches 30.sub.1 to 30.sub.n, therefore, due to an influence of a parasitic capacitance accompanying the switches 30.sub.1 to 30.sub.n, the power consumption of the data input register 102 is increased, and the operating speed is lowered.
Further, as explained by referring to FIG. 17 and FIGS. 18A to 18C, the memory from which the pixel data are actually output in the conversion circuit 10 of the data output register 104 is only one of the memories 36.sub.1 to 36.sub.n corresponding to one of the pointer data S37.sub.1 to S37.sub.n having the logical value "1". The serial data output line for transmitting the serial data S.sub.OUT is nevertheless connected to all switches 35.sub.1 to 35.sub.n; therefore, due to the influence of the parasitic capacitance accompanying the switches 35.sub.1 to 35.sub.n, the power consumption of the data output register 104 is increased, and the operating speed is lowered.
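The effect of the parasitic capacitance can be illustrated with a rough first-order estimate. The sketch below assumes a simple linear model in which each connected switch adds the same capacitance to the serial line, together with the commonly used CMOS dynamic power approximation (activity factor times capacitance times the square of the voltage times the frequency). The function names and all numeric values are hypothetical and are not taken from the apparatus described here.

```python
# Rough loading estimate for a serial line shared by n switches. The linear
# capacitance model and every numeric value below are illustrative assumptions.

def line_capacitance(n_switches, c_switch, c_wire=0.0):
    """Total capacitance (farads) of a line connected to n switches."""
    return n_switches * c_switch + c_wire


def dynamic_power(capacitance, voltage, frequency, activity=1.0):
    """First-order CMOS dynamic power estimate (watts): a * C * V^2 * f."""
    return activity * capacitance * voltage ** 2 * frequency


# Hypothetical values: 10 fF per switch, 5 V swing, 14 MHz pixel clock.
for n in (256, 1024):
    c = line_capacitance(n, c_switch=10e-15)
    print(n, c, dynamic_power(c, voltage=5.0, frequency=14e6))
```

Under this model the load on the serial line, and hence the estimated switching power, grows in proportion to the number of connected switches, even though only one switch is in the ON state at any time.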