1. Field of the Invention
The present invention relates to an SIMD (Single Instruction-stream Multiple Data-stream) type microprocessor.
2. Description of the Related Art
In SIMD microprocessors, one instruction can simultaneously perform the same operation processing on a plurality of data. Because of this capability, such microprocessors are frequently used for processing, such as image processing, in which the same operations are performed on a large amount of data.
In normal operation processing in an SIMD microprocessor, a plurality of operation units (Processor Elements (PE)) are arranged, and the same operations are simultaneously performed on a plurality of data. In this manner, high speed operation processing can be achieved.
In image data processing, a variety of filtering processes are performed in order to correct image quality. Normal filtering processing uses a weighted operation on an object pixel and the pixels adjacent to it in the horizontal scanning direction or the vertical scanning direction, and therefore an SIMD type microprocessor is well suited to it because of its simultaneous operation function. Although a normal weighting filter can be used for removing noise components of input image data, this kind of weighting filter has the defect that contour portions are blurred.
A “median filter” is known as a useful tool for removing noise in input image data. This filter is frequently used in image processing because it can remove noise components while keeping contour portions clear.
The “median filter” operates as follows: an object pixel (or pixel at issue) and its eight surrounding pixels (the left, right, upper, lower, upper-left, upper-right, lower-left and lower-right pixels) are arranged in order of data magnitude, and the median value (the one placed at the center) replaces the object pixel.
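The median filter operation described above can be sketched in Python as follows. This is a minimal reference sketch for a single sequential processor, not an SIMD implementation. The function names `median3x3` and `median_filter` are hypothetical, and leaving the border pixels unchanged is an assumption made only to keep the example short.

```python
def median3x3(image, row, col):
    """Return the median of the 3x3 window centered on (row, col)."""
    window = [image[r][c]
              for r in (row - 1, row, row + 1)
              for c in (col - 1, col, col + 1)]
    window.sort()
    return window[4]  # the fifth of nine values is the median


def median_filter(image):
    """Apply the 3x3 median to interior pixels.

    Border pixels are copied unchanged (an assumption of this sketch).
    """
    height, width = len(image), len(image[0])
    out = [list(line) for line in image]
    for row in range(1, height - 1):
        for col in range(1, width - 1):
            out[row][col] = median3x3(image, row, col)
    return out
```

An isolated noise pixel is replaced by the value of its neighbors, while a vertical contour between a dark region and a bright region keeps its position, which is the property that distinguishes the median filter from a weighting filter.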
In an SIMD type microprocessor, as shown in FIG. 5, pixel data are stored in PE registers (R2 registers in FIG. 5(B)) along the horizontal scanning direction. In order to refer to pixel data in the vertical scanning direction, a line delay must be made by copying the present line pixel data to other registers (R1 registers in FIG. 5(B)) and storing the data in line buffers, such as a FIFO memory, external to the SIMD type microprocessor. By repeating this operation several times, pixels on a plurality of lines can be referred to in the vertical scanning direction. In FIG. 5(B), the R1 registers are line-delayed to make the R0 registers.
In median filter processing, each PE must determine the median of its own pixel and the pixels surrounding it. In FIG. 5(B), PE[5] is taken as an example. The object pixel datum is pixel “E” stored in the R1 register of PE[5]. Its upper-left pixel datum is pixel “A” stored in the R0 register of PE[4]. Its left pixel datum is pixel “D” stored in the R1 register of PE[4]. Its lower-left pixel datum is pixel “G” stored in the R2 register of PE[4]. Its upper pixel datum is pixel “B” stored in the R0 register of PE[5], and its lower pixel datum is pixel “H” stored in the R2 register of PE[5]. Its upper-right pixel datum is pixel “C” stored in the R0 register of PE[6]. Its right pixel datum is pixel “F” stored in the R1 register of PE[6]. Its lower-right pixel datum is pixel “I” stored in the R2 register of PE[6]. The numbers affixed to PE and the register numbers (R0, R1, R2) will be explained later.
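The register arrangement for PE[5] can be illustrated with a small Python model. This is only a sketch: the dictionaries and the function `neighborhood` are hypothetical, the letters stand for pixel values, and it is assumed that the upper and lower pixels “B” and “H” sit in the R0 and R2 registers of PE[5] itself, following the layout of the other pixels.

```python
# Hypothetical register contents of PE[4], PE[5] and PE[6]; each letter
# stands for a pixel value, named as in the description of FIG. 5(B).
R0 = {4: "A", 5: "B", 6: "C"}  # line-delayed twice: line above the object pixel
R1 = {4: "D", 5: "E", 6: "F"}  # line-delayed once: line of the object pixel
R2 = {4: "G", 5: "H", 6: "I"}  # present line: line below the object pixel


def neighborhood(pe):
    """Collect the nine pixels PE[pe] needs: the R0, R1 and R2 registers of
    its left neighbor, of itself, and of its right neighbor."""
    return [reg[p] for reg in (R0, R1, R2) for p in (pe - 1, pe, pe + 1)]
```

In this model, `neighborhood(5)` gathers the pixels “A” through “I”, six of which come from the registers of the adjacent PEs.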
The above nine pixel data “A”, “B”, “C”, “D”, “E”, “F”, “G”, “H”, “I” are sorted by data magnitude to obtain the median (the value at the center), that is, the fifth largest pixel datum. The prior art needed to repeat sort processing of pixel data many times. A method of lightening such heavy sort processing or speeding it up has been desired.
There are methods of speeding up median filter processing that do not relate to an SIMD microprocessor. For example, Japanese Patent Laid-open No. 6-274617 deals with “3×3” pixel median filter processing, in which sort processing is separated into three stages to speed up the processing: three pixels are sorted first, then six pixels are merge-sorted, and finally nine pixels are merge-sorted. Another method is disclosed in Japanese Patent Laid-open No. 5-2645, in which object data are divided into bit slices and the number of data having “1” is counted from the most significant bit to the least significant bit to obtain the datum of a desired order. Both methods can effectively speed up median processing in an SISD type processor or an image processing LSI.
However, the above methods cannot be applied to image processing using an SIMD type microprocessor.
The method described in Japanese Patent Laid-open No. 6-274617 can speed up the merge-sorting of six pixels and nine pixels as a whole on an SISD type microprocessor that can perform branch processing. However, this method increases the processing time on an SIMD type microprocessor, because branch processing cannot be performed on a PE-by-PE basis and the operations of all branches have to be executed.
The method disclosed in Japanese Patent Laid-open No. 5-2645 divides object data into bit slices and counts the number of data having “1” from the most significant bit to the least significant bit to obtain the datum of a desired order. If this method is applied to an SIMD type microprocessor, the pixel data on both sides of an object pixel are stored in adjacent PE registers, so data must be extracted from the adjacent PE registers in order to form the bit slices, which increases the amount of wiring. Further, if counters and adders for bit slices are provided for all bits of each PE, the circuit size will increase unrealistically. If the processing is instead divided and performed bit by bit, the cycle time will increase drastically.
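The general ideas behind the two approaches can be sketched in Python as follows. These are illustrations of a three-stage merge sort and of order selection by bit slices as commonly understood, not reproductions of the procedures in the cited publications. The function names, the 8-bit default width and the assumption of unsigned integer pixel values are choices made for this sketch.

```python
from heapq import merge


def median9_by_merging(col_a, col_b, col_c):
    """Three-stage sort: sort three pixels, merge to six, merge to nine."""
    first, second, third = sorted(col_a), sorted(col_b), sorted(col_c)
    six = list(merge(first, second))   # merge two sorted triples
    nine = list(merge(six, third))     # merge in the remaining triple
    return nine[4]                     # the fifth of nine values


def kth_largest_by_bit_slices(values, k, bits=8):
    """Select the k-th largest value (k = 1 is the largest) by counting the
    data having 1 in each bit slice, from the most significant bit down.
    Values are assumed to be unsigned integers below 2**bits."""
    candidates = list(values)
    result = 0
    for bit in range(bits - 1, -1, -1):
        ones = [v for v in candidates if (v >> bit) & 1]
        if len(ones) >= k:
            # The wanted value has a 1 in this slice.
            result |= 1 << bit
            candidates = ones
        else:
            # The wanted value has a 0 here; the values with a 1 are larger.
            k -= len(ones)
            candidates = [v for v in candidates if not (v >> bit) & 1]
    return result
```

For nine pixels, the median corresponds to `k = 5` in the second function. Both functions rely on data-dependent control flow or on gathering all nine values in one place, which suits a sequential processor.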
Japanese Patent Laid-open No. 11-149554 discloses median filter processing in an SIMD type microprocessor, in which “3×3 pixels” are processed. First, three pixels are sorted in the “column” direction. Then the sorted three pixel data are sorted in the “row” direction. Lastly, sorting is performed in the diagonal direction.
This publication does not disclose how the basic “three pixel sort processing” is carried out, and therefore the basic “three pixel sort processing” cannot be performed at a satisfactorily high speed. If the “three pixel sort processing” takes a long time, the total processing time increases proportionally. Prior art SIMD type microprocessors use the following method in order to sort the three pixels.
In the following method, the three pixel data to be compared are stored in the R0, R1 and R2 registers of each PE. The sorted data are stored in the R16, R17 and R18 registers in descending order. The “MAX” instruction compares the contents of two source registers and writes the larger datum back into a designated register. The “MIN” instruction compares the contents of two source registers and writes the smaller datum back into a designated register.
1. Perform MAX operation on R0 and R1 and store the result in R16;
2. Perform MIN operation on R0 and R1 and store the result in R17;
3. Perform MIN operation on R2 and R17 and store the result in R18;
4. Perform MAX operation on R2 and R17 and store the result in R17;
5. Perform MAX operation on R16 and R17 and store the result in R16; and
6. Perform MIN operation on R16 and R17 and store the result in R17.
Thus, six cycles are needed to sort the three pixels.
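The six MAX/MIN steps can be modeled in Python, with each register represented as a list holding one value per PE so that every instruction acts on all PEs at once. This is a sketch under assumptions: the helper names are hypothetical, and steps 5 and 6 are assumed to read the values that R16 and R17 hold after step 4. Read literally, step 5 would overwrite R16 before step 6 uses it, so the sketch evaluates the two steps from the same operands.

```python
def simd_max(a, b):
    """MAX instruction: every PE keeps the larger of its two operands."""
    return [max(x, y) for x, y in zip(a, b)]


def simd_min(a, b):
    """MIN instruction: every PE keeps the smaller of its two operands."""
    return [min(x, y) for x, y in zip(a, b)]


def sort3_six_cycles(r0, r1, r2):
    """Sort three registers per PE; returns (R16, R17, R18), largest first."""
    r16 = simd_max(r0, r1)    # step 1
    r17 = simd_min(r0, r1)    # step 2
    r18 = simd_min(r2, r17)   # step 3: smallest of the three
    r17 = simd_max(r2, r17)   # step 4
    # Steps 5 and 6: both read R16 and R17 as they stand after step 4
    # (assumption of this sketch, see the note above).
    r16, r17 = simd_max(r16, r17), simd_min(r16, r17)
    return r16, r17, r18
```

Because only MAX and MIN are used, no branch is needed and all PEs execute the same six instructions regardless of their data.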
In order to perform “3×3 pixels” median filtering processing according to the method disclosed in Japanese Patent Laid-open No. 11-149554, the following cycles are needed.
(1) Sort the 3×3 pixel data in the “column” direction (three columns can be carried out simultaneously due to “SIMD”); six cycles
(2) Sort the 3×3 pixel data in the “row” direction; 10 cycles for three lines
(3) Sort the 3×3 pixel data in the diagonal direction; six cycles
In total, 22 cycles are needed. In (2) above, for the lines from which only the MIN or the MAX of the three pixels is needed, sorting is not necessary and 2 cycles are enough, so step (2) takes 10 cycles, which is fewer than the 18 cycles required for fully sorting three lines.
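The cycle count can be checked with a small Python model of the column, row and diagonal steps. This is a sketch under stated assumptions, not the implementation of the cited publication: the class `SimdAlu` and the helpers are hypothetical, one cycle is counted per MAX or MIN instruction, an instruction is assumed to be able to read a register of the adjacent PE directly, and the edge PEs simply reuse their own value in place of the missing neighbor.

```python
class SimdAlu:
    """Counts one cycle per MAX/MIN instruction issued to all PEs at once."""

    def __init__(self):
        self.cycles = 0

    def max(self, a, b):
        self.cycles += 1
        return [x if x >= y else y for x, y in zip(a, b)]

    def min(self, a, b):
        self.cycles += 1
        return [x if x <= y else y for x, y in zip(a, b)]


def left(reg):
    """Register of the left neighbor as seen by each PE (edge PE sees itself)."""
    return [reg[0]] + reg[:-1]


def right(reg):
    """Register of the right neighbor as seen by each PE (edge PE sees itself)."""
    return reg[1:] + [reg[-1]]


def sort3(alu, a, b, c):
    """Six MAX/MIN cycles; returns (largest, middle, smallest) per PE."""
    hi = alu.max(a, b)
    lo = alu.min(a, b)
    smallest = alu.min(c, lo)
    mid = alu.max(c, lo)
    largest, middle = alu.max(hi, mid), alu.min(hi, mid)
    return largest, middle, smallest


def median3x3(alu, r0, r1, r2):
    # (1) Column direction: every PE sorts its own column, 6 cycles.
    col_max, col_mid, col_min = sort3(alu, r0, r1, r2)
    # (2) Row direction: 2 + 2 + 6 = 10 cycles.
    min_of_max = alu.min(alu.min(left(col_max), col_max), right(col_max))
    max_of_min = alu.max(alu.max(left(col_min), col_min), right(col_min))
    _, mid_of_mid, _ = sort3(alu, left(col_mid), col_mid, right(col_mid))
    # (3) Diagonal direction: 6 cycles; the middle value is the median.
    _, result, _ = sort3(alu, min_of_max, mid_of_mid, max_of_min)
    return result
```

After one call to `median3x3`, the counter `alu.cycles` holds 6 + 10 + 6 = 22, and every interior PE holds the median of its nine surrounding pixels. Of the 22 cycles, 18 are spent in the three invocations of the three-pixel sort, which is why the speed of that basic sort dominates the total.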