1. Field of the Invention
The present invention generally relates to a parallel processor used for digital processing of image data or the like in a digital copier, a facsimile machine or the like. In particular, it relates to a microprocessor that is used for non-linear processing of image data and employs an SIMD (Single Instruction-stream Multiple Data-stream) method, in which the same processing is performed on a plurality of sets of data by the same instruction, and to an image processing apparatus employing the processor.
2. Description of the Related Art
Recently, in image processing in a digital copier, a facsimile machine or the like, image quality has been improved by increasing the number of pixels, providing color images, using various types of image processing and so forth. As the image quality increases, the amount of image data to be processed increases, and the image processing method becomes more complicated. In this sort of image processing, there are many cases where the same processing is performed on a plurality of sets of data. For this purpose, an SIMD processor, in which a plurality of sets of data are processed by a single instruction, is used in many cases.
In such an SIMD processor, a plurality of processor elements (PE) each having an arithmetic and logic unit and a register file for enabling a plurality of sets of data to be processed at once are provided. Further, in order to control the entirety of the processor by a program, a global processor having a program interpreting part, a control part, an arithmetic and logic unit, registers, and memories is provided.
When data is transferred from the global processor to the processor elements, either the data is shifted through shift registers that are provided in the respective processor elements and connected in a chain, with the global processor connected to one end of the chain, or the data is directly transferred via buses connected to the respective processor elements.
In the case where data is shifted through the processor elements, the shift must be made through all the processor elements. In the case where data is directly transferred through the buses and data rewriting is performed, either the data is rewritten for all the processor elements, or specific processor elements are selected by selection signals from the control part and only those are rewritten. Only one instruction cycle is needed for rewriting one processor element. However, a plurality of instruction cycles are needed for a plurality of processor elements.
In a normal operation in a processor element, it is determined from an execution condition flag whether or not operation is to be performed. The execution condition flag is set/reset according to a result of operation performed by an operation array, or is directly set/reset by a control signal from the control part of the global processor.
In an SIMD processor in the related art, such rewriting is made by using an operation result, or setting/resetting is performed by transferring data to the execution condition flags of all the processor elements. However, in a case where only specific processor elements need to operate, for example, only processor elements in a certain range, or only every n-th processor element (n=1, 2, 3, . . . ), it is difficult to set the execution condition flags only for the relevant processor elements. In such a case, either data that intentionally differs between the relevant processor elements and the other processor elements is set, so that the execution condition flags are set as a result of operation on that data, or the execution condition flags are set for the relevant processor elements one by one.
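The two workarounds described above can be sketched as follows. This is only an illustrative model, not the circuitry of any particular processor: the PE count `NUM_PES`, the marker values, and the function names are assumptions introduced for the example, and the cost of loading the marker data is not modelled.

```python
NUM_PES = 16  # assumed PE count, for illustration only


def flags_by_data_compare(num_pes, relevant):
    """Workaround 1: set flags from an operation on intentionally different data."""
    # Load marker data that differs between relevant PEs and the others
    # (the load itself also costs cycles; that cost is not modelled here).
    marker = [1 if pe in relevant else 0 for pe in range(num_pes)]
    # Every PE executes the same compare; the result sets its flag.
    return [m == 1 for m in marker]


def flags_one_by_one(num_pes, relevant):
    """Workaround 2: select and set the relevant PEs one at a time."""
    flags = [False] * num_pes
    cycles = 0
    for pe in sorted(relevant):
        flags[pe] = True  # one selected PE per instruction cycle
        cycles += 1
    return flags, cycles


# Example: only every third PE needs to operate (PEs 0, 3, 6, 9, 12, 15).
every_third = set(range(0, NUM_PES, 3))
flags_a = flags_by_data_compare(NUM_PES, every_third)
flags_b, cycles_b = flags_one_by_one(NUM_PES, every_third)
```

Both routes yield the same flags. In this model the second route spends one cycle per relevant processor element, which is the overhead the passage above points to.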
Further, as the amount of image data to be processed increases and the data processing method becomes more complicated, the amount of data to be processed at once increases, and the number of processor elements needed increases. When the number of processor elements increases, the number of test patterns needed for logical testing, IC testing and so forth increases. In order to apply a test designed for a single processor element to all the processor elements, as many test patterns as there are processor elements are needed. Further, circuits needed for testing, and ports through which test results are output, must be provided for all the processor elements.
As mentioned above, in the SIMD processor, it is possible to perform the same operation processing on a plurality of sets of data simultaneously by a single instruction. In normal operation processing, this is achieved by providing a plurality of arithmetic and logic units in parallel, through which the same operation is performed on a plurality of sets of data simultaneously. However, in image processing, non-linear processing may be performed in which the operation processing cannot be expressed by formulas. In such non-linear processing, the operation formula changes according to the data on which the operation is performed. Accordingly, it is not possible to perform the same processing simultaneously. Consequently, the data is processed one by one, and, as a result, the advantages of the SIMD processor cannot be utilized.
In a normal SIMD processor, when non-linear processing in which the operation formula changes according to the operation data is performed, the following method is commonly used in order to prevent the software program from becoming very complicated: the result of the operation is obtained in advance for every possible value of the operation data, a table is formed from these results, and the table is then used to convert given operation data into the corresponding result. Specifically, the table is stored in a RAM, the data to undergo operation is added to the top address of the table, the resulting value is used as an address pointer, and the result of the operation is thereby read from the RAM.
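The address calculation described above can be illustrated by the following minimal sketch. The top address `TABLE_TOP`, the RAM size, and the example function `nonlinear` are assumptions chosen only for the illustration and do not come from the related art itself.

```python
TABLE_TOP = 0x0100        # hypothetical top address of the table in RAM
ram = bytearray(0x0200)   # hypothetical RAM, large enough to hold the table


def nonlinear(x):
    """Example non-linear operation (an assumption): clip both ends, stretch the middle."""
    if x < 64:
        return 0
    if x >= 192:
        return 255
    return (x - 64) * 2


# Obtain the result for every possible 8-bit value in advance and store it.
for data in range(256):
    ram[TABLE_TOP + data] = nonlinear(data)


def table_convert(data):
    """Add the data to the table's top address and read the result from RAM."""
    pointer = TABLE_TOP + data
    return ram[pointer]
```

Once the table has been formed, the conversion is a single memory read whatever the shape of the underlying operation, which is why the program itself stays simple.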
In a case where the operation data is of 8 bits, a conversion table of 256 bytes is needed. As the bit width of the operation data increases, the size of the conversion table increases as a power of two, doubling with each additional bit. Therefore, when the bit width of the operation data is large, the operation data is divided into arbitrary sections, and an approximate formula for each section is prepared as a table.
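The section-based approach can be sketched as follows. The sketch assumes 16-bit data, sixteen equal sections selected by the upper four bits, and a straight-line approximation per section. The text above does not fix any of these choices, and the function `f` is only a stand-in for the operation being approximated.

```python
BIT_WIDTH = 16
SECTION_BITS = 4                                  # assumed: 16 equal sections
SECTION_WIDTH = 1 << (BIT_WIDTH - SECTION_BITS)   # 4096 values per section


def f(x):
    """Stand-in non-linear operation (an assumption for this example)."""
    return x * x / 65535.0


# One table entry per section: value at the section start and a slope.
section_table = []
for s in range(1 << SECTION_BITS):
    lo = s * SECTION_WIDTH
    hi = lo + SECTION_WIDTH
    slope = (f(hi) - f(lo)) / SECTION_WIDTH
    section_table.append((f(lo), slope))


def approx(x):
    """Pick the section from the upper bits, then apply its linear formula."""
    s = x >> (BIT_WIDTH - SECTION_BITS)
    start, slope = section_table[s]
    return start + slope * (x - s * SECTION_WIDTH)
```

Under these assumptions the table holds 16 entries instead of 65,536, at the price of an approximation error inside each section.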
In a case where such table conversion is employed in an SIMD processor, a table is needed for each unit of operation. For example, when the SIMD processor includes 256 processor elements (PEs) and performs table conversion of 8 bits, a table RAM of 256 bytes is needed for each operation unit. Thus, a total of 256 table RAMs are needed, and the SIMD processor becomes very expensive. Various methods have been proposed to solve this problem.
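The memory requirement in this example can be checked with a short calculation. The helper names are hypothetical, and one byte per table entry is assumed, consistent with the 256-byte table stated for 8-bit data.

```python
def table_bytes(bit_width, bytes_per_entry=1):
    """Size of one conversion table: one entry per possible input value."""
    return (1 << bit_width) * bytes_per_entry


def total_table_bytes(num_pes, bit_width, bytes_per_entry=1):
    """Total table RAM when every PE carries its own copy of the table."""
    return num_pes * table_bytes(bit_width, bytes_per_entry)


per_pe = table_bytes(8)              # 256 bytes per PE
total = total_table_bytes(256, 8)    # 256 PEs x 256 bytes = 65536 bytes
```

For 256 PEs and 8-bit data this gives 65,536 bytes of table RAM in total, split across 256 separate memories.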
Japanese Laid-Open Patent Application No. 5-67203 discloses that data used for operation is output from an output register built into a PE in each SIMD unit, table conversion is performed externally one data item at a time, and the result of the table conversion is input, one at a time, to an input register built into the PE. In this method, only a single conversion table is needed, so that an increase in cost can be prevented. However, because the data is processed one by one, the operation processing time amounts to at least the time for as many conversions as there are PEs, and the operation speed may therefore be problematically low. When this processing is performed in parallel with normal processing in the PEs, the total operation processing time can be reduced. However, the input/output registers are occupied by the conversion operations and cannot be used for other purposes. Accordingly, when the data resulting from the conversion is used for the normal processing, it is necessary to wait for the conversion time, and parallel processing cannot be achieved. Further, a special table memory is needed, and the input shift register and the output shift register are used specially for the table conversion.
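The timing behaviour of this one-by-one conversion can be modelled roughly as below. This is a simplified behavioural sketch, not the circuit disclosed in the cited application: the function name, the example table, and the assumption of one conversion per cycle are introduced here for illustration.

```python
def serial_table_conversion(pe_outputs, table):
    """Convert PE data one item at a time through a single shared table."""
    results = []
    cycles = 0
    for value in pe_outputs:          # data leaves the output registers in turn
        results.append(table[value])  # one external lookup per item
        cycles += 1                   # assumed: one conversion per cycle
    return results, cycles


# Example table (an assumption): 8-bit inversion.
example_table = [255 - x for x in range(256)]
converted, cycles = serial_table_conversion([0, 10, 255, 10], example_table)
```

In this model the cycle count equals the number of PEs, so the conversion time grows linearly as PEs are added.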
Japanese Laid-Open Patent Application No. 9-305550 discloses that a comparator for comparing original data of a non-linear conversion table with data to be converted is provided for each PE, the comparator compares the two, the data obtained through conversion is stored in each PE for which the comparison result indicates coincidence, and the stored data is used as the data obtained through the operation. In this method, the operation processing time corresponds to the number of values that the data used for operation can take (the number of words of the conversion table). Accordingly, the processing speed can be improved in a case where the number of words is smaller than the number of PEs. In the case of 8-bit data, however, the number of cycles is on the order of 256 regardless of the number of PEs, so that the operation processing time is still long. Further, in a case where this processing is performed in parallel with other processing, this method has the same problem as the method of Japanese Laid-Open Patent Application No. 5-67203. Further, the special comparator is needed.
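A rough behavioural model of this comparator-based method is given below. It is a sketch under stated assumptions rather than the disclosed circuit: the function name is hypothetical, one table word is assumed to be presented per cycle, and the inner loop stands in for comparisons that the hardware would perform in all PEs at once.

```python
def broadcast_compare_conversion(pe_data, table):
    """Present each table word to all PEs; a PE keeps the value on a match."""
    results = [None] * len(pe_data)
    cycles = 0
    for original, converted in enumerate(table):
        # In hardware every PE compares simultaneously; the loop models that.
        for pe, data in enumerate(pe_data):
            if data == original:
                results[pe] = converted
        cycles += 1  # assumed: one table word per cycle
    return results, cycles


# Example table (an assumption): 8-bit inversion.
example_table = [255 - x for x in range(256)]
few, cycles_few = broadcast_compare_conversion([3, 200], example_table)
many, cycles_many = broadcast_compare_conversion(list(range(256)) * 4, example_table)
```

The cycle count depends only on the number of table words, 256 for 8-bit data, whether 2 or 1,024 PEs are being converted.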
Japanese Patent No. 2812292 discloses that data used for operation is given by each PE as an address pointer to a RAM for a conversion table having as many output ports as there are PEs, and the data to be obtained through the operation is read therefrom. In this method, the operation processing time is on the order of one cycle. However, an increase in the number of output ports results in an increase in cost. In particular, it is not possible to realize a RAM having more than some tens of ports. Accordingly, this method cannot be applied to an SIMD processor having a large number of PEs.
Thus, in the related art, various methods have been proposed for achieving parallel processing, which is the main feature of the SIMD processor, in non-linear processing that requires table conversion, where parallel processing is difficult to achieve. However, these methods rely on the input/output registers, a special comparator, or a special table memory, and have problems in processing speed or in cost, as described above.