1. Field
This patent specification relates to a single instruction stream multiple data stream (SIMD) processor capable of processing a plurality of image data in parallel in response to a single instruction.
2. Discussion of the Background
With recent developments in digital duplication machines, facsimile apparatuses, and other similar imaging systems, more attention has been focused on improving image quality by increasing the number of picture elements and adopting color images. These trends result in an increase in the amount of image data to be processed.
The processing of image data is carried out, in general, by repeating a similar set of processing steps over the image data of all picture elements. An SIMD type processor has therefore been useful, given its capability of processing a plurality of image data in parallel by a single instruction.
Such data processing may be achieved by, for example, using a plurality of operational units arranged in an array. The data to be processed must be transferred at a speed comparable to the data processing speed; otherwise the performance of the processor may be dictated almost entirely by the speed of data access.
In a conventional single instruction single data (SISD) type processor, the data to be processed are accessed sequentially from a memory by a program instruction from the processor, and the access speed is determined by the bit width of the memory and the transfer time. When this method of data access is adopted for an SIMD processor, the performance of the processor is reduced, since the data access is carried out sequentially, at the expense of the advantage of parallel processing in the processor.
The SIMD processor is therefore configured so that the data to be processed are accessed not by an instruction from the processor, but by direct access from an external memory data transfer unit to an internal input/output (I/O) register in the processor. That is, simultaneously with executing processing steps, the SIMD processor is capable of transferring, through the memory data transfer unit, data to be processed subsequently from an externally provided memory to the I/O register, and data already processed from the I/O register to the memory. As a result, higher speeds are attained in data processing with the SIMD processor.
The above-noted data transfer between the processor and the external memory is carried out as follows: (1) The external memory data transfer unit transfers data to be processed to the I/O register. (2) The processor issues an instruction to transfer the data to be processed, which have already been transferred from the external memory and are held in the I/O register, from the I/O register to a processing register, and subsequently initiates processing steps. (3) The processor executes the processing steps. During the execution, the external memory data transfer unit transfers the data to be processed next to the I/O register. In addition, when any processed data (or resultant data) are already held in the I/O register, the external memory data transfer unit transfers the resultant data from the I/O register to the external memory. And (4) upon the completion of the processing steps, the processor transfers the resultant data to the I/O register.
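The four steps above may be illustrated by the following minimal sketch, which is offered only as an aid to understanding and not as the disclosed circuitry. The names run_pipeline, io_in, io_out, and processing_reg are hypothetical, and the concurrency of step (3) is modeled sequentially within one loop iteration.

```python
def run_pipeline(blocks, process):
    """Simulate steps (1)-(4) with one input slot and one output slot
    in the I/O register (an assumption made for this sketch)."""
    external_out = []        # resultant data written back to external memory
    io_in = None             # I/O register slot: data to be processed
    io_out = None            # I/O register slot: resultant data
    pending = list(blocks)   # data still held in external memory

    # Step (1): the transfer unit loads the first block into the I/O register.
    if pending:
        io_in = pending.pop(0)

    while io_in is not None:
        # Step (2): the processor moves the data to its processing register.
        processing_reg = io_in
        io_in = None

        # Step (3): while processing runs, the transfer unit fetches the next
        # block and drains any resultant data already in the I/O register.
        if pending:
            io_in = pending.pop(0)
        if io_out is not None:
            external_out.append(io_out)
            io_out = None
        result = [process(x) for x in processing_reg]

        # Step (4): the processor places the resultant data in the I/O register.
        io_out = result

    # Drain the last resultant data after processing has finished.
    if io_out is not None:
        external_out.append(io_out)
    return external_out
```

In this sketch the processor never waits on the external memory after the first block, since the next block is already held in the I/O register when step (2) is repeated.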
Increased speeds of data processing are thus attained with the SIMD processor through the above-mentioned steps of data transfer with the external memory data transfer unit, which are carried out simultaneously with the processing steps.
As methods of data transfer, the shift register method and the serial access memory method are exemplified. In the shift register method, as disclosed in Japanese Laid-Open Patent Application No. 5-67203, the data held in a register are shifted successively, bit by bit, in synchronization with input clock signals.
In the shift register method, the data which are transferred first are held in the register of the zero-th processor element, then shifted by one bit to be held in the register of the first processor element, and so on. Therefore, in the shift register method applied to an SIMD processor having, for example, 256 processor elements, 256 clock signals are necessary before the first transferred data are transferred to the register in the 255-th processor element.
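The clock count stated above may be checked with the following simplified model, given here as a sketch under stated assumptions. It assumes that one data item enters the zero-th processor element per clock and that every held item moves one element onward on each clock; the function name shift_register_transfer is hypothetical.

```python
def shift_register_transfer(data, num_pe):
    """Shift data items through a chain of num_pe registers, one per clock."""
    regs = [None] * num_pe
    clocks = 0
    for item in data:
        # On each clock, the new item enters element 0 and all held
        # items move to the next element; the last one falls off the end.
        regs = [item] + regs[:-1]
        clocks += 1
    return regs, clocks


regs, clocks = shift_register_transfer(list(range(256)), 256)
# The first transferred item (0) now sits in the 255-th element.
```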
In the serial access memory method, as disclosed in Japanese Laid-Open Patent Application No. 6-4690, an input pointer generates an input pointer signal represented by a logic “H”, and the data are then input into the input serial access memory (SAM) of the processor element designated by “H”. In this method, the input pointer signal is shifted successively, bit by bit, in synchronization with input clock signals.
In addition, during the first data transfer in the serial access memory method, the input pointer signal addresses the zero-th processor element, and data are then held in the input SAM of the zero-th processor element. Subsequently, during the second data transfer, the input pointer signal, in synchronization with input clock signals, addresses the first processor element, and data are then held in the input SAM of the first processor element.
Therefore, in the case of the serial access memory method applied to an SIMD processor having, for example, 256 processor elements, 256 clock signals must be input before the data are input into the input SAM of the 255-th processor element.
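The pointer-driven transfer described above may be sketched as follows. This is a simplified model rather than the circuit of the cited application: the input pointer signal is represented as a one-hot list in which True stands for the logic “H”, and the function name sam_transfer is hypothetical.

```python
def sam_transfer(data, num_pe):
    """Write data items into the input SAMs in the order the pointer visits them."""
    sam = [None] * num_pe
    # One-hot input pointer signal: "H" (True) initially designates element 0.
    pointer = [True] + [False] * (num_pe - 1)
    clocks = 0
    for item in data[:num_pe]:
        target = pointer.index(True)      # element currently designated by "H"
        sam[target] = item
        clocks += 1
        pointer = [False] + pointer[:-1]  # shift the pointer by one element
    return sam, clocks


sam, clocks = sam_transfer(list(range(256)), 256)
# The 255-th element receives its data only on the 256th clock.
```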
Several shortcomings have been noted in these methods. For example, data have to be transferred to odd-numbered processor elements as well, even when the data are intended only for even-numbered processor elements. Likewise, data have to be transferred to all of the processor elements, even when the data are intended only for the last half (from the 128th to the 255th) of the elements.
That is, the data cannot necessarily be transferred directly to specific processor elements by these methods. As a result, the data transfer takes an unduly long time, which reduces data processing speeds.
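The cost of these shortcomings may be estimated with the following rough sketch. It assumes, for illustration only, that a sequential method must spend one clock on every processor element from the zero-th up to the highest-numbered target; the function names sequential_clocks and wasted_clocks are hypothetical.

```python
def sequential_clocks(targets):
    """Clocks needed when elements can only be reached in order from element 0."""
    targets = list(targets)
    return max(targets) + 1 if targets else 0


def wasted_clocks(targets):
    """Clocks spent on elements that are not among the intended targets."""
    targets = list(targets)
    return sequential_clocks(targets) - len(set(targets))


even_elements = range(0, 256, 2)   # even-numbered elements only
last_half = range(128, 256)        # 128th to 255th elements only
```

Under this assumption, the last-half case spends 256 clocks to load 128 elements, so half of the transfer time is spent on elements that receive no useful data.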
In addition, in data processing with processors, in general, the required bit widths vary depending on the applications executed, such as the widths of an input register for holding input data, an output register for holding output data, and a register for temporarily holding data.
The bit widths of the input register, the output register, and the register for temporarily holding data have been fixed in previously known SIMD processors. When the bit width of the data exceeds that of the processor, therefore, processing of the data becomes unfeasible, resulting in another disadvantage.
Further, since the bit widths are the same for the input and output registers in previous processors, it is necessary to access the registers as many times as the number of the processor elements in order to transfer the data to all processor elements, which causes still another drawback.
In addition, another drawback is noted regarding the number of line buffers, as follows. When a large number of line buffers is required for some applications, registers contained in the processor elements are utilized as the line buffers. However, since the bit width of these registers is fixed in the previously known processors, processing becomes unfeasible for data whose bit width exceeds that of the registers.
Accordingly, it is an object of the present disclosure to provide an improved processor capable of transferring data directly to a specific processor element, thereby achieving higher speeds of data transfer and resultant data processing, and making flexible use of registers to thereby attain efficient data processing utilizing an arbitrary combination of the registers depending on the bit width of data.
The following brief description is a synopsis of only selected features and attributes of the present disclosure. A more complete description thereof is found below in the section entitled “Description of the Preferred Embodiments”.
A single instruction stream multiple data stream (SIMD) processor disclosed herein includes a plurality of processor elements, each having a processing unit for data processing and a data holding unit for holding data which are either to be processed or already processed by the processing unit; a data transfer bus interconnecting the processor elements; and an addressing unit for addressing a specific processor element. The data holding unit of the processor element addressed by the addressing unit either acquires or outputs data by way of the data transfer bus. The data holding unit may further be formed to include a first data holding unit for holding data to be processed and a second data holding unit for holding data already processed by the processing unit.
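The arrangement summarized above may be modeled, purely for illustration, by the following sketch. The class and method names (ProcessorElement, SimdArray, write, read, execute) are hypothetical, and the data transfer bus and the addressing unit are reduced to an index into a list; the sketch makes no claim about the hardware of the embodiments.

```python
class ProcessorElement:
    """One processor element with first and second data holding units."""

    def __init__(self):
        self.input_reg = None    # first data holding unit: data to be processed
        self.output_reg = None   # second data holding unit: data already processed

    def process(self, op):
        # The processing unit acts only when data to be processed are held.
        if self.input_reg is not None:
            self.output_reg = op(self.input_reg)


class SimdArray:
    """Processor elements sharing one bus, selected by an address."""

    def __init__(self, num_pe):
        self.pes = [ProcessorElement() for _ in range(num_pe)]

    def write(self, address, value):
        # Only the addressed element acquires the data from the bus.
        self.pes[address].input_reg = value

    def read(self, address):
        # Only the addressed element outputs its data to the bus.
        return self.pes[address].output_reg

    def execute(self, op):
        # A single instruction is applied by every processor element.
        for pe in self.pes:
            pe.process(op)


array = SimdArray(256)
for address in range(128, 256):   # load the last half only
    array.write(address, address)
array.execute(lambda x: x + 1)
```

In this model the last half of the elements is loaded without touching elements 0 through 127, in contrast to the sequential methods described in the background.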