1. Field of the Invention
The present invention relates to processors for the fast processing of computation-intensive algorithms such as, for example, 2-dimensional convolution, Gabor transformation, Gaussian or Laplacian pyramids, block matching, DCT, MPEG2, etc.
2. Description of the Prior Art
The journal Design and Electronic 12 of 06.13.1995, pages 30 to 35, discloses for this purpose an arrangement in which the signal processing algorithms which are critical in terms of computation time are processed by tailor-made programmable special processors. In such processors, the registers are supplied via an on-chip memory, and a complicated crossbar switch ensures optimum communication between the on-chip memory and the processors. Disadvantages here are the relatively high on-chip storage requirement and a computing power which is too low for many algorithms because of the small number of parallel arithmetic units. When more than four parallel signal processors are used, the communications outlay and, thus, the chip area increase more than proportionally.
Furthermore, the publication Microprocessor Report, The Insider's Guide to Microprocessor Hardware, Volume 8, No. 13, Oct. 3, 1994, pages 5 to 9, discloses an extensively pipelined superscalar 64-bit RISC processor having two integer and three floating-point units, which is expanded by two graphics units, namely an addition/subtraction unit and a multiplication unit for parallel integer calculations. On account of the limited number of register ports, only two floating-point or graphics instructions can be processed simultaneously. As a result, the computing power is inadequate for many image-processing requirements.
Similarly, Microprocessor Report, Dec. 6, 1994, pages 12 to 15, discloses a processor in which partly different execution units are provided, up to five of which can be addressed simultaneously by each instruction. A disadvantage here is the need for a relatively complicated compiler, which must take account of all the latencies of the processor and ensure that the parallel instructions utilize the hardware optimally and in a manner free from conflict.
Proceedings of the Conference on Visual Communication and Image Processing (VCIP'94), Chicago, 1994, pp. 1753 to 1765, and Proceedings of the IEEE 1993 Custom Integrated Circuit Conference, San Diego, Calif., May 9 to 12, 1993, pages 4.6.1 to 4.6.3, disclose extensively parallel one-dimensional SIMD processor arrays having a local memory and high data rates between the local memory and the processor element. In such processors, complicated operations are assembled from individual operations, wherein the longer execution time of these compound operations is compensated for by a high number of processor elements. In the first case, a global matrix memory is present which permits the distribution of two-dimensional image sections to the individual processor elements. In the second case, a global communications capability permits the multiple utilization of data loaded once and reduces the required wiring bandwidth between the processor array and external storage devices. Disadvantages here are difficult programming, complicated control for pipelined processor elements, the low frequency in the case of non-pipelined processor elements and a large on-chip memory.