This invention relates in general to signal processing and more specifically to Single Instruction Multiple Data (SIMD) coprocessor architectures that provide faster image and video signal processing, including one- and two-dimensional filtering, transforms, and other common tasks.
A problem that has arisen in image processing technology is that two-dimensional (2-D) filtering has a different addressing pattern than one-dimensional (1-D) filtering. Previous digital signal processors (DSPs) and coprocessors, designed for 1-D signals, may have to be modified to process 2-D video signals. The desired end goal is to enable a DSP or coprocessor to perform image and video processing expediently. In image processing, the most useful operations are 1-D and 2-D filtering, which require addressing the 2-D data and the 1-D or 2-D convolution coefficients. When the convolution coefficients are symmetrical, an architecture that makes use of the symmetry can cut computation time roughly in half. The primary bottleneck identified for most video encoding algorithms is motion estimation. The problem of motion estimation may be addressed by first convolving an image with a kernel to reduce it to lower resolution images. These images are then reconvolved with the same kernel to produce even lower resolution images. The sum of absolute differences may then be computed within a search window at each level to determine the best matching subimage for a subimage in the previous frame. Once the best match is found at lower resolution, the search is repeated within the corresponding neighborhood at higher resolutions.
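The saving from symmetrical coefficients mentioned above can be illustrated in software. The following is a minimal sketch, not the coprocessor's actual datapath: the function names `fir_direct` and `fir_symmetric` are hypothetical, the filter is written as a sliding dot product over fully overlapping positions only, and the coefficients are assumed to satisfy h[k] == h[n-1-k]. The mirrored input samples are added first and the sum is multiplied once, so a 5-tap filter needs 3 multiplies per output instead of 5.

```python
def fir_direct(x, h):
    """Plain 1-D FIR as a sliding dot product: one multiply per tap per output."""
    n = len(h)
    return [sum(h[k] * x[i + k] for k in range(n))
            for i in range(len(x) - n + 1)]


def fir_symmetric(x, h):
    """Same result for symmetric h, using about half as many multiplies.

    Assumes h[k] == h[n-1-k]; raises ValueError otherwise.
    """
    n = len(h)
    half = n // 2
    if any(h[k] != h[n - 1 - k] for k in range(half)):
        raise ValueError("coefficients are not symmetric")
    out = []
    for i in range(len(x) - n + 1):
        acc = 0
        for k in range(half):
            # Pre-add the two samples that share coefficient h[k], multiply once.
            acc += h[k] * (x[i + k] + x[i + n - 1 - k])
        if n % 2:
            # An odd-length filter has one unpaired centre tap.
            acc += h[half] * x[i + half]
        out.append(acc)
    return out
```

For example, with x = [1, 2, 3, 4, 5, 6, 7] and h = [1, 2, 3, 2, 1], both functions return [27, 36, 45].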
In view of the above, a need has arisen for an architecture capable of performing 1-D/2-D filtering, preferably including symmetrical filtering, and the sum of absolute differences with equal efficiency. Previously, specialized hardware or general purpose DSPs were used to perform the operations of summing of absolute differences and symmetric filtering in SIMD coprocessor architectures. Intel's MMX technology is similar in concept although much more general purpose. Copending applications filed on Feb. 4, 1998, titled “Reconfigurable Multiply-accumulate Hardware Co-processor Unit”, Provisional Application No. 60/073,668, now U.S. Pat. No. 6,298,366, and “DSP with Efficiently Connected Hardware Coprocessor”, Provisional Application No. 60/073,641, now U.S. Pat. No. 6,256,724, embody a host processor/coprocessor interface and efficient Finite Impulse Response/Fast Fourier Transform (FIR/FFT) filtering implementations, which this invention extends to several other functions.
The proposed architecture is integrated onto a Digital Signal Processor (DSP) as a coprocessor to assist in the computation of the sum of absolute differences, symmetrical row/column Finite Impulse Response (FIR) filtering with a downsampling (or upsampling) option, row/column Discrete Cosine Transform (DCT)/Inverse Discrete Cosine Transform (IDCT), and generic algebraic functions. The architecture is called IPP, which stands for image processing peripheral, and consists of 8 multiply-accumulate (MAC) hardware units connected in parallel and routed and multiplexed together. Once the parameters are input to the dedicated hardware IMX/IPP structure, a nested “for” loop with programmable iteration counts performs the operations commonly used in image processing in a fraction of the clock cycles needed to accomplish the same operations in software. Accumulator initialization and write-out are controlled by programmable conditions on the loop variables, where the loop variables, or parameters, are input to dedicated registers, i.e., I1, I2, I3, I4. Input operands for the MAC units are fetched from memory in a regular yet flexible fashion, which allows pattern-programmable data fetching. Selected outputs from the MAC units are automatically written into memory upon completion of an operation, and the number of outputs available is a programmable feature of the hardware IPP coprocessor.
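As a rough software illustration of the nested loop with conditional accumulator initialization and write-out, the sketch below computes the sum of absolute differences for 8 horizontally adjacent candidate positions, one per accumulator. It is a loose model under stated assumptions, not the IPP's actual control logic: the names `NUM_MACS` and `sad_lanes`, the use of only two loop levels, the mapping of one candidate position per MAC unit, and the exact form of the init and write-out conditions are all hypothetical. Only the count of 8 parallel units and the idea of programmable conditions on the loop variables come from the description above.

```python
NUM_MACS = 8  # the description states 8 multiply-accumulate units in parallel


def sad_lanes(ref, cur, row0, col0):
    """SAD of block `cur` against `ref` at 8 candidate columns col0..col0+7.

    `ref` and `cur` are lists of rows of integers. Lane k compares `cur`
    with the block of `ref` whose top-left corner is (row0, col0 + k).
    """
    block_h, block_w = len(cur), len(cur[0])
    acc = [0] * NUM_MACS
    out = None
    for i1 in range(block_h):        # outer loop variable: block row
        for i2 in range(block_w):    # inner loop variable: block column
            # Assumed init condition: clear accumulators on the first iteration.
            if i1 == 0 and i2 == 0:
                acc = [0] * NUM_MACS
            # The 8 lanes would run in parallel in hardware; serial here.
            for lane in range(NUM_MACS):
                acc[lane] += abs(cur[i1][i2] - ref[row0 + i1][col0 + lane + i2])
            # Assumed write-out condition: store results on the last iteration.
            if i1 == block_h - 1 and i2 == block_w - 1:
                out = list(acc)
    return out
```

The lane with the smallest value marks the best match among the 8 candidates; repeating the call over the rows of a search window, and over successively finer resolution levels, gives the hierarchical search described earlier.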