Processors that perform image or video processing often need to manipulate arrays of image data. In particular, a typical representation of an image uses one or more arrays of pixel values or of transform coefficients in the space or frequency domain. These arrays are commonly manipulated when encoding or decoding compressed image data and when filtering or otherwise modifying the image.
Matrix operations are common in image processing. For example, a discrete cosine transform (DCT), which transforms the image array from the space domain to the frequency domain, normally requires a transpose operation after a one-dimensional DCT operation. Conventionally, a transpose operation requires reading individual values from an array and storing the values in the correct positions in a transposed array. The number of cycles required to transpose an array therefore generally increases with the number of values in the array.
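The conventional transpose described above can be sketched as follows. This is a minimal illustration only, not the implementation of any particular processor: the names transpose_elementwise and row_column_transform are hypothetical, the moves counter merely stands in for a cycle count, and the final transpose back to the original orientation is an assumption about how the row-column method is arranged.

```python
def transpose_elementwise(a):
    """Conventional transpose: read each value and store it at the
    mirrored position of a new array, one value at a time."""
    rows, cols = len(a), len(a[0])
    out = [[0] * rows for _ in range(cols)]
    moves = 0  # hypothetical cost proxy: one read/store pair per value
    for r in range(rows):
        for c in range(cols):
            out[c][r] = a[r][c]
            moves += 1
    return out, moves


def row_column_transform(a, transform_1d):
    """Apply a one-dimensional transform along both dimensions.

    transform_1d is any function that maps a row (list) to a row of the
    same length; a one-dimensional DCT would be one such function.
    A transpose follows each one-dimensional pass (the second one is an
    assumption that restores the original orientation).
    """
    pass1 = [transform_1d(row) for row in a]    # transform every row
    turned, _ = transpose_elementwise(pass1)    # columns become rows
    pass2 = [transform_1d(row) for row in turned]
    result, _ = transpose_elementwise(pass2)    # restore orientation
    return result
```

In this sketch a 2x3 array takes six read/store pairs to transpose, and an 8x8 block takes 64, which is the dependence on array size noted above.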
Another common operation in image processing is filtering. For example, a linear filtering operation accesses a series of values, multiplies the values by respective filter coefficients, and sums the resulting products. Some video or image processors provide parallel data processing that permits parallel multiplications of multiple values and respective filter coefficients. However, memories and register files in such systems generally allow simultaneous access only to a set of values aligned along a particular direction, i.e., along a row of the array. Accordingly, a single instruction can access multiple values for a horizontal filtering operation, but vertical filtering requires either transposing the array being filtered or performing separate access operations for each value in a different row.
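The difference between the two filtering directions can be illustrated with a short sketch. It assumes a row-major array stored as a list of rows, so that a slice of one row models a single access to several aligned values; the function names are hypothetical and the code models the access pattern only, not parallel hardware.

```python
def weighted_sum(values, coeffs):
    """Multiply values by their respective filter coefficients and sum."""
    return sum(v * k for v, k in zip(values, coeffs))


def filter_horizontal(a, r, c, coeffs):
    """One access to row r fetches every value the filter needs."""
    window = a[r][c:c + len(coeffs)]
    return weighted_sum(window, coeffs)


def filter_vertical(a, r, c, coeffs):
    """Each value lies in a different row, so each needs its own access."""
    window = [a[r + i][c] for i in range(len(coeffs))]
    return weighted_sum(window, coeffs)


def filter_vertical_by_transpose(a, r, c, coeffs):
    """Alternative: transpose first, then filter along a row."""
    transposed = [list(col) for col in zip(*a)]
    return filter_horizontal(transposed, c, r, coeffs)
```

Both vertical variants produce the same result; they differ only in whether the cost is paid as one access per row or as a transpose of the whole array.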
In view of these limitations of current image and video processors, more efficient architectures and methods for performing transpose and other array manipulations are desired.