1. Field
The present invention relates generally to computer systems and, more specifically, to processor architectures.
2. Description
Some processors are designed to provide extensions to their instruction set architecture (ISA) for multimedia operations. For example, the MMX™ instructions supported by Pentium® II, Pentium® III, and Celeron™ processors, commercially available from Intel Corporation of Santa Clara, Calif., implement various functions useful for multimedia applications, such as digital signal processing and audio and video processing. These instructions support "single instruction multiple data" (SIMD) operations on multimedia and communications data types. Although the use of these instructions provides an improvement over combinations of pre-existing instructions to perform a given function, and individual MMX™ instructions are efficient for some types of processing, various impediments to faster multimedia processing still remain. For example, many implementations of block-based image and video processing algorithms (such as Joint Photographic Experts Group (JPEG) and Moving Picture Experts Group (MPEG) schemes) require the data, stored in a set of registers accessible as operands for the MMX™ instructions, to be transposed during matrix mathematical operations. The transposition of data among registers incurs significant overhead, however, thereby slowing overall processor throughput for multimedia processing. Therefore, any techniques for avoiding or minimizing these delays would be a valuable advance in the processor art.
An embodiment of the present invention is a processor having a first set of registers, the first set storing a matrix of data, and a second set of registers coupled to the first set, the second set storing a transposed copy of the matrix of data.
Another embodiment of the present invention is a method of using two sets of registers for matrix processing by a processor. The method includes storing a matrix of data into a first set of registers, the first set of registers having a first number of registers, each register comprising a first number of storage units, each storage unit storing an element of the matrix, and transposing the matrix of data into a second set of registers, the second set of registers having a second number of registers, each register comprising a second number of storage units. The method also includes referencing one of the first set of registers to operate on a row of the matrix of data, and referencing one of the second set of registers to operate on a column of the matrix of data.
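By way of illustration only, the method described above may be modeled in software as in the following C sketch. The sketch is not the claimed hardware; it assumes a square 4×4 matrix of 16-bit elements, and the names matrix_regs, store_matrix, row_sum, and col_sum are hypothetical names introduced solely for this example. It shows that once the transposed copy is held in the second set, a column of the matrix can be referenced as a single register in the same manner as a row.

```c
#include <assert.h>
#include <stdint.h>

#define N 4 /* assumed size: 4 registers per set, 4 storage units per register */

/* Hypothetical software model of the two coupled register sets. */
typedef struct {
    int16_t row_regs[N][N]; /* first set: register i holds row i of the matrix */
    int16_t col_regs[N][N]; /* second set: register j holds column j of the matrix */
} matrix_regs;

/* Store the matrix into the first set and mirror each element,
 * transposed, into the second set. */
static void store_matrix(matrix_regs *m, const int16_t src[N][N])
{
    for (int i = 0; i < N; i++) {
        for (int j = 0; j < N; j++) {
            m->row_regs[i][j] = src[i][j];
            m->col_regs[j][i] = src[i][j];
        }
    }
}

/* Sum the storage units of one register (stand-in for any per-register operation). */
static int32_t sum_reg(const int16_t reg[N])
{
    int32_t total = 0;
    for (int k = 0; k < N; k++) {
        total += reg[k];
    }
    return total;
}

/* Operate on row i by referencing a register of the first set. */
static int32_t row_sum(const matrix_regs *m, int i)
{
    assert(i >= 0 && i < N);
    return sum_reg(m->row_regs[i]);
}

/* Operate on column j by referencing a register of the second set;
 * no separate transposition step is needed at this point. */
static int32_t col_sum(const matrix_regs *m, int j)
{
    assert(j >= 0 && j < N);
    return sum_reg(m->col_regs[j]);
}
```

In this model the cost of transposition is paid once, when the matrix is stored, rather than each time a column operation is required.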