1. Field of the Invention
The present invention relates generally to data processing, and more particularly to a method and apparatus for increasing the speed of processing data vectors in a digital signal processor or microprocessor without requiring vector registers or a large number of registers.
2. Description of the Related Art
A data vector is a series of data elements. The concept of vector processing has been incorporated into computing systems to provide high computational throughput for many applications by performing the same series of operations on each data element or pairs of data elements.
Typically, the vector processing loops required to perform the same series of operations on each data element or pairs of data elements dominate the amount of time required to process signal processing kernels. The time required to perform these vector processing loops has been decreased in a number of ways utilizing both hardware and software. For example, software techniques include unrolling the loops, using parallel issue including reordering the instructions, and software pipelining. In hardware, zero overhead looping, parallel execution (both superscalar and instruction indicated), post-address-modifying loads and stores, vector units and vector registers, and Very Long Instruction Word (VLIW) instructions that do several of the required operations in parallel have been implemented. Although these methods increase the speed of the vector processing, they either require extra code, make the required assembly code hard to read and understand, or require extra registers that are not used except for these vector operations.
One approach to exploiting the kind of parallelism inherent in vector processing is through the use of dynamic scheduling. Several dynamic scheduling techniques are known in the art, including superscalar, scoreboarding, and reservation stations. Reservation stations, in particular, address the problem of executing multiple iterations of a loop without changing the source code. Reservation stations work by eliminating false dependencies between the instructions of different loop iterations. When the instructions of a particular iteration are executed by a sequential issue machine, dependencies between the instructions within the iteration may block issuing of instructions in the next iteration, even though there are sufficient hardware resources and no dependencies between the current iteration and the next. Reservation stations allow an instruction to be issued and buffered at a functional unit for later execution. This frees the issue pipeline to process additional instructions and begin the next iteration before the current one is finished. Reservations stations, however, require additional hardware, are extremely complex, and make the execution time of the loop non-deterministic.
Thus, there exists a need for an apparatus and method that increases the speed of processing of data vectors by processing multiple data elements at the same time which is programmed with readable assembly language and does not require a lot of extra registers.