1. Field of the Invention
The present invention relates generally to computer architecture and methods for increasing the speed at which mathematical operations are performed in a processor. More particularly, the present invention relates to a dedicated computer architecture and related method for dynamically updating operand addresses during execution of mathematical operations to improve the speed of executing mathematical operations in a processor.
2. Description of the Related Art
The advent of the electronic computer in the United States came in response to the U.S. Army's needs in World War II. The Army needed to quickly compute range tables for its heavy artillery and so it began researching ways to speed its computations. This led to the construction of the ENIAC computer in 1946, too late to be of help to the Allies in World War II, but ushering in the infancy of computers in the United States. Since then, many breakthroughs in computer speed and power have been achieved, permitting the amount and complexity of information processed by computers to greatly increase. Despite these great advances, presently there is an acute need to process even greater quantities of data at still higher speeds.
For example, one modern application that is particularly data intensive is computer graphics. The processing of still images or video consumes prodigious amounts of storage space within the computer. Lossy image compression techniques often are used in an attempt to reduce the amount of storage required by eliminating unnecessary features or portions of the image. Part of the lossy image compression technique involves selecting the portions of the image that can be eliminated without causing a perceptible difference in image quality. Determining the portions to eliminate typically is accomplished by transforming the image into its mathematical equivalent, that is, a set of frequency coefficients. Once the image is transformed in this fashion, high frequency components are deleted based upon the concept that deletion of high frequency coefficients affects the appearance of an image significantly less than the deletion of low frequency coefficients.
Modern-day image compression relies on a transform function known as the discrete cosine transform (DCT) to eliminate portions of the image. DCT compression is used in techniques such as JPEG (Joint Photographic Experts Group) and MPEG (Moving Picture Experts Group) to eliminate unnecessary aspects of images. DCT compression also is used in picture phone transmission and has been proposed in high definition television standards. However, the main drawback of using the DCT for image compression instead of techniques such as the DFT (Discrete Fourier Transform) is the amount of required arithmetic. Although fewer DCT coefficients are required than are required in a corresponding DFT implementation, the DCT technique requires far more mathematical operations to execute than does the DFT technique. Thus, the additional mathematical operations required by the technique effectively slow the conversion of an image from its spatial representation to its mathematical equivalent and vice versa. Image compression is just one critical example of a mathematically intensive graphics operation. Mathematically intensive operations are present in other signal processing techniques as well as in other fields.
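As a rough illustration of the transform-and-discard idea described above, the following Python sketch applies a one-dimensional DCT to a single row of pixel values, zeroes the high frequency coefficients, and reconstructs the row. The function names, the orthonormal scaling, and the sample pixel values are assumptions made for illustration only. JPEG and MPEG apply a two-dimensional transform followed by quantization, which is not shown here.

```python
import math


def dct(signal):
    """Orthonormal DCT-II of a list of samples (direct, unoptimized form)."""
    n = len(signal)
    coeffs = []
    for k in range(n):
        scale = math.sqrt((1 if k == 0 else 2) / n)
        total = sum(x * math.cos(math.pi * (i + 0.5) * k / n)
                    for i, x in enumerate(signal))
        coeffs.append(scale * total)
    return coeffs


def idct(coeffs):
    """Inverse of dct() above (orthonormal DCT-III)."""
    n = len(coeffs)
    signal = []
    for i in range(n):
        total = 0.0
        for k, c in enumerate(coeffs):
            scale = math.sqrt((1 if k == 0 else 2) / n)
            total += scale * c * math.cos(math.pi * (i + 0.5) * k / n)
        signal.append(total)
    return signal


def compress(signal, keep):
    """Keep the `keep` lowest-frequency coefficients and zero the rest."""
    coeffs = dct(signal)
    return coeffs[:keep] + [0.0] * (len(coeffs) - keep)


if __name__ == "__main__":
    row = [52, 55, 61, 66, 70, 61, 64, 73]  # hypothetical pixel values
    approx = idct(compress(row, keep=4))     # half the coefficients dropped
    print([round(v, 1) for v in approx])
```

Each of the N output coefficients in this direct form needs N multiplications, which is one way to see why the arithmetic cost of the transform matters.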
Mathematical operations typically require a relatively large amount of system resources to process. For instance, the multiplication of two matrices to yield a single, final matrix involves multiplying each element of row 1 of the first matrix times the corresponding element of column 1 of the second matrix. The addition of these multiplied numbers yields a single number corresponding to row 1, column 1 of the final matrix. For eight-by-eight matrices, this sequence must be repeated sixty-four times, once for each element of the final matrix.
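The row-by-column procedure can be sketched in Python as follows. The function name and the multiplication counter are illustrative assumptions, added only to make the amount of arithmetic visible.

```python
def matmul_counted(a, b):
    """Multiply matrices a and b (lists of rows); also count multiplications."""
    rows, inner, cols = len(a), len(b), len(b[0])
    result = [[0] * cols for _ in range(rows)]
    multiplies = 0
    for i in range(rows):
        for j in range(cols):
            acc = 0  # running sum for element (i, j) of the final matrix
            for k in range(inner):
                acc += a[i][k] * b[k][j]
                multiplies += 1
            result[i][j] = acc
    return result, multiplies


if __name__ == "__main__":
    n = 8
    identity = [[1 if r == c else 0 for c in range(n)] for r in range(n)]
    values = [[r * n + c for c in range(n)] for r in range(n)]
    product, count = matmul_counted(identity, values)
    print(count)  # 64 elements x 8 multiplications each = 512
```

For eight-by-eight matrices the row-by-column sequence runs sixty-four times, and each run performs eight multiplications, for 512 multiplications in total.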
Further, each multiplication executed by a computer involves multiple steps. First, two multiplicands must be retrieved from memory. If each multiplicand were located at the address supplied by a program or register, retrieval would occupy one cycle per multiplicand. This addressing scheme is commonly referred to as direct addressing. If, instead, indirect addressing were utilized, one or more additional steps per operand would be required. In indirect or multilevel addressing, the address supplied by a program or register is that of a storage location containing a second address, with the second address identifying the ultimate location of the desired data. Alternatively, the second address may be the first field of a chained list of address fields, the last of which identifies the ultimate location of the desired multiplicand.
Thus, in indirect addressing the memory location corresponding to the address provided by a computer program merely contains another address, often referred to as the effective address. The effective address identifies the location where the desired operand is stored. When indirect addressing is implemented, retrieving each operand takes at least two steps, which correspond to two or more clock cycles of computer time. Despite the overhead associated with indirect addressing, it is prevalent today, especially in RISC (Reduced Instruction Set Computing) architectures, because it provides a degree of flexibility not available with direct addressing. For instance, indirect addressing facilitates sharing of data between several users or several programs where the user does not know the exact address of the shared information during program assembly.
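The difference between the two addressing schemes can be illustrated with a small Python model in which memory is a list and each read of a memory cell counts as one access. The function names, the memory layout, and the one-access-per-read cost are assumptions made for this sketch and do not describe any particular processor.

```python
def load_direct(memory, address):
    """Direct addressing: the supplied address holds the operand itself."""
    return memory[address], 1  # (operand, memory accesses)


def load_indirect(memory, address, levels=1):
    """Indirect addressing: follow `levels` stored addresses, then read the operand."""
    accesses = 0
    for _ in range(levels):
        address = memory[address]  # this cell holds another address
        accesses += 1
    return memory[address], accesses + 1


if __name__ == "__main__":
    memory = [0] * 16
    memory[3] = 42  # the operand
    memory[7] = 3   # cell 7 holds the address of the operand
    memory[9] = 7   # cell 9 holds the address of cell 7 (a two-link chain)

    print(load_direct(memory, 3))              # operand found in one access
    print(load_indirect(memory, 7))            # one extra access
    print(load_indirect(memory, 9, levels=2))  # chained list: two extra accesses
```

In this model, every additional level of indirection adds one memory access per operand, which mirrors the additional steps per operand described above.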
In any case, once the multiplicands have been retrieved, they must be multiplied and the product stored in an accumulator. The result may be immediately written to a destination address for final storage, in which case one more address, that of the destination, needs to be supplied. In the case of a matrix multiply, this multiplication and storage process is repeated until the matrix multiplication is complete. Matrix multiplication, therefore, requires a large number of clock cycles for the various operations to be performed. Thus, applications such as computer graphics data compression using the DCT technique are slow and unwieldy. Further, while the processor is retrieving, multiplying, and storing, it is precluded from other activities, thereby limiting system performance.
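A back-of-the-envelope estimate helps show how these steps add up. The Python sketch below uses a deliberately simple cost model that is an assumption of this example, not a figure from any real processor: each operand fetch costs a fixed number of cycles (one for direct addressing, two for single-level indirect addressing), each multiply-accumulate costs one cycle, and each result store costs one cycle.

```python
def estimate_cycles(n, fetch_cycles, mac_cycles=1, store_cycles=1):
    """Estimated cycles for an n-by-n matrix multiply under the assumed cost model."""
    multiplications = n ** 3  # n multiplications for each of n * n results
    results = n ** 2          # one final store per element of the result
    per_multiplication = 2 * fetch_cycles + mac_cycles  # two operands, one multiply-accumulate
    return multiplications * per_multiplication + results * store_cycles


if __name__ == "__main__":
    print(estimate_cycles(8, fetch_cycles=1))  # direct addressing
    print(estimate_cycles(8, fetch_cycles=2))  # single-level indirect addressing
```

Under these assumptions an eight-by-eight multiply takes 1600 cycles with direct addressing and 2624 cycles with single-level indirect addressing, so operand retrieval accounts for most of the total in both cases.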
The above matrix multiplication example is not the only mathematical operation that affects computer performance. Even simpler operations, such as repeated additions, can take prodigious amounts of computer time if enough additions are involved. Each addition operation involves retrieving two operands from memory and sending them to an adder. Subsequently, the sum must be stored in a register or memory. Even at one cycle per instruction, the processor time dedicated to adding operands becomes substantial.
The computer industry has recognized that mathematical operations are a drag on system performance and has had some success in increasing the speed of mathematical operations executed by a processor. As is well known, the broad idea of pipelining information has been explored. Pipelining is the term used when hardware is built from several functional units placed in sequential order. Doing work in parallel is also known to speed the execution of mathematical operations when the work can be divided. Executing operations in parallel refers to dividing work into several independent or near-independent units and executing or solving them simultaneously, usually with different processors. Even though these techniques are known and have been used, mathematical operations still are executed relatively slowly.
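The benefit of these two known techniques can be approximated with the idealized Python sketch below. It assumes that every pipeline stage takes one cycle, that the pipeline never stalls, and that parallel work divides into fully independent units. The function names are hypothetical and serve only this illustration.

```python
import math


def sequential_cycles(operations, stages):
    """Each operation passes through every stage before the next one starts."""
    return operations * stages


def pipelined_cycles(operations, stages):
    """Stages overlap: after the pipeline fills, one operation completes per cycle."""
    if operations == 0:
        return 0
    return stages + operations - 1


def parallel_cycles(operations, stages, processors):
    """Independent operations are split evenly across identical processors."""
    return math.ceil(operations / processors) * stages


if __name__ == "__main__":
    ops, stages = 512, 3  # e.g. fetch, multiply, accumulate
    print(sequential_cycles(ops, stages))
    print(pipelined_cycles(ops, stages))
    print(parallel_cycles(ops, stages, processors=4))
```

Real hardware seldom reaches these ideal figures, because dependencies between operations and contention for memory introduce stalls that this model ignores.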
Therefore, a need exists for a technique to speed the execution of mathematical operations in a computer. Preferably, such a method would not depend on whether direct or indirect addressing was employed in the processor, but would serve to provide a performance boost in either situation. The ideal method or device also would be flexible in that it would not be limited to one type of mathematical function, but also could quickly execute other functions or operations at approximately the same speed.