1. Field of the Invention
The present invention relates to the field of digital signal processing (DSP). More specifically, the present invention relates to the optimization of floating point operations in DSP systems.
2. Prior Art
Many of today's computer applications require the processing of both integer and non-integer values. In modern computer systems, the predominant non-integer representation of numbers is the floating point representation as defined by the IEEE floating point standard (IEEE 754-1985). Accordingly, modern computer systems are usually dual-instruction machines with an integer datapath and a floating point datapath for handling integer operations and floating point operations, respectively.
Floating point operations are typically performed by a floating point unit in a processor, which is coupled to the floating point datapath of a computer system. Importantly, a floating point number is generally stored in normalized format in a floating point register (FPR) coupled to the floating point datapath. When a floating point operation needs to be performed, the floating point operands are converted from normalized format to unnormalized format before they are operated upon. Thereafter, the result is converted back to normalized format before it is stored back into the FPR. The FPR always stores normalized formatted data.
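By way of illustration only, the conversion between the normalized storage format and an unnormalized working form may be sketched as follows. The sketch assumes IEEE 754 single precision and handles only normal (finite, non-zero, non-subnormal) values; the function names `unpack_float32`, `normalize` and `to_float` are hypothetical and do not correspond to any particular hardware.

```python
import struct

def unpack_float32(x):
    """Split an IEEE 754 single-precision value into (sign, exponent, significand).

    The significand is returned as a 24-bit integer with the implicit
    leading 1 made explicit, which is one way to model the unnormalized
    working form. Only normal numbers are handled in this sketch.
    """
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31
    biased_exp = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    if biased_exp == 0 or biased_exp == 0xFF:
        raise ValueError("sketch handles normal numbers only")
    significand = (1 << 23) | fraction  # make the hidden bit explicit
    exponent = biased_exp - 127         # remove the exponent bias
    return sign, exponent, significand

def normalize(sign, exponent, significand):
    """Shift a non-zero significand until its leading 1 sits at bit 23.

    Right shifts truncate; a real datapath would also round.
    """
    if significand == 0:
        raise ValueError("zero has no normalized form in this sketch")
    while significand >= (1 << 24):
        significand >>= 1
        exponent += 1
    while significand < (1 << 23):
        significand <<= 1
        exponent -= 1
    return sign, exponent, significand

def to_float(sign, exponent, significand):
    """Convert (sign, exponent, significand) back to a Python float."""
    return (-1.0) ** sign * significand * 2.0 ** (exponent - 23)
```

In this model, the shift loops in `normalize` stand in for the normalization step that the passage above describes as occurring before a result is stored back into the FPR.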
More particularly, in the field of digital signal processing (DSP), floating point representation is frequently used to provide the appropriate range and precision required in sums of products computations. For example, audio applications such as filtering and wave table synthesis, and graphic applications such as clipping, matrix transformation, vertex calculation and texture processing all involve large amounts of sums of products calculations. Thus, an efficient implementation of floating point operations is essential to computer systems running DSP applications.
In one prior art method, a floating point multiply and add instruction is used in conjunction with the FPR to implement sums of products computations. Specifically, in each multiply and add instruction, two floating point numbers are first multiplied to determine their product. The product is then added to a running total in the FPR, which is an intermediate result representing the sum of previously calculated products, to derive the sum of the newly calculated product and the current running total. This sum is then normalized and stored back into the FPR as the new running total. The multiply and add instruction is repeated for each iteration of the sum of products computation at hand.
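For illustration, the multiply and add loop described above may be modeled in software as follows. This is a simplified sketch and not a description of any actual instruction: Python floats stand in for the datapath, rounding to single precision stands in for the normalize-and-store step, and the names `round_to_float32` and `sum_of_products_per_step_normalize` are hypothetical.

```python
import struct

def round_to_float32(x):
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack(">f", struct.pack(">f", x))[0]

def sum_of_products_per_step_normalize(a, b):
    """Model a multiply-and-add loop whose running total is normalized and
    written back to a single-precision register after every iteration.

    Returns the final running total and the number of normalizations.
    """
    if len(a) != len(b):
        raise ValueError("operand sequences must have equal length")
    running_total = 0.0
    normalizations = 0
    for x, y in zip(a, b):
        product = x * y
        # Assumed model: one normalize-and-store per intermediate sum.
        running_total = round_to_float32(running_total + product)
        normalizations += 1
    return running_total, normalizations
```

For example, calling the function on the sequences `[1.0, 2.0, 3.0]` and `[4.0, 5.0, 6.0]` yields a total of 32.0 with three normalizations, one per iteration.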
However, this prior art method is inefficient because performing a normalization is expensive both in terms of computational time and area requirement. Under this prior art method, performing a sum of products calculation requires a normalization for every intermediate sum; thus, the time and area requirements increase as the number of iterations grows. Further, since typical DSP applications involve frequent sums of products calculations, the performance delivered by this prior art method further deteriorates. Thus, this prior art method is far from ideal for performing floating point operations, especially for DSP applications in which extensive iterations are commonplace.
Moreover, because a normalization is needed for every sum under this prior art method, the floating point multiply and add instruction typically has a latency of four instruction cycles. In other words, the result of an instruction is not available for use by a subsequent instruction until four instruction cycles later. This latency imposes significant restrictions on the design and implementation of pipelined instructions and adversely affects the overall performance of floating point operations provided by this prior art method. In addition, efficient implementations of today's sophisticated DSP applications frequently require a one-cycle turnaround, or in other words, that the result of an instruction be available in the next instruction cycle. The instant prior art method is therefore not well suited to the implementation of modern high-speed DSP applications.
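The effect of this latency on a chain of dependent operations can be estimated with simple arithmetic. The sketch below assumes an idealized model in which each multiply and add instruction cannot issue until the previous running total is available and no other work fills the intervening cycles; the function name `dependent_chain_cycles` is hypothetical.

```python
def dependent_chain_cycles(iterations, latency):
    """Cycles until the final running total is available, assuming each
    multiply-and-add must wait for the result of the previous one."""
    if iterations < 0 or latency < 1:
        raise ValueError("iterations must be >= 0 and latency >= 1")
    return iterations * latency
```

Under these assumptions, a 1000-term sum of products would take about 4000 cycles at a four-cycle latency, compared with about 1000 cycles at a one-cycle turnaround.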
Another prior art method involves the use of integer accumulators to implement sums of products calculations. By using integer representations in the calculations, this prior art method avoids the normalization required in the prior art method using floating point operations described above, and thus does not present the inefficiency inherent therein. Further, by using an integer accumulator to store the running total during sums of products calculations, rather than storing such intermediate results into a register for every addition, this method actually offers an improvement in performance. However, integer representations cannot provide the range and precision offered by floating point representations. Since DSP applications increasingly call for larger range and higher precision, this prior art solution based on integer accumulators is inadequate to handle the demand of today's high performance DSP applications.
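By way of example, an integer-accumulator sum of products and its range limitation may be sketched as follows. The sketch assumes a signed 16-bit Q15 fixed-point operand format, which is an illustrative choice not taken from the method described above. Python integers are unbounded, so the finite accumulator width of real hardware is not modeled. All function names are hypothetical.

```python
Q = 15  # assumed number of fractional bits (Q15 format)

def to_q15(x):
    """Quantize a real value to a signed 16-bit Q15 integer, saturating
    at the ends of the representable range."""
    scaled = int(round(x * (1 << Q)))
    return max(-(1 << 15), min((1 << 15) - 1, scaled))

def integer_sum_of_products(a, b):
    """Accumulate Q15 x Q15 products in an integer accumulator.

    No normalization occurs; each product carries 30 fractional bits.
    """
    if len(a) != len(b):
        raise ValueError("operand sequences must have equal length")
    acc = 0
    for x, y in zip(a, b):
        acc += to_q15(x) * to_q15(y)
    return acc

def acc_to_float(acc):
    """Interpret the accumulator as a real value with 2*Q fractional bits."""
    return acc / float(1 << (2 * Q))
```

In this format a small input such as 1e-6 quantizes to zero and an input such as 3.0 saturates at the largest representable code, which illustrates the limited range and precision noted above.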
Thus, there exists a need for a method and system for performing floating point operations that is capable of providing the large range and high precision required by DSP applications and at the same time delivering such capability without sacrificing computational efficiency and overall performance. Furthermore, there exists a need for a method and system for performing floating point operations which does not incur significant latency in the instruction pipeline in order to meet the ever-increasing performance demand of high performance DSP applications.