1. Field of the Invention
This invention relates in general to digital signal processing, and more particularly to a method and apparatus for providing a processor based nested form polynomial engine.
2. Description of Related Art
A microprocessor is a circuit that combines the instruction-handling, arithmetic, and logical operations of a computer on a single semiconductor integrated circuit. Microprocessors can be grouped into two general classes, namely general-purpose microprocessors and special-purpose microprocessors. General-purpose microprocessors are designed to be programmable by the user to perform any of a wide range of tasks, and are therefore often used as the central processing unit (CPU) in equipment such as personal computers.
In contrast, special-purpose microprocessors are designed to provide performance improvement for specific predetermined arithmetic and logical functions for which the user intends to use the microprocessor. By knowing the primary function of the microprocessor, the designer can structure the microprocessor architecture in such a manner that the performance of the specific function by the special-purpose microprocessor greatly exceeds the performance of the same function by a general-purpose microprocessor regardless of the program implemented by the user.
One such function that can be performed by a special-purpose microprocessor at a greatly improved rate is digital signal processing. Digital signal processing generally involves the representation, transmission, and manipulation of signals, using numerical techniques and a type of special-purpose microprocessor known as a digital signal processor (DSP). Digital signal processing typically requires the manipulation of large volumes of data, and a digital signal processor is optimized to efficiently perform the intensive computation and memory access operations associated with this data manipulation. For example, computations for evaluating polynomials consist largely of repetitive operations such as multiply-and-add and multiple-bit-shift. DSPs can be specifically adapted for these repetitive functions, and provide a substantial performance improvement over general-purpose microprocessors in, for example, real-time applications such as image, speech, video and data processing.
DSPs are central to the operation of many of today's electronic products, such as high-density disk drives, digital cellular phones, complex audio and video equipment and automotive systems. The demands placed upon DSPs in these and other applications continue to grow as consumers seek increased performance from their digital products, and as the convergence of the communications, computer and consumer industries creates completely new digital products. In addition, a digital system designed on a single integrated circuit is referred to as an application-specific integrated circuit (ASIC). Currently, ASIC designs include complex digital systems implemented on a single chip, e.g., SRAMs, FIFOs, register files, RAMs, ROMs, universal asynchronous receiver-transmitters (UARTs), programmable logic arrays, field programmable gate arrays and other such logic circuits.
Designers have succeeded in increasing the performance of DSPs, and microprocessors in general, by increasing clock speeds, by removing data processing bottlenecks in circuit architecture, by incorporating multiple execution units on a single processor circuit, and by developing optimizing compilers that schedule operations to be executed by the processor in an efficient manner. For example, a DSP generally has a specialized multiply-accumulate (MAC) unit in order to improve the performance of repetitive digital signal processing algorithms. The increasing demands of technology and the marketplace make desirable even further structural and process improvements in processing devices, application systems and methods of operation and manufacture.
In algebra, a polynomial function, or polynomial for short, is a function of the form: $f(x) = a_n x^n + a_{n-1} x^{n-1} + \dots + a_1 x + a_0$, where x is a scalar-valued variable, n is a nonnegative integer, and $a_0, \dots, a_n$ are fixed scalars, called the coefficients of the polynomial f(x). Polynomial functions, or polynomials, are an important class of simple and smooth functions. Simple means they are constructed using only multiplication and addition. Smooth means they are infinitely differentiable, i.e., they have derivatives of all finite orders. Because of their simple structure, polynomials are very easy to evaluate and are used extensively in numerical analysis for polynomial interpolation or to numerically integrate more complex functions.
In a polynomial as described above, the highest occurring power of x (n if the coefficient $a_n$ is not zero) is called the degree of f(x); its coefficient is called the leading coefficient. Where the leading coefficient is 1, we describe the polynomial as monic. The coefficient $a_0$ is called the constant coefficient of f(x). Each summand of the polynomial of the form $a_k x^k$ is called a term. Here the variable x is, properly speaking, an indeterminate; it is on occasion replaced by something other than a scalar, e.g., some matrix or operator.
A root or zero of the polynomial f(x) is a number r such that f(r)=0. Determining the roots of polynomials, or “solving algebraic equations”, is among the oldest problems in mathematics. Some polynomials, such as $f(x) = x^2 + 1$, do not have any roots among the real numbers.
Approximations for the real roots of a given polynomial can be found using Newton's method, or more efficiently using Laguerre's method, which employs complex arithmetic and can locate all complex roots. There is a difference between approximating roots and finding concrete closed formulas for them. Formulas for the roots of polynomials of degree up to 4 have been known since the sixteenth century. However, no general closed formula in radicals exists for polynomials of degree 5 or higher.
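As an illustration of approximating a real root, the following is a minimal sketch of Newton's method for a polynomial, not a definitive implementation. The function names, the coefficient layout (c[k] multiplies x^k), the tolerance and the iteration limit are all assumptions made for this example.

```c
#include <stddef.h>

/* Evaluate f(x) and its derivative f'(x) in one pass.
   Assumed layout: c[0..n], where c[k] is the coefficient of x^k. */
static void poly_eval_with_derivative(const double *c, size_t n, double x,
                                      double *f, double *df)
{
    double y = c[n];
    double dy = 0.0;
    for (size_t i = n; i-- > 0; ) {
        dy = dy * x + y;      /* derivative accumulates the previous value */
        y = y * x + c[i];     /* nested evaluation of the polynomial */
    }
    *f = y;
    *df = dy;
}

/* Newton iteration x <- x - f(x)/f'(x), starting from x0.
   Returns 0 and stores the approximation in *root when |f(x)| < tol;
   returns -1 if the derivative vanishes or max_iter is exhausted. */
int newton_root(const double *c, size_t n, double x0,
                double tol, int max_iter, double *root)
{
    double x = x0;
    for (int k = 0; k < max_iter; k++) {
        double f, df;
        poly_eval_with_derivative(c, n, x, &f, &df);
        double abs_f = (f < 0.0) ? -f : f;
        if (abs_f < tol) {
            *root = x;
            return 0;
        }
        if (df == 0.0) {
            return -1;        /* flat tangent: the step is undefined */
        }
        x -= f / df;
    }
    return -1;
}
```

For f(x) = x^2 - 2 with a starting guess of 1, this sketch converges toward the square root of 2. For f(x) = x^2 + 1, which has no real roots, it reports failure rather than returning a value.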
A digital signal processor (DSP) is a specialized microprocessor designed specifically for digital signal processing, generally in real time. DSPs can also be used to perform general-purpose computation, but they are not optimized for this function. Instead, DSPs usually have an instruction set architecture (ISA) optimized for rapid signal processing tasks, such as the multiply-accumulate function.
An instruction set, or instruction set architecture (ISA), is a specification detailing the commands that a computer's CPU should be able to understand and execute, or the set of all commands implemented by a particular CPU design. The term describes the aspects of a computer or microprocessor typically visible to a programmer, including the native data types, instructions, registers, memory architecture, interrupt and fault system, and external I/O (if any). “Instruction set architecture” is sometimes used to distinguish this set of characteristics from the microarchitecture, which comprises the elements and techniques used to implement the ISA, e.g., microcode, pipelining, cache systems, etc.
The multiply-accumulate operation computes a product and adds it to an accumulator. In a CPU, an accumulator is a register in which intermediate results are stored. Without an accumulator, it would be necessary to write the result of each calculation (addition, multiplication, shift, etc.) to main memory and read it back. Access to main memory is slower than access to the accumulator, which usually has direct paths to and from the arithmetic logic unit (ALU). However, computing polynomials of a single variable can be time consuming because of the number of cycles required, and can produce sizeable code because of the number of bytes required to write it.
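The multiply-accumulate pattern described above can be sketched in C as a sum of products. This is only an illustration: the function name mac_dot, the 32-bit operand width and the 64-bit accumulator are assumptions for the example, and whether the accumulator actually stays in a register depends on the compiler and target.

```c
#include <stdint.h>
#include <stddef.h>

/* Sum of products a[i] * b[i].
   The running total lives in the local variable acc, which a compiler
   would typically keep in a register instead of writing each
   intermediate result back to main memory. */
int64_t mac_dot(const int32_t *a, const int32_t *b, size_t len)
{
    int64_t acc = 0;                      /* the accumulator */
    for (size_t i = 0; i < len; i++) {
        acc += (int64_t)a[i] * b[i];      /* multiply, then accumulate */
    }
    return acc;
}
```

Each loop iteration corresponds to one multiply-accumulate step; on a DSP with a dedicated MAC unit, the multiply and the add would generally be issued as a single instruction.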
For example, consider the third-order polynomial $f(x) = ax^3 + bx^2 + cx + d$. To evaluate the polynomial is to compute $y = f(x)$ for a given x. When the monomial form of a polynomial of degree n is used, n additions and $(n^2+n)/2$ multiplications are needed to calculate f(x). To increase the speed of evaluating the polynomial, the number of multiplications must be decreased, because multiplications are slow and numerically unstable compared to additions. The Horner algorithm rearranges the polynomial into the nested form $x(c + x(b + x(a))) + d$, which needs only n multiplications and n additions. This form is better suited to fast computation because there are no wasted stores of $x^2$ and $x^3$. For polynomials of arbitrary order n there are two possibilities for writing this code. A hard-coded form could be written explicitly as follows: y=(c[n]*x+c[n−1])>>16, y=(y*x+c[n−2])>>16, y=(y*x+c[n−3]), . . . , y=y*x+c[0]. However, such code gets costly as the order grows.
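The difference in multiplication counts between the monomial form and the Horner form can be checked with a small sketch. The function names, the use of double-precision values and the mults counter are assumptions for illustration; the coefficient array is assumed to hold c[k] as the coefficient of x^k.

```c
#include <stddef.h>

/* Monomial form: each term c[k] * x^k is built by k multiplications,
   giving (n^2 + n) / 2 multiplications and n additions in total. */
double eval_monomial(const double *c, size_t n, double x, unsigned *mults)
{
    double y = c[0];
    *mults = 0;
    for (size_t k = 1; k <= n; k++) {
        double term = c[k];
        for (size_t j = 0; j < k; j++) {
            term *= x;
            (*mults)++;
        }
        y += term;
    }
    return y;
}

/* Horner form: one multiplication and one addition per coefficient
   below the leading one, giving n multiplications and n additions. */
double eval_horner(const double *c, size_t n, double x, unsigned *mults)
{
    double y = c[n];
    *mults = 0;
    for (size_t i = n; i-- > 0; ) {
        y = y * x + c[i];
        (*mults)++;
    }
    return y;
}
```

For the third-order example with a=1, b=2, c=3, d=4 and x=2, both functions return 26, but the monomial form counts 6 multiplications while the Horner form counts 3.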
The second possibility is to use the loop form. In loop form, the code could be expressed in pseudo C as: y=c[n]; for (int i=n−1; i>=0; i−−) {y=(y*x+c[i])>>16;}.
Still, this form is expensive in terms of cycles because i must be tested and a conditional branch back to the top of the loop must occur.
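The two coding styles discussed above can be compared side by side in a minimal fixed-point sketch. It assumes that the 16 in the expressions denotes a 16-bit right shift used for fixed-point rescaling, and it further assumes a Q16.16 format in which x and every coefficient carry 16 fractional bits, so the shift is applied to the product before the coefficient is added. The function names and that scaling convention are assumptions for illustration, not the format used by any particular processor.

```c
#include <stdint.h>
#include <stddef.h>

#define Q 16  /* assumed number of fractional bits (Q16.16) */

/* Loop form: works for any order n, but every pass also pays for the
   test of i and the conditional branch back to the top of the loop.
   Note: right-shifting a negative signed value is implementation-defined
   in C; most compilers perform an arithmetic shift. */
int32_t horner_q16_loop(const int32_t *c, size_t n, int32_t x)
{
    int64_t y = c[n];
    for (size_t i = n; i-- > 0; ) {
        y = ((y * x) >> Q) + c[i];
    }
    return (int32_t)y;
}

/* Hard-coded form for order 3: no loop counter and no branch, but the
   code grows by one line for every additional order. */
int32_t horner_q16_order3(const int32_t *c, int32_t x)
{
    int64_t y = c[3];
    y = ((y * x) >> Q) + c[2];
    y = ((y * x) >> Q) + c[1];
    y = ((y * x) >> Q) + c[0];
    return (int32_t)y;
}
```

With coefficients 1, 2, 3, 4 (leading coefficient first) and x = 0.5, all expressed in Q16.16, both functions produce 6.125 in the same format. The trade-off is the one the text describes: the loop form spends cycles on loop control, while the hard-coded form spends code bytes.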
It can be seen then that there is a need for a method and apparatus for providing a processor based nested form polynomial engine.