1. Field of the Invention
The invention relates to the field of computer systems. More specifically, the invention relates to operations on complex numbers.
2. Background Information
Many devices in use today (e.g., modems, radar, TV, telephone, etc.) transmit data using in phase and out of phase signals (e.g., orthogonal signals). This data is typically processed using complex numbers (e.g., the real number is used for the in phase signal, while the imaginary number is used for the out of phase signal). The multiplication of two complex number (e.g., r.sub.1 i.sub.1 and r.sub.2 i.sub.2) is performed according to Equation 1 shown below. ##EQU1## The multiplication of complex numbers is required in operations such as, the multiply-accumulate operation (see Equation 2 below). In Equation 2, a(n) and b(n) represent the n.sup.th complex numbers in two series of complex numbers. EQU y(n)=y(n-1)+a(n)*b(n) Equation 2
Digital discrete time filters, such as a FIR filter and an IIR filter, require many multiply-accumulate operations. A FIR filter is an operation which is used in applications, such as real time digital signal processing applications (e.g., complex demodulation and equalization found in high speed data modems; ghost canceling in terrestrial broadcasting), for recovery of the transmitted information from the signal. The equation for the FIR filter is shown below as Equation 3. ##EQU2## With reference to Equation 3, the complex variable y(k) represents the current output sample of the filter, the input value c(n) represents the n.sup.th filter coefficient of the filter, the constant L is the number of coefficients in c(n), and the input value x(k-n) represents the n.sup.th past value of the input sequence (also termed as "samples"). The output of the filter is a weighted average of the past L complex samples. Typically, there are more samples than there are coefficients. For the computation of the k.sup.th output sample y(k), the first complex coefficient corresponds to the k.sup.th sample, the second corresponds to the (k-1).sup.th sample, and so on. Each complex coefficient is multiplied by the sample to which it corresponds, and these products are accumulated to generate the k.sup.th output sample of the filter. For the computation of the (k+1).sup.th output sample y(k+1), the first complex coefficient corresponds to the (k+1).sup.th sample, the second complex coefficient corresponds to the k.sup.th sample, and so on. Each complex coefficient is multiplied by the sample to which it corresponds, and these products are accumulated to generate the (k+1).sup.th output of the filter. Thus, the correspondence between the samples and the complex coefficients is slide up one for each successive output sample. As a result, FIR filters are typically coded using an outer and an inner loop. The outer loop steps through the successive outputs (the different corresponding relationships between the samples and complex coefficients), while the inner loop steps through the complex coefficients and current corresponding samples to perform the multiply-accumulate.
When a FIR filter is first begun, there are insufficient samples to compute the entire length (L) of the filter (i.e., index k-n into the input samples x() is negative). In such situations, the missing samples are typically substituted with zero, the first sample, or some other relevant input.
The equation for the IIR filter is shown below as Equation 4. ##EQU3## With reference to Equation 4, the input value d(i) represents the i.sup.th filter coefficient of the filter, and the constant M is the number of coefficients in d(i).
One prior art technique for supporting multiply-accumulate operations is to couple a separate digital signaling processor (DSP) to an existing general purpose processor (e.g., The Intel.RTM. 486 manufactured by Intel Corporation of Santa Clara, Calif.). The general purpose processor allocates jobs to the DSP.
One such prior art DSP is the TMS320C2x DSP manufactured by Texas Instruments, Inc. of Dallas, Tex.). A prior art method for performing a complex multiply-accumulate operation on this DSP is to perform the multiply and add operations to generate the real component and add that real component to an accumulation value representing the accumulated real component, and then perform the multiply and add operations to generate the imaginary component and add that imaginary component to an accumulation value representing the accumulated imaginary component. A pseudo code representation of the inner loop of the FIR filter is shown below in Table 1.
TABLE 1 ______________________________________ ZAC ; ACC &lt;= 0, other setup code to initialize pointers YRSTART ; Loop label LT *x++ ; T &lt;= x.i(n) MPY *c++ ; P &lt;= T* c.i(n) LT *x++ ; T &lt;= x.r(n) MPYS *c++ ; ACC &lt;= ACC - P,P &lt;= T* c.r(n) APAC lc-- ; ACC &lt;= ACC + P, decrement loop counter register BANZ ; Jump back to beginning of loop if lc is not zero YRSTART SA *y++ ; Store y.r ZAC ; ACC &lt;= 0, reset the pointers here. YISTART ; LT *x++ ; T &lt;= x.i(n) MPY *c++ ; P &lt;= T* c.r(n) LT *x++ ; T &lt;= x.r(n) MPYA *c++ ; ACC &lt;= ACC + P,P &lt;= T*c.i(n) APAC lc-- ; ACC &lt;= ACC + P BANZ YISTART SA *y ______________________________________
One limitation of the TMS320C2x DSP is its limited efficiency when performing complex number multiplication and FIR filters. As illustrated by the above pseudo code, the algorithm is basically serial in nature. Thus, it requires approximately 10 instructions to accumulate the result of multiplying together two complex numbers.
Multimedia applications (e.g., applications targeted at computer supported cooperation (CSC--the integration of teleconferencing with mixed media data manipulation), 2D/3D graphics, image processing, video compression/decompression, recognition algorithms and audio manipulation) require the manipulation of large amounts of data which may be represented in a small number of bits. For example, graphical data typically requires 16 bits and sound data typically requires 8 bits. Each of these multimedia application requires one or more algorithms, each requiring a number of operations. For example, an algorithm may require an add, compare and shift operations.
To improve efficiency of multimedia applications (as well as other applications that have the same characteristics), prior art processors provide packed data formats. A packed data format is one in which the bits typically used to represent a single value are broken into a number of fixed sized data elements, each of which represents a separate value. For example, a 64-bit register may be broken into two 32-bit elements, each of which represents a separate 32-bit value. In addition, these prior art processors provide instructions for separately manipulating each element in these packed data types in parallel. For example, a packed add instruction adds together corresponding data elements from a first packed data item and a second packed data item. Thus, if a multimedia algorithm requires a loop containing five operations that must be performed on a large number of data elements, it is desirable to pack the data and perform these operations in parallel using packed data instructions. In this manner, these processors can more efficiently process multimedia applications.
However, if the loop of operations contains an operation that cannot be performed by the processor on packed data (i.e., the processor lacks the appropriate instruction), the data will have to be unpacked to perform the operation. For example, if the multimedia algorithm requires an add operation and the previously described packed add instruction is not available, the programmer must unpack both the first packed data item and the second packed data item (i.e., separate the elements comprising both the first packed data item and the second packed data item), add the separated elements together individually, and then pack the results into a packed result for further packed processing. The processing time required to perform such packing and unpacking often negates the performance advantage for which packed data formats are provided. Therefore, it is desirable to incorporate in a computer system a set of packed data instructions that provide all the required operations for typical multimedia algorithms. However, due to the limited die area on today's general purpose microprocessors, the number of instructions which may be added is limited. Therefore, it is desirable to invent instructions that provide both versatility (i.e. instructions which may be used in a wide variety of multimedia algorithms) and the greatest performance advantage.