1. Field of the Invention
The invention relates to the field of computer systems. More specifically, the invention relates to operations on complex numbers.
2. Background Information
Many devices in use today (e.g., modems, radar, TV, telephone, etc.) transmit data using in-phase and out-of-phase signals (e.g., orthogonal signals). This data is typically processed using complex numbers (e.g., the real component is used for the in-phase signal, while the imaginary component is used for the out-of-phase signal). The multiplication of two complex numbers (e.g., r1 i1 and r2 i2) is performed according to Equation 1 shown below:
Real Component = r1·r2 − i1·i2
Imaginary Component = r1·i2 + r2·i1  Equation 1
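As a minimal sketch of Equation 1, the following Python function computes the two components directly. The function name `complex_multiply` and the choice of passing the four components as separate arguments are illustrative assumptions, not part of the original text.

```python
def complex_multiply(r1, i1, r2, i2):
    """Multiply (r1 + j*i1) by (r2 + j*i2) following Equation 1."""
    real = r1 * r2 - i1 * i2   # Real Component
    imag = r1 * i2 + r2 * i1   # Imaginary Component
    return real, imag


# (1 + 2j) * (3 + 4j) = -5 + 10j
print(complex_multiply(1, 2, 3, 4))  # prints (-5, 10)
```

Note that each product requires four multiplications and two additions or subtractions on real values.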
The multiplication of complex numbers is required in operations such as the multiply-accumulate operation (see Equation 2 below). In Equation 2, a(n) and b(n) represent the nth complex numbers in two series of complex numbers:
y(n) = y(n−1) + a(n)*b(n)  Equation 2
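The recurrence in Equation 2 can be sketched as a running sum kept in separate real and imaginary accumulators. The name `complex_mac`, the representation of each complex number as a `(real, imag)` tuple, and the zero starting value of the accumulator are assumptions made for illustration.

```python
def complex_mac(a, b):
    """Accumulate y(n) = y(n-1) + a(n)*b(n) over two equal-length series.

    Each element of a and b is a (real, imag) tuple; the accumulator is
    assumed to start at zero.
    """
    acc_r, acc_i = 0, 0
    for (ar, ai), (br, bi) in zip(a, b):
        acc_r += ar * br - ai * bi   # accumulate the real component
        acc_i += ar * bi + br * ai   # accumulate the imaginary component
    return acc_r, acc_i


# (1+2j)*(3+4j) + (0+1j)*(1+1j) = (-5+10j) + (-1+1j) = -6+11j
print(complex_mac([(1, 2), (0, 1)], [(3, 4), (1, 1)]))  # prints (-6, 11)
```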
Digital discrete time filters, such as a FIR filter and an IIR filter, require many multiply-accumulate operations. A FIR filter is an operation used in applications such as real-time digital signal processing (e.g., complex demodulation and equalization found in high speed data modems; ghost canceling in terrestrial broadcasting) for recovery of the transmitted information from the signal. The equation for the FIR filter is shown below as Equation 3:

$$y(k) = \sum_{n=0}^{L-1} c(n) * x(k-n) \qquad \text{Equation 3}$$
With reference to Equation 3, the complex variable y(k) represents the current output sample of the filter, the input value c(n) represents the nth filter coefficient of the filter, the constant L is the number of coefficients in c(n), and the input value x(k−n) represents the nth past value of the input sequence (also termed "samples"). The output of the filter is a weighted average of the past L complex samples. Typically, there are more samples than there are coefficients. For the computation of the kth output sample y(k), the first complex coefficient corresponds to the kth sample, the second corresponds to the (k−1)th sample, and so on. Each complex coefficient is multiplied by the sample to which it corresponds, and these products are accumulated to generate the kth output sample of the filter. For the computation of the (k+1)th output sample y(k+1), the first complex coefficient corresponds to the (k+1)th sample, the second complex coefficient corresponds to the kth sample, and so on. Each complex coefficient is multiplied by the sample to which it corresponds, and these products are accumulated to generate the (k+1)th output of the filter. Thus, the correspondence between the samples and the complex coefficients slides up by one for each successive output sample. As a result, FIR filters are typically coded using an outer and an inner loop. The outer loop steps through the successive outputs (the different corresponding relationships between the samples and complex coefficients), while the inner loop steps through the complex coefficients and current corresponding samples to perform the multiply-accumulate.
When a FIR filter first starts, there are insufficient samples to compute the entire length (L) of the filter (i.e., the index k−n into the input samples x() is negative). In such situations, the missing samples are typically substituted with zero, the first sample, or some other relevant input.
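The outer and inner loop structure described above, together with the handling of missing start-up samples, can be sketched as follows. This is an illustrative sketch only: the name `fir_filter` is hypothetical, Python's built-in `complex` type stands in for the complex arithmetic, and missing samples are substituted with zero (one of the options mentioned above).

```python
def fir_filter(c, x):
    """Compute y(k) = sum over n of c(n) * x(k - n) for every k.

    c: list of L complex coefficients; x: list of complex input samples.
    Samples with a negative index (k - n < 0) are assumed to be zero.
    """
    L = len(c)
    y = []
    for k in range(len(x)):              # outer loop: successive outputs
        acc = 0j
        for n in range(L):               # inner loop: multiply-accumulate
            sample = x[k - n] if k - n >= 0 else 0j
            acc += c[n] * sample
        y.append(acc)
    return y


coefficients = [1 + 0j, 0 + 1j]
samples = [1 + 1j, 2 + 0j, 0 + 3j]
print(fir_filter(coefficients, samples))  # prints [(1+1j), (1+1j), 5j]
```

For each new output the inner loop pairs coefficient c(0) with the newest sample, c(1) with the one before it, and so on, which is the sliding correspondence described in the text.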
The equation for the IIR filter is shown below as Equation 4:

$$y(k) = \sum_{n=0}^{L-1} c(n) * x(k-n) + \sum_{i=0}^{M-1} d(i) * y(k-i) \qquad \text{Equation 4}$$
With reference to Equation 4, the input value d(i) represents the ith filter coefficient of the filter, and the constant M is the number of coefficients in d(i).
One prior art technique for supporting multiply-accumulate operations is to couple a separate digital signal processor (DSP) to an existing general purpose processor (e.g., the Intel® 486 manufactured by Intel Corporation of Santa Clara, Calif.). The general purpose processor allocates jobs to the DSP.
One such prior art DSP is the TMS320C2x DSP manufactured by Texas Instruments, Inc. of Dallas, Tex. A prior art method for performing a complex multiply-accumulate operation on this DSP is to perform the multiply and add operations to generate the real component and add that real component to an accumulation value representing the accumulated real component, and then perform the multiply and add operations to generate the imaginary component and add that imaginary component to an accumulation value representing the accumulated imaginary component. A pseudo code representation of the inner loop of the FIR filter is shown below in Table 1.
One limitation of the TMS320C2x DSP is its limited efficiency when performing complex number multiplication and FIR filters. As illustrated by the above pseudo code, the algorithm is basically serial in nature. Thus, it requires approximately 10 instructions to accumulate the result of multiplying together two complex numbers.
Multimedia applications (e.g., applications targeted at computer supported cooperation (CSC, the integration of teleconferencing with mixed media data manipulation), 2D/3D graphics, image processing, video compression/decompression, recognition algorithms and audio manipulation) require the manipulation of large amounts of data which may be represented in a small number of bits. For example, graphical data typically requires 16 bits and sound data typically requires 8 bits. Each of these multimedia applications requires one or more algorithms, each requiring a number of operations. For example, an algorithm may require add, compare, and shift operations.
To improve efficiency of multimedia applications (as well as other applications that have the same characteristics), prior art processors provide packed data formats. A packed data format is one in which the bits typically used to represent a single value are broken into a number of fixed sized data elements, each of which represents a separate value. For example, a 64-bit register may be broken into two 32-bit elements, each of which represents a separate 32-bit value. In addition, these prior art processors provide instructions for separately manipulating each element in these packed data types in parallel. For example, a packed add instruction adds together corresponding data elements from a first packed data item and a second packed data item. Thus, if a multimedia algorithm requires a loop containing five operations that must be performed on a large number of data elements, it is desirable to pack the data and perform these operations in parallel using packed data instructions. In this manner, these processors can more efficiently process multimedia applications.
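To make the packed data idea concrete, the sketch below emulates a 64-bit register holding two 32-bit elements and a packed add that operates on both elements independently. The helper names (`pack2x32`, `unpack2x32`, `packed_add32`) and the wraparound behavior on overflow are assumptions for illustration; a real processor performs the two additions in a single instruction rather than with masks and shifts.

```python
MASK32 = 0xFFFFFFFF


def pack2x32(lo, hi):
    """Pack two 32-bit values into one 64-bit integer."""
    return ((hi & MASK32) << 32) | (lo & MASK32)


def unpack2x32(packed):
    """Split a 64-bit integer into its (low, high) 32-bit elements."""
    return packed & MASK32, (packed >> 32) & MASK32


def packed_add32(a, b):
    """Add corresponding 32-bit elements; no carry crosses between elements."""
    lo = ((a & MASK32) + (b & MASK32)) & MASK32
    hi = (((a >> 32) & MASK32) + ((b >> 32) & MASK32)) & MASK32
    return (hi << 32) | lo


result = packed_add32(pack2x32(1, 2), pack2x32(10, 20))
print(unpack2x32(result))  # prints (11, 22)
```

The masking in `packed_add32` is what keeps an overflow in the low element from spilling into the high element, which is the property that distinguishes a packed add from an ordinary 64-bit add.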
However, if the loop of operations contains an operation that cannot be performed by the processor on packed data (i.e., the processor lacks the appropriate instruction), the data will have to be unpacked to perform the operation. For example, if the multimedia algorithm requires an add operation and the previously described packed add instruction is not available, the programmer must unpack both the first packed data item and the second packed data item (i.e., separate the elements comprising both the first packed data item and the second packed data item), add the separated elements together individually, and then pack the results into a packed result for further packed processing. The processing time required to perform such packing and unpacking often negates the performance advantage for which packed data formats are provided. Therefore, it is desirable to incorporate in a computer system a set of packed data instructions that provide all the required operations for typical multimedia algorithms. However, due to the limited die area on today's general purpose microprocessors, the number of instructions which may be added is limited. Therefore, it is desirable to invent instructions that provide both versatility (i.e., instructions which may be used in a wide variety of multimedia algorithms) and the greatest performance advantage.
A method and apparatus for performing complex digital filters is described. According to one aspect of the invention, a method for performing a complex digital filter is described. The complex digital filter is performed using a set of data samples and a set of complex coefficients. In addition, the complex digital filter is performed using an inner and an outer loop. The outer loop steps through a number of corresponding relationships between the set of complex coefficients and the set of data samples. Each of these corresponding relationships is used by the digital filter to generate an output which is stored in the form of a packed data item. Each output packed data item has a first and second data element respectively storing the real and imaginary components of the filter's complex output. The inner loop steps through each complex coefficient in the set of complex coefficients. Within the inner loop, the data sample corresponding to the current complex coefficient (the complex coefficient currently identified by the inner loop) is determined according to the current corresponding relationship (the corresponding relationship currently identified by the outer loop). Then, in response to receiving an instruction, eight data elements are read and used to generate a currently calculated complex number. These eight data elements were previously stored as packed data and include two representations of each of the components of the current complex coefficient and its current corresponding data sample. Each of these data elements is either the positive or negative of the component it represents. As a result of the manner in which these eight data elements are stored, the currently calculated complex number represents the product of the current complex coefficient and its current corresponding data sample. The currently calculated complex number is then added to the current output packed data.
As a result, the current output packed data stores the sum of the complex numbers generated in the current inner loop. According to another aspect of the invention, a machine-readable medium is described. This machine-readable medium has stored thereon data representing sequences of instructions which, when executed by a processor, cause that processor to perform the above described method.
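The inner loop of the described filter method can be sketched as follows. The text does not specify how the eight data elements are ordered, so the layout used here is only one plausible arrangement, chosen so that a multiply-add over adjacent pairs yields the real and imaginary components of Equation 1. All names (`packed_multiply_add`, `arrange_coefficient`, `arrange_sample`, `packed_fir_output`) are hypothetical, Python lists stand in for packed data items, element widths and overflow are not modeled, and missing start-up samples are treated as zero.

```python
def packed_multiply_add(a, b):
    """Emulate one instruction on two packed items of four elements each:
    multiply corresponding elements, then sum adjacent pairs of products."""
    return [a[0] * b[0] + a[1] * b[1], a[2] * b[2] + a[3] * b[3]]


def arrange_coefficient(r, i):
    """Assumed layout: two copies of each component, one imaginary negated."""
    return [r, -i, i, r]


def arrange_sample(r, i):
    """Assumed layout: two copies of each component, all positive."""
    return [r, i, r, i]


def packed_fir_output(coeffs, samples, k):
    """Compute output k as a two-element item [real, imag]."""
    out = [0, 0]
    for n, (cr, ci) in enumerate(coeffs):        # inner loop over coefficients
        if k - n < 0:
            continue                             # missing sample taken as zero
        xr, xi = samples[k - n]
        prod = packed_multiply_add(arrange_coefficient(cr, ci),
                                   arrange_sample(xr, xi))
        out = [out[0] + prod[0], out[1] + prod[1]]   # packed add into output
    return out


# One complex product: (1+2j)*(3+4j) = -5+10j
print(packed_multiply_add(arrange_coefficient(1, 2), arrange_sample(3, 4)))  # prints [-5, 10]
print(packed_fir_output([(1, 0), (0, 1)], [(1, 1), (2, 0), (0, 3)], 2))      # prints [0, 5]
```

With this layout the first result element is r1·r2 − i1·i2 and the second is i1·r2 + r1·i2, so a single multiply-add over the eight stored elements produces the full complex product without unpacking.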
According to another aspect of the invention, a method for updating complex coefficients used in a digital filter is described. This updating is performed using a set of complex data, a set of complex coefficients, an error distance, and a rate of convergence. A loop is implemented to step through each complex coefficient in the set of complex coefficients. Within the loop, the complex data sample corresponding to the current complex coefficient (the complex coefficient currently identified by the loop) is determined. In addition, an instruction is executed that causes eight data elements to be read and used to generate a currently calculated complex number. These eight data elements were previously stored as packed data and include two representations of each of the components of the error distance and the current corresponding complex data sample. Each of these data elements is either the positive or negative of the component it represents. As a result of the manner in which these eight data elements are stored, the currently calculated complex number represents the product of the error distance and the complex conjugate of the current corresponding data sample. The real and imaginary components of the currently calculated complex number are then shifted right by the rate of convergence to generate a current complex factor. The real and imaginary components of this current complex factor are subtracted from the respective real and imaginary components of the current complex coefficient to generate the updated components of the current complex coefficient.
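The coefficient update described above can be sketched in scalar form as follows. The name `update_coefficients` is hypothetical, complex values are `(real, imag)` tuples of integers, the data sample at position n is assumed to correspond to coefficient n, and the rate of convergence is assumed to be a non-negative shift count applied as an arithmetic right shift. The packed storage of the eight data elements is not modeled here; the product is computed directly.

```python
def update_coefficients(coeffs, data, error, shift):
    """Return updated coefficients: c(n) - ((error * conj(x(n))) >> shift).

    coeffs, data: lists of (real, imag) integer tuples of equal length.
    error: (real, imag) error distance; shift: rate of convergence.
    """
    er, ei = error
    updated = []
    for (cr, ci), (xr, xi) in zip(coeffs, data):
        # error * conj(x) = (er + j*ei) * (xr - j*xi)
        prod_r = er * xr + ei * xi
        prod_i = ei * xr - er * xi
        # Shift each component right to form the current complex factor
        factor_r = prod_r >> shift
        factor_i = prod_i >> shift
        # Subtract the factor from the current coefficient
        updated.append((cr - factor_r, ci - factor_i))
    return updated


print(update_coefficients([(100, 50)], [(4, 2)], (8, -4), 2))  # prints [(94, 58)]
```

In the example, the product of the error (8 − 4j) and the conjugate of the sample (4 − 2j) is 24 − 32j; shifting right by 2 gives 6 − 8j, and subtracting that from 100 + 50j gives 94 + 58j.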