This application claims priority to S.N. 98402465.3, filed in Europe on Oct. 6, 1998 (TI-27679EU) and S.N. 98402455.4, filed in Europe on Oct. 6, 1998 (TI-28433EU).
The present invention relates to the computation of linear vectors in a digital processing system.
In particular, the invention relates to the computation of any vector {right arrow over (Y)} defined as a linear combination of N vectors ({right arrow over (X)}i)1xe2x89xa6ixe2x89xa6N with N coefficients                     (                  C          i                )                    1        ≤        i        ≤        N              ∈                  {                              -            1                    ;                      +            1                          }            ⁢              :            ⁢              xe2x80x83            ⁢              Y        →              =            ∑              1        ≤        i        ≤        N              ⁢          xe2x80x83        ⁢                  C        i            *                                    X            →                    i                .            
Some signal processing applications, for example processing operations for a GSM (Global System for Mobiles) half rate vocoder need effective algorithmic dependent processing on (Ci)1xe2x89xa6ixe2x89xa6N coefficients. Examples of such processing operations are:
circular permutations of the coefficients; and
complementing the value of a coefficient.
A typical digital signal processor (DSP) implementation will be effected as follows:
Firstly, the (Ci)1xe2x89xa6ixe2x89xa6N coefficients are converted from a bit representation (0, 1) to fractional numbers (xc2xd, xe2x88x92xc2xd) coded on multiple bits (16, 32, . . . ). Then each {right arrow over (Y)} vector coordinate (Yj)1xe2x89xa6jxe2x89xa6M is computed with an N step algorithm. The computation is effected with a Multiply and Accumulate unit, as represented by the following equation:             ∀              1        ≤        j        ≤                  M          ⁢                      xe2x80x83                    ⁢                      Y            j                                =                  ∑                  1          ≤          i          ≤          N                    ⁢              xe2x80x83            ⁢              (                              (                                          C                i                            *                              X                ij                                      )                    ⁢                       less than  less than             1                          )              ,
where (Xij)1xe2x89xa6jxe2x89xa6M are the coordinates of the vector {right arrow over (X)}i 
It will be noted that two memory operands (Ci, Xij) and a multiplication operation are required for each step of the computation. The coefficient addressing is carried out using an address generation unit, with indirect memory addressing, for example circular post-modification addressing, as required.
The disadvantages of this known approach are that a large number of memory addressing operations and operand fetching operations are needed. This reduces the availability of the busses, and demands a high power consumption to drive all of the bus operations. The use of a multiplication operation also requires high power consumption.
In modern processing engine design, it is desirable to reduce power consumption, both for ecological and economic grounds. Particularly, but not exclusively, in mobile processing applications, for example mobile telecommunications applications, it is desirable to keep power consumption as low as possible without sacrificing performance more than is necessary.
Particular and preferred aspects of the invention are set out in the accompanying independent and dependent claims. Combinations of features from the dependent claims may be combined with features of the independent claims as appropriate and not merely as explicitly set out in the claims.
In accordance with a first aspect of the invention, there is provided a processing engine for computing an output vector as a linear combination of N input vectors with N coefficients. The processing engine comprises a coefficient register for holding a representation N coefficients for a first input vector. A test unit is provided for testing selected parts of the coefficient register for respective coefficient representations. An arithmetic unit is provided for computing respective coordinates of an output vector by selective addition and/or subtraction of input vector coordinates for a second input vector dependent on results of the tests on the coefficient representations.
An embodiment of the invention enables the achievement of a number of advantages with respect to the prior art as regards the computation of a combination of linear vectors.
Power consumption can be reduced with respect to a processing engine operating in accordance with the prior technique as described in the introduction, as a test instruction (e.g. a bit test instruction) in combination with an ALU instruction is used rather than a MAC instruction. Also, only one data memory access per step of the computation is required, rather than two data memory accesses.
Execution performance is broadly equivalent to that of a processing engine operating in accordance with the prior technique described in the introduction as there are no processor stalls due to memory resource access conflicts. Also a minimal overhead of one cycle can, if required, be hidden by providing for parallel execution.
Flexible addressing is provided in that an address generation unit can be used to generate memory addresses and bit addresses in CPU registers.
The hardware overhead is minimal.
In an embodiment of the invention:
(Ci)1xe2x89xa6ixe2x89xa6N coefficients, represented as bits (0, 1), for a first input vector are packed in a CPU register.
Each coordinate of an output vector {right arrow over (Y)} is computed with a N+1 step algorithm.
The computation is done with bit test unit operating in parallel with an ALU unit:       ∀          1      ≤      j      ≤              M        ⁢                  xe2x80x83                ⁢                  Y          j                      =            ∑              1        ≤        i        ≤        N              ⁢          xe2x80x83        ⁢          (                        -                                    (                              -                1                            )                                      C              i                                      *                  X          ij                    )      
At step (i+1)1xe2x89xa6ixe2x89xa6N of the computation:
bit Ci+1 of the CPU register is addressed;
the tested bit is stored in a temporary register;
a conditional addition/subtraction of a coordinate of the second input vector Xij is performed based on the tested bit.
A test status register can be provided for storage of a coefficient test result prior to selective addition/subtraction of a coordinate of the second input vector dependent on the coefficient test result.
An address generator can be provided to generate a register bit pointer address for selecting a part (e.g., a bit position or a bit field) of the coefficient register holding an appropriate coefficient representation (e.g., one or more bits) for testing by the test unit. The same or a different address generator can also generate a memory address for retrieving a coordinate for the second input vector. Where the same data address generation hardware is used, this results in economical use of silicon and also of economies in power consumption in use. The coefficient register can include more than N bits. Circular bit addressing with post modification can be used to provide efficient use of the register, modifying the coefficients in a wrap round, or modulo manner.
A memory operand is fetched and a coefficient representation (e.g., a bit) is tested for each step of the computation. The test unit can be a bit test unit.
In an example of the invention, the bit test unit is operable in a step i+1 of a computation of a coordinate {right arrow over (Y)}j of a vector {right arrow over (Y)} to test a coefficient Ci+1, and an arithmetic unit is operable in parallel therewith to perform a conditional addition/subtraction of an operand Xij dependent upon the result of a test on a bit Ci of the coefficient register performed at a step i of the computation.
The computation of an output vector coordinate can thus be performed as a series of N+1 steps.
The processing engine can be in the form of a digital signal processor and can be integrated in an integrated circuit.
The invention also provides a telecommunications device comprising a data input device, a display, an antenna and a processing engine as defined above.
In accordance with another aspect of the invention, there is provided a method of computing an output vector as a linear combination of input vectors with N coefficients in a processing engine. The method comprises:
holding a representation of each of N coefficients of a first input vector in a coefficient register;
selectively testing respective coefficient representations of the first input vector; and
computing coordinates of an output vector by selective addition and/or subtraction of coordinates of a second vector dependent on the results of testing the coefficient representations for the first input vector.
In an embodiment of the invention, the computation of a coordinate of an output vector {right arrow over (Y)} is performed as a series of N+1 steps, wherein, at a step (i+1)1xe2x89xa6ixe2x89xa6N of the computation:
a bit Ci+1, representative of a coefficient of the first input vector, is addressed in the coefficient register for testing that bit;
a tested bit is stored in a bit test result register; and
one of a conditional addition and subtraction of an Xij coordinate of a second input vector is performed dependent on a bit stored in the bit test register as a result of a test on a bit Ci of the coefficient register performed at a step i of the computation.
It will be noted that only one memory operand (Xij) is required for each step of the computation. The coefficient addressing can be done through an address generation unit, with any indirect register bit addressing being used (for example circular post-modification).