The invention relates to digital signal processor architectures and processing methods, and more particularly concerns a method for reducing the number of processor cycles required to perform certain operations in a digital signal processor of the type which can perform concurrent, plural operations.
A digital signal processor (DSP) is a special-purpose CPU used for the digital processing and analysis of signals from analog sources, such as sound. The analog signals are converted into digital data and analyzed using various algorithms, such as Fast Fourier Transforms. DSPs are designed for particularly fast performance of certain operations, such as multiplying, multiplying and accumulating, and shifting and accumulating, because the math-intensive applications for DSPs rely heavily on such operations. For this reason, a DSP will typically include special hardware circuits to perform multiplication, accumulation and shifting operations.
DSP designers strive to implement architectures capable of processing software instructions at ever-increasing rates. Inasmuch as performing an instruction takes a certain number of clock cycles, eliminating unnecessary instructions substantially increases a DSP's performance. One common performance or efficiency benchmark used in the industry is the number of MIPs, or millions of instructions per second, that the DSP is capable of carrying out. The greater the number, the more efficient the architecture is considered to be.
Designers are therefore constantly striving to increase the number of MIPs provided by their processors. The most common way of achieving this is to increase the speed of the processor. Any operation performed by the processor will require a given number of operating cycles of the processor, typically one or two. When the speed of the processor is increased, the duration of a cycle is reduced, and a corresponding increase in the number of MIPs can be expected.
One form of DSP architecture that achieves a particularly high MIPs rating is known as a Multiply-Accumulate, or MAC, processor. The MAC processor implements an architecture that takes advantage of the fact that the most common data processing operations involve multiplying two values, then adding the resulting product to another value and accumulating the result. These basic operations are carried out efficiently using specially configured, high-speed multipliers and accumulators, hence the "Multiply-Accumulate" nomenclature.
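A minimal Python sketch of the multiply-accumulate operation described above follows. It models the accumulator as an unbounded Python integer, so it ignores word widths, saturation and guard bits; the function names `mac` and `dot_product` are hypothetical and chosen only for illustration.

```python
def mac(acc: int, x: int, y: int) -> int:
    """One multiply-accumulate step: multiply x by y and add the product to acc."""
    return acc + x * y


def dot_product(xs, ys) -> int:
    """Accumulate the products of corresponding elements, one MAC per pair."""
    acc = 0  # the accumulator starts cleared
    for x, y in zip(xs, ys):
        acc = mac(acc, x, y)
    return acc
```

For example, `dot_product([1, 2, 3], [4, 5, 6])` performs three MAC steps and returns 32.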
A second method for increasing the number of MIPs is to perform different processes concurrently. Towards this end, DSP architectures with plural MAC structures have been developed. For example, a dual MAC processor is capable of performing two independent MAC operations concurrently.
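The sketch below models a dual MAC step in Python as two independent multiply-accumulate operations that are counted as a single cycle. The names and the one-cycle-per-pair cost model are illustrative assumptions, not a description of any particular processor.

```python
def dual_mac_step(acc0: int, acc1: int, x0: int, y0: int, x1: int, y1: int):
    """Two independent MAC operations, modelled as executing in the same cycle."""
    return acc0 + x0 * y0, acc1 + x1 * y1


def two_dot_products(xs0, ys0, xs1, ys1):
    """Compute two independent dot products concurrently; return (acc0, acc1, cycles)."""
    acc0 = acc1 = 0
    cycles = 0
    for x0, y0, x1, y1 in zip(xs0, ys0, xs1, ys1):
        acc0, acc1 = dual_mac_step(acc0, acc1, x0, y0, x1, y1)
        cycles += 1  # both MACs are charged to one modelled cycle
    return acc0, acc1, cycles
```

With three operand pairs per product, the two dot products finish in three modelled cycles, whereas a single-MAC model would need six.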
As one of the primary building blocks of the MAC architecture, accumulators are registers that act as sources or destinations of information for arithmetic units. Typically, in a MAC architecture, accumulators either receive data loaded from memory for use by the arithmetic unit or hold results from the arithmetic unit for storage to memory. In other words, the accumulators hold input data as well as the output results of mathematical computations such as accumulations and shifts.
An accumulator is typically designed to include separately addressable high and low parts and, often, a number of guard bits (for overflow, etc.). Loading of bits into the accumulator from memory normally occurs in the high part of the accumulator. A typical accumulator might be 40 bits long, with 16 bits each for the high and low parts and an 8-bit guard section. Thus, in a conventional DSP having several accumulators to load and/or store results, a separate instruction would be required to load or store each accumulator or accumulator part. Since data transfers to or from accumulators occur extremely frequently, any acceleration of such transfers or reduction in their frequency would provide more efficient processing and substantially improve the number of MIPs. Where the number of accumulators becomes quite large, this instruction requirement becomes a significant obstacle to efficient processing and to maximizing MIPs.
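The accumulator layout described above can be sketched in Python as bit fields of a 40-bit integer. The sketch assumes the guard bits occupy bits 39 to 32, the high part bits 31 to 16 and the low part bits 15 to 0, and that a load into the high part clears the low part and sign-extends into the guard bits; these bit positions and the sign-extension behavior are assumptions, since the text specifies only the field widths.

```python
GUARD_BITS, HIGH_BITS, LOW_BITS = 8, 16, 16
ACC_BITS = GUARD_BITS + HIGH_BITS + LOW_BITS  # 40-bit accumulator
MASK16 = 0xFFFF


def split_acc(acc: int) -> tuple:
    """Return the (guard, high, low) fields of a 40-bit accumulator value."""
    acc &= (1 << ACC_BITS) - 1               # keep only the 40 accumulator bits
    low = acc & MASK16                       # bits 15..0
    high = (acc >> LOW_BITS) & MASK16        # bits 31..16
    guard = acc >> (LOW_BITS + HIGH_BITS)    # bits 39..32
    return guard, high, low


def load_high(word: int) -> int:
    """Model loading a 16-bit memory word into the high part of an accumulator.

    Assumption: the low part is cleared and the guard bits are filled with
    copies of the word's sign bit (sign extension)."""
    word &= MASK16
    guard = (1 << GUARD_BITS) - 1 if word & 0x8000 else 0
    return (guard << (LOW_BITS + HIGH_BITS)) | (word << LOW_BITS)
```

Under these assumptions, `split_acc(load_high(0x8001))` returns `(255, 32769, 0)`: the guard bits are all ones, the word sits in the high part, and the low part is zero.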
One example of the problem of instruction-heavy accumulator processing relates to infinite impulse response (IIR) filters. These filters generally require a substantial number of multiplication and accumulation operations to iteratively filter out unwanted data. Consequently, because of the relatively large number of accumulator accesses required to effect filtering in a conventional processor, the number of corresponding instructions is relatively large.
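To illustrate why IIR filtering is multiply-accumulate heavy, the following Python sketch implements a Direct Form I second-order section (biquad) and counts the MAC operations it performs. The biquad structure and the `biquad_df1` name are illustrative choices; the text does not specify a particular filter form.

```python
def biquad_df1(xs, b, a):
    """Direct Form I biquad: y[n] = b0*x[n] + b1*x[n-1] + b2*x[n-2] - a1*y[n-1] - a2*y[n-2].

    Returns (output samples, number of MAC operations performed)."""
    b0, b1, b2 = b
    a1, a2 = a
    x1 = x2 = y1 = y2 = 0.0  # filter state (previous inputs and outputs)
    ys = []
    mac_count = 0
    for x in xs:
        acc = 0.0
        # Five coefficient/sample products are accumulated per output sample.
        for coeff, sample in ((b0, x), (b1, x1), (b2, x2), (-a1, y1), (-a2, y2)):
            acc += coeff * sample
            mac_count += 1
        x2, x1 = x1, x   # shift the input history
        y2, y1 = y1, acc  # shift the output history
        ys.append(acc)
    return ys, mac_count
```

Each output sample costs five MACs in this form, and practical filters often cascade several such sections, so the number of accumulator operations grows with the filter order.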
Therefore, the need exists for a DSP method and architecture having the capability of loading and storing data to and from accumulators with fewer accesses. The method and architecture of the present invention satisfy this need.
The method and architecture of the present invention provide high-speed, efficient DSP performance while substantially reducing the number of instructions necessary to effect accumulator processing in a dual MAC processor.
To realize the advantages above, the present invention, in one aspect, comprises a method for substantially reducing the number of instruction cycles in a digital signal processor that are dedicated to accumulator data transfers. The processor includes a multiplier unit, an adder, memory, and a plurality of accumulators. The accumulators include respective high and low parts, each of which contains a sufficient number of bits to serve as an input value to the multiplier. The method includes the steps of concatenating, or "vectoring," parts from different accumulators to define a single vectored register responsive to a single instruction cycle, and then processing the data in the vectored register. Thus, the two different register parts are processed in a single processor cycle instead of two.
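The saving claimed for vectoring can be illustrated with a simple instruction-count model in Python. The sketch assumes a 32-bit-wide data path between the accumulators and memory, so that two 16-bit parts concatenated into one vector can be stored by a single instruction; the function names and the memory model (a list of 16-bit words) are hypothetical.

```python
MASK16 = 0xFFFF


def store_conventional(mem: list, addr: int, part_a: int, part_b: int) -> int:
    """Store two accumulator parts with one instruction each; return the instruction count."""
    mem[addr] = part_a & MASK16       # instruction 1: part from the first accumulator
    mem[addr + 1] = part_b & MASK16   # instruction 2: part from the second accumulator
    return 2


def store_vectored(mem: list, addr: int, part_a: int, part_b: int) -> int:
    """Concatenate two parts into one 32-bit vector and store it as a single transfer."""
    vector = ((part_a & MASK16) << 16) | (part_b & MASK16)
    # Assumed 32-bit data path: both halves of the vector reach memory in one instruction.
    mem[addr] = vector >> 16
    mem[addr + 1] = vector & MASK16
    return 1
```

Both functions leave the same words in memory; only the modelled instruction count differs.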
In another aspect, the invention comprises a vectored register for use in a digital signal processor of the type including a plurality of accumulators, a multiplier unit, an adder, and memory. The accumulators include respective high and low parts, and the vectored register comprises a first part from a first one of the accumulators and a second part from a second one of the accumulators. The first and second parts cooperate to define a vector that is processed as if it were a single register.
In yet another aspect, the invention comprises a register array for use in a digital signal processor employing a dual Multiply-Accumulate architecture. The register array includes a plurality of accumulators including respective high and low parts. The parts of different accumulators are combined to define a vectored data structure. A multiplier unit is in data communication with the plurality of accumulators, and data is selectively transferred between the accumulators and the vectored data structure. A memory is connected to the plurality of accumulators to alternately store data taken from the vector or load data into the vector. The vectored data structure effects the loading or storing operations during a single instruction cycle.
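A vectored register can be modelled as a view over one part of each of two accumulators, as in the Python sketch below. It assumes a 40-bit accumulator with the high part at bits 31 to 16 and the low part at bits 15 to 0, places the first part in the upper half of the 32-bit vector, and leaves guard bits and unselected parts untouched on a write. The class name and these conventions are illustrative assumptions.

```python
class VectoredRegister:
    """View that treats one part of each of two accumulators as a single 32-bit register."""

    SHIFT = {"h": 16, "l": 0}  # assumed bit offset of each part inside an accumulator

    def __init__(self, accs: list, first: tuple, second: tuple):
        # first and second are (accumulator index, part) pairs, e.g. (0, "h").
        self.accs = accs
        self.first = first
        self.second = second

    def _get(self, ref: tuple) -> int:
        idx, part = ref
        return (self.accs[idx] >> self.SHIFT[part]) & 0xFFFF

    def _set(self, ref: tuple, value: int) -> None:
        idx, part = ref
        shift = self.SHIFT[part]
        cleared = self.accs[idx] & ~(0xFFFF << shift)   # keep every other bit
        self.accs[idx] = cleared | ((value & 0xFFFF) << shift)

    def read(self) -> int:
        """Return the concatenated vector: first part in bits 31..16, second in 15..0."""
        return (self._get(self.first) << 16) | self._get(self.second)

    def write(self, value: int) -> None:
        """Split a 32-bit value back into the two underlying accumulator parts."""
        self._set(self.first, value >> 16)
        self._set(self.second, value)
```

A read or write of the view touches both accumulator parts at once, which is the behavior the vectored register is meant to capture.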
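For a register array, the effect of vectoring on transfer counts can be sketched by saving the high and low parts of every accumulator to memory, either one part per instruction or as vectors that pair the same part of two different accumulators. The pairing order, the requirement for an even number of accumulators, the omission of guard bits, and the function name are all assumptions made for this Python illustration.

```python
MASK16 = 0xFFFF


def save_accumulators(accs: list, vectored: bool) -> tuple:
    """Save the high and low parts of every accumulator as 16-bit memory words.

    Guard bits are not saved in this sketch. Returns (memory_words, instruction_count)."""
    words, count = [], 0
    if not vectored:
        for acc in accs:
            words += [(acc >> 16) & MASK16, acc & MASK16]  # high, then low
            count += 2                                     # one store per part
        return words, count
    if len(accs) % 2:
        raise ValueError("this sketch pairs accumulators, so it needs an even count")
    for a, b in zip(accs[0::2], accs[1::2]):
        hi_vec = (((a >> 16) & MASK16) << 16) | ((b >> 16) & MASK16)  # a.high : b.high
        lo_vec = ((a & MASK16) << 16) | (b & MASK16)                  # a.low  : b.low
        for vec in (hi_vec, lo_vec):
            words += [vec >> 16, vec & MASK16]
            count += 1                                     # one store per vector
    return words, count
```

Under this model, saving eight accumulators takes 16 instructions conventionally and 8 with vectoring.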
In a further aspect, the invention comprises a data arithmetic unit for use in a digital signal processor. The unit includes a data computation module for calculating values from respective sets of data and a data transfer path for directing the input and output of the calculated values. The data computation module comprises a multiplier, respective output product registers, and a plurality of accumulators including respective high and low parts. Parts of accumulators are concatenated to define a vector. The multiplier is coupled to receive data loaded from the vector, while memory is connected to the plurality of accumulators to store data from and load data to the vector. During operation, the vector is processed in the manner of a register, so that the two parts comprising the vector may be processed in a single instruction cycle to effect efficient loading or storing operations.
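The sketch below illustrates, in Python, a multiplier that receives both halves of a vector as operands in a single modelled step and adds the two products to separate accumulators. Treating the 16-bit parts as signed two's-complement values and pairing each with its own coefficient are assumptions for illustration; the function names are hypothetical.

```python
def to_signed16(value: int) -> int:
    """Interpret the low 16 bits of value as a two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def dual_mac_from_vector(acc0: int, acc1: int, vector: int, coeff0: int, coeff1: int):
    """Multiply both 16-bit halves of a 32-bit vector in one modelled step.

    The upper half feeds the first product and the lower half the second;
    each product is added to its own accumulator."""
    x0 = to_signed16(vector >> 16)
    x1 = to_signed16(vector)
    return acc0 + x0 * coeff0, acc1 + x1 * coeff1
```

For example, the vector `0x0003FFFE` supplies the operands 3 and -2, so multiplying each by 10 yields the accumulator values `(30, -20)`.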