In an effort to achieve the demanding speed requirements of such data processing applications as high-speed digital communication and voice processing systems, use has been made of the digital signal processor (DSP), which is a special-purpose CPU utilized for digital processing and analysis of signals from analog sources, such as sound. The analog signals are converted into digital data and analyzed using various algorithms, such as Fast Fourier Transforms. DSPs are designed for particularly fast performance of certain operations, such as multiplication, multiplying and accumulating, and shifting and accumulating, because the math-intensive processing applications for DSPs rely heavily on such operations. For this reason, a DSP will typically include special hardware circuits to perform multiplication, accumulation and shifting operations.
One form of DSP architecture that exhibits significant benefits in processing speed is known as a Multiply-Accumulate or MAC processor. The MAC processor implements an architecture that takes advantage of the fact that the most common data processing operations involve multiplying two values, then adding the resulting value to another and accumulating the result. These basic operations are efficiently carried out utilizing specially configured, high-speed multipliers and accumulators, hence the "Multiply-Accumulate" nomenclature.
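The multiply-accumulate pattern described above can be sketched in software as follows. This is a minimal illustration only; the function names `mac` and `dot_product` are hypothetical and do not correspond to any particular processor instruction.

```python
def mac(acc, x, y):
    """One multiply-accumulate step: multiply x by y and add to acc."""
    return acc + x * y


def dot_product(xs, ys):
    """Accumulate a series of products, one MAC step per pair of inputs."""
    acc = 0
    for x, y in zip(xs, ys):
        acc = mac(acc, x, y)
    return acc
```

A filter tap sum or correlation reduces to repeated calls of this single step, which is why dedicated multiplier and accumulator hardware pays off.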
Another method for increasing processing speed is to perform different processes concurrently. Towards this end, DSP architectures with plural MAC structures have been developed. For example, a dual MAC processor is capable of performing two independent MAC operations concurrently. A simplified block diagram of a typical dual MAC processor 10 is illustrated in FIG. 1. Each half of the processor 12 has a 2-input multiplier 14 which receives input from an x or y (input) register 13 and stores its output in a product register 16. The product register is connected to one input of an adder 18, the output of which may selectively be stored in one of several accumulator registers 20. A second input of the adder 18 is connected to the accumulator array 20 to allow for a continuous series of cumulative operations. Additional data control signals (not shown) may allow the registers 13 to bypass the multiplier and be connected directly to the inputs of adders 18. Conventional vector processors may have one or several MAC processors operating in parallel.
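As a rough software model of the dual MAC data flow just described, the sketch below splits a sum of products across two accumulators, each fed by its own product value. The name `dual_mac_dot` and the even/odd split of the inputs are assumptions made for illustration; in hardware the two halves would operate in the same cycle rather than sequentially.

```python
def dual_mac_dot(xs, ys):
    """Sum of products using two accumulators, modelling two MAC halves.

    Assumes xs and ys have the same length.
    """
    acc0 = 0  # accumulator fed by the first multiplier/adder pair
    acc1 = 0  # accumulator fed by the second multiplier/adder pair
    for i in range(0, len(xs) - 1, 2):
        p0 = xs[i] * ys[i]          # product register of the first half
        p1 = xs[i + 1] * ys[i + 1]  # product register of the second half
        acc0 += p0
        acc1 += p1
    if len(xs) % 2:
        # An odd trailing element is handled by the first half alone.
        acc0 += xs[-1] * ys[-1]
    return acc0 + acc1
```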
The DSP16000 dual-MAC processor, available from Lucent Technologies, includes a data arithmetic unit (DAU), which constitutes the primary computational unit. The inputs to the multipliers of the DAU are applied through a pair of double length registers designated as the x and y registers, while the output of each multiplier is applied to a respective product register. Concurrent accumulations are achieved by providing both a two-input arithmetic logic unit (ALU) and a three-input adder, either of which may accumulate the data in either product register. When mathematical functions are performed by the ALU or adder, the result is stored in an accumulator register, a number of which are present in the DAU.
In wireless and wireline applications, particularly those with significant intersymbol interference, DSPs are used to perform data error detection and correction using convolutional encoding and Viterbi decoding. Convolutional encoding is performed by convolving a data input bit with one or more previous uncoded input bits. The convolved data is decoded using the well-known Viterbi algorithm. The Viterbi algorithm uses knowledge about the possible state transitions of the encoder from one given state to the next to determine the most likely encoder input given the received data.
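A convolutional encoder of this kind can be sketched as below. The 4-bit shift register, the rate of 1/2, and the generator masks in `G` are assumptions chosen purely for illustration; the names `encode` and `parity` are likewise hypothetical.

```python
STATE_BITS = 4            # assumed shift-register length
G = (0b10011, 0b11101)    # assumed generator masks (rate 1/2)


def parity(value):
    """Return 1 if value has an odd number of set bits, else 0."""
    return bin(value).count("1") & 1


def encode(bits, state=0):
    """Encode a bit sequence; returns (list of output symbol pairs, final state)."""
    out = []
    for b in bits:
        # The new input bit sits above the previously stored input bits.
        window = (b << STATE_BITS) | state
        # Each output bit is a modulo-2 sum of the taps selected by one mask.
        out.append(tuple(parity(window & g) for g in G))
        # Shift right: the input bit becomes the most significant state bit.
        state = window >> 1
    return out, state
```

Each output pair depends on the current input bit and the four bits before it, which is the dependency that the decoder later exploits.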
FIG. 2 is an illustration of the basic Viterbi algorithm butterfly computation. Four possible encoder transitions from present state (PS) to next state (NS) are illustrated. The present state is equivalent to the numeric value of the data stored in a shift register of the encoder. When a bit is input, the encoder register is shifted to the right and the input bit is moved into the most significant bit position (shown in bold in the next state). Thus, as illustrated, NS.sub.0 can be reached with a 0 input bit from either PS.sub.0 or PS.sub.1. Similarly, NS.sub.8 can be reached with a 1 input bit from either PS.sub.0 or PS.sub.1. The Viterbi algorithm provides a way to determine which of the two possible transition paths is the most likely, i.e., which is the survivor path.
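The state transitions of the butterfly can be checked with a few lines of code. The sketch assumes the 4-bit encoder register implied by the state labels 0000 and 1000; the function names are hypothetical.

```python
STATE_BITS = 4  # assumed register width, matching the 4-bit state labels
MASK = (1 << STATE_BITS) - 1


def next_state(ps, bit):
    """Shift the register right and move the input bit into the MSB."""
    return (ps >> 1) | (bit << (STATE_BITS - 1))


def predecessors(ns):
    """Return the two present states that can lead to next state ns."""
    base = (ns << 1) & MASK
    return base, base | 1


def traceback_bit(ps):
    """The bit shifted out of the register, i.e. the LSB of the present state."""
    return ps & 1
```

For example, `predecessors(0)` and `predecessors(8)` both return `(0, 1)`, matching the four transitions of the butterfly.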
This determination consists of two basic steps. The first step is a branch metric computation which determines the Euclidean distance between the received data symbol and the actual data symbol which would result from a state transition from the present state to a next state. The branch metric for a transition from a present state i to a next state j at instant k is signified as m.sub.i,j (k) and is represented by the equation: ##EQU1## where x.sub.n (k) is the received nth symbol, C.sub.n,ij is the actual symbol that would result from a state transition from i to j (which is determined from the structure of the convolutional encoder), and the rate of the encoder is 1/R (i.e., R output bits are produced for every input bit). For a rate 1/R encoder, two branch metrics must be computed for each next state.
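One common way to compute such a distance is sketched below. The squared form of the Euclidean distance used here is an assumption and is not taken from the equation referenced above; the name `branch_metric` is hypothetical.

```python
def branch_metric(received, expected):
    """Squared Euclidean distance between received and expected symbols.

    received: the R received symbols x_n(k) for one input bit period.
    expected: the R symbols C_n,ij that the transition i -> j would produce.
    """
    return sum((x - c) ** 2 for x, c in zip(received, expected))
```

A small value indicates that the received symbols lie close to what the transition would have produced, so the transition is more likely.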
In the second step, once the branch metrics for all possible state transitions have been calculated, the accumulated distance is calculated for each input path and the path with the minimum distance (i.e., maximum probability) is selected as the survivor path. This step is known as Add-Compare-Select, or ACS. A third step, performed once the survivor paths have been determined, is known as traceback. This step traces the maximum likelihood path through a trellis of possible present state to next state transitions, as determined by the first two steps, and reconstructs the path through the trellis to extract the original input data. In this example, the survivor path is represented by the least significant bit of the present state, conventionally referred to as the traceback bit (shown in bold in FIG. 2). For example, if the path from present state S.sub.1 is chosen over the path from present state S.sub.0, the traceback bit is 1.
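The traceback step can be sketched as follows, assuming a 4-bit state and a table in which `tb_table[k][ns]` holds the traceback bit stored for next state `ns` at step `k`. The table layout and the name `traceback` are assumptions made for this illustration.

```python
STATE_BITS = 4  # assumed register width
MASK = (1 << STATE_BITS) - 1


def traceback(tb_table, final_state):
    """Walk the stored traceback bits backwards to recover the input bits."""
    state = final_state
    bits = []
    for tb_row in reversed(tb_table):
        # The MSB of a state is the input bit that produced that state.
        bits.append(state >> (STATE_BITS - 1))
        # Undo the right shift; the traceback bit restores the lost LSB.
        state = ((state << 1) & MASK) | tb_row[state]
    bits.reverse()
    return bits
```

Because each traceback bit is exactly the bit that was shifted out of the register, one stored bit per state per step is enough to reconstruct the whole path.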
The ACS operation can be broken into two steps: (1) the Add operation, or path metric computation, and (2) the Compare-Select operation. The path metric add operation is the accumulation of the present state cost (a value initialized by the user at the start of the Viterbi processing) and the branch metric values. As shown in FIG. 2, the two path metrics for next state 0000 are:
PS.sub.0 + m.sub.0,0 and PS.sub.1 + m.sub.1,0 (Equ. 1)
and for next state 1000 are:
PS.sub.0 + m.sub.0,8 and PS.sub.1 + m.sub.1,8 (Equ. 2)
Once calculation of the two path metrics for each state is completed, the values are compared and the minimum or the maximum, depending on implementation details, is selected as the survivor cost and the corresponding traceback bit (TB) is determined and stored. This operation for the path metrics of Equs. 1 and 2 can be expressed, for example, as:
NS.sub.0 = min(PS.sub.0 + m.sub.0,0, PS.sub.1 + m.sub.1,0) (Equ. 3)
TB.sub.0 = 0 if NS.sub.0 = PS.sub.0 + m.sub.0,0 else TB.sub.0 = 1 (Equ. 4)
and
NS.sub.8 = min(PS.sub.0 + m.sub.0,8, PS.sub.1 + m.sub.1,8) (Equ. 5)
TB.sub.8 = 0 if NS.sub.8 = PS.sub.0 + m.sub.0,8 else TB.sub.8 = 1 (Equ. 6)
The above equations represent the analysis for a general decoder. For the more specific class of decoders having the property that the metric m.sub.0,8 = -m.sub.0,0 (and, correspondingly, m.sub.1,8 = -m.sub.1,0), Equ. 2 can be expressed as:
PS.sub.0 - m.sub.0,0 and PS.sub.1 - m.sub.1,0 (Equ. 2a)
and Equs. 5 and 6 can be expressed as:
NS.sub.8 = min(PS.sub.0 - m.sub.0,0, PS.sub.1 - m.sub.1,0) (Equ. 5a)
TB.sub.8 = 0 if NS.sub.8 = PS.sub.0 - m.sub.0,0 else TB.sub.8 = 1 (Equ. 6a)
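The add-compare-select relations for one butterfly can be sketched in software as below, using the "minimum" criterion of Equs. 3 through 6. The function names are hypothetical, and the symmetric variant assumes both m.sub.0,8 = -m.sub.0,0 and m.sub.1,8 = -m.sub.1,0.

```python
def compare_select(path_from_ps0, path_from_ps1):
    """Return (survivor cost, traceback bit) using the minimum criterion."""
    if path_from_ps0 <= path_from_ps1:
        return path_from_ps0, 0  # path from PS0 survives
    return path_from_ps1, 1      # path from PS1 survives


def butterfly(ps0, ps1, m00, m10, m08, m18):
    """General case: Equs. 1-2 (add) followed by Equs. 3-6 (compare-select)."""
    ns0 = compare_select(ps0 + m00, ps1 + m10)
    ns8 = compare_select(ps0 + m08, ps1 + m18)
    return ns0, ns8


def butterfly_symmetric(ps0, ps1, m00, m10):
    """Symmetric case (Equs. 2a, 5a, 6a): only two branch metrics are needed."""
    return butterfly(ps0, ps1, m00, m10, -m00, -m10)
```

In the symmetric case the same two branch metrics are added for one next state and subtracted for the other, which is what makes paired add/subtract hardware attractive.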
Although dedicated hardware Viterbi decoders constitute efficient and successful strategies for data detection, it is useful to implement a Viterbi algorithm using a signal and data processor which can be programmed for other applications as well. One form of architecture which has been used for this purpose is the MAC processor, discussed above.
Attempts have been made to optimize MAC processors for fast execution of the Viterbi ACS operations. For example, the TMS320C5xx single-MAC DSP from Texas Instruments provides an instruction which allows either Equ. 1 or Equ. 2 to be evaluated in one cycle by using a split mode 16-bit add/subtract operation. However, one-cycle performance can only be achieved for an encoder configured so that branch metric m.sub.1,0 =-m.sub.0,0, and thus cannot be achieved when the encoder does not have this property. The TMS320C5xx chip also provides a single cycle instruction to perform Equs. 3 and 4 concurrently or Equs. 5 and 6 concurrently, but using only a "maximum" criterion; it thus cannot easily perform a Viterbi algorithm implemented to require the minimum of the path metric values, because the generated traceback bits are altered. Further, the compare and select operations are implemented using a dedicated comparator unit which is separate from the primary adder or arithmetic logic unit.
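The idea of a split mode add/subtract can be illustrated generically as follows. This is a sketch of the concept only and does not model the instruction set of any particular chip; the packing of PS.sub.0 into the upper half and PS.sub.1 into the lower half of a 32-bit word is an assumption, as are all function names.

```python
MASK16 = 0xFFFF


def pack(hi, lo):
    """Pack two signed 16-bit values into one 32-bit word."""
    return ((hi & MASK16) << 16) | (lo & MASK16)


def to_signed16(value):
    """Interpret the low 16 bits of value as a two's-complement number."""
    value &= MASK16
    return value - 0x10000 if value & 0x8000 else value


def unpack(word):
    """Split a 32-bit word into its signed upper and lower 16-bit halves."""
    return to_signed16(word >> 16), to_signed16(word)


def split_add_sub(word, metric):
    """Add metric to the upper half and subtract it from the lower half.

    With PS0 in the upper half and PS1 in the lower half, this yields both
    path metrics of Equ. 1 at once when m_1,0 = -m_0,0. Results wrap
    modulo 2**16; no saturation is modelled.
    """
    hi, lo = unpack(word)
    return pack(hi + metric, lo - metric)
```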
Texas Instruments also provides a dual-MAC DSP, part number TMS320C6xx, which can evaluate Equs. 1 and 2 in a single cycle. However, this chip does not contain the necessary hardware to perform Equs. 3, 4, 5, and 6 in a single cycle because the traceback bit is not automatically generated and stored in a traceback register but instead must be explicitly shifted into an appropriate register using an additional command. Thus, additional machine cycles are required to store a traceback bit based on the results of the comparison, reducing the efficiency of Viterbi decoding.
A dual-MAC processor of the present invention comprises a pair of adder units and/or arithmetic logic units (ALU) operating in parallel and connected to a common accumulator register bank. The processor is optimized so that two Viterbi ACS operations, including traceback bit storage, can be executed in two machine cycles. Each adder/ALU comprises means to add, subtract, and compare one pair of data inputs when a full mode operation is performed or two pairs of data inputs when a split mode operation is performed. According to the invention, compare operations are executed using the subtract function of the adder/ALU and the sign bit is combined with a compare mode bit to generate a traceback output which indicates the proper traceback bit to store during the compare portion of Viterbi convolutional decoding. Each traceback output is connected to the input of a traceback shift register. When a compare operation is performed and a Viterbi mode bit is active, the generated traceback output is shifted into the traceback register. Each adder/ALU is configured with a subset of full and split-mode functions optimized to perform Viterbi add-compare-select efficiently.
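The compare-by-subtraction scheme can be modelled in software roughly as follows. The text does not state how the sign bit and the compare mode bit are combined, so the exclusive-OR used here is an assumption, as are the 16-bit traceback register width, the encoding of the mode bit (1 for minimum, 0 for maximum), and all function names.

```python
TB_WIDTH = 16  # assumed width of the traceback shift register


def compare_select_tb(a, b, min_mode):
    """Compare via subtraction; return (survivor, traceback bit).

    a is the path metric from PS0, b the path metric from PS1.
    min_mode is 1 to select the minimum, 0 to select the maximum.
    In this sketch a tie selects b in minimum mode and a in maximum mode.
    """
    diff = a - b
    sign = 1 if diff < 0 else 0  # sign bit of the subtraction result
    tb = sign ^ min_mode         # assumed combination of sign and mode bit
    survivor = b if tb else a
    return survivor, tb


def shift_in(tb_reg, tb):
    """Shift one traceback bit into the traceback register."""
    return ((tb_reg << 1) | tb) & ((1 << TB_WIDTH) - 1)
```

Because the traceback bit falls out of the subtraction itself, no separate comparator or explicit shift instruction is needed in this model.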