1. Field of the Invention
This invention relates generally to data processing, and more particularly to the processing of algorithms in software that benefit from efficient implementation of add-compare-select (ACS) butterfly operations.
2. Description of Related Technology
The computational burden of several important signal processing algorithms can be reduced by taking advantage of symmetry. One notable use of such symmetry in a signal processing application is the processing of “butterflies” in the Viterbi Algorithm (VA) used for the decoding of convolutionally encoded data. The Viterbi decoding algorithm is widely used in the area of data communications, for example in cellular telephones and modulator-demodulator devices (“modems”), where data must be recovered from a noisy channel. Convolutional decoding using algorithms such as the Viterbi Algorithm is normally performed on the aforementioned “butterfly” arrangement of old and new metric values. These algorithms are used to decode and estimate the most probable information sequence transmitted over a communications channel that has added noise to the signal. Because this decoding process is computationally intensive (and thereby consumes processor resources, resulting in increased power consumption and heat generation), considerable effort has historically been expended on making the necessary calculations as efficient as possible. For example, the hardware units described in U.S. Pat. No. 5,987,638 entitled “Apparatus and method for computing the result of a viterbi equation in a single cycle” and assigned to LSI Logic Corporation ('638 patent), the state metric memory arrangement disclosed in U.S. Pat. No. 4,979,175 entitled “State metric memory arrangement for a viterbi decoder” and assigned to Motorola Corp. ('175 patent), and the shift-left instruction disclosed in U.S. Pat. No. 5,742,621 entitled “Method for implementing an add-compare-select butterfly operation in a data processing system and instruction therefor” also assigned to Motorola Corp. ('621 patent), are generally intended to address such computational efficiency considerations.
Within certain signal processing algorithms such as the aforementioned Viterbi decoding algorithm, most processing cycles are used in the metric update routine. In most systems using convolutional encoding, the number of bits used to represent every input bit (known generally as the “rate”) and the encoding polynomials are carefully chosen to produce a symmetrical relationship between previous delay states, current delay states, and the path states between them. In the usual encoding case, two previous delay states, two current delay states, and the four possible paths between them can be drawn as a “butterfly” in a trellis diagram (see FIG. 1). In the decode process, butterflies can also be used to reduce the amount of data movement and calculation. The VA encode/decode can be accomplished in software using a general purpose processor (e.g., CISC). Such a software approach has the advantage of flexibility; specifically, the encode/decode operations can be readily changed by altering the software program. Such flexibility, however, comes at the cost of efficiency, since the software approach takes many more cycles than would be needed for a comparable hardware solution. Decoding in software also has the advantages afforded by the sophisticated compiler and debug tools currently available to the programmer.
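By way of illustration only, the butterfly structure described above can be sketched in a few lines of Python. The sketch assumes a commonly used state numbering, in which previous states 2j and 2j+1 both feed current states j and j + N/2 for a trellis of N states; this numbering, and the function name `butterflies`, are assumptions made for the example and are not taken from FIG. 1.

```python
def butterflies(num_states):
    """Enumerate the butterflies of a trellis with num_states delay states.

    Assumed numbering (illustrative): old states 2j and 2j+1 connect to
    new states j and j + num_states // 2.
    """
    half = num_states // 2
    result = []
    for j in range(half):
        old_pair = (2 * j, 2 * j + 1)   # two previous delay states
        new_pair = (j, j + half)        # two current delay states
        # The four possible paths: each old state reaches each new state.
        paths = [(old, new) for old in old_pair for new in new_pair]
        result.append((old_pair, new_pair, paths))
    return result


if __name__ == "__main__":
    for old_pair, new_pair, paths in butterflies(4):
        print(old_pair, "->", new_pair, paths)
```

Under this assumed numbering every previous state and every current state belongs to exactly one butterfly, so a decoder can process the trellis one butterfly at a time while reading each old metric only once.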
Prior art Viterbi hardware-based accelerators such as that set forth in the '638 patent can perform a single cycle butterfly calculation, but suffer from the disadvantage that they do not provide a flexible solution that can be readily modified by software. Additionally, separate co-processor blocks utilized with such designs are inefficient in terms of silicon usage, as the logic employed within the co-processor is not usually re-used or adaptable for other functions. For example, the memory space of such a hardware solution used to store the old and new state metric values will be dedicated memory for the co-processor, thereby making such solutions non-optimized with respect to silicon usage.
In terms of software solutions, typical prior art processors with specialized metric update instructions in their instruction set architectures (ISAs) take more than one cycle for the update due to (i) the use of several memory pointers (typically three), and (ii) the need to read two variables from memory and write two variables to memory. Classic instruction set architectures work on one or two input operands and produce one result as output, and accordingly are not suited to a single instruction that can perform a complete butterfly calculation. In a typical prior art implementation 200 of storing the old and new metric data (FIG. 2), three data buffers are used: (i) the “old” metric buffer 202; (ii) the lower half “new” metric buffer 203; and (iii) the upper half “new” metric buffer 204. Each of these three buffers has a separate pointer associated therewith. FIG. 3 illustrates a typical prior art multi-cycle (e.g., four-cycle) butterfly metric update using three pointers, based on the buffer arrangement of FIG. 2. Prior to the butterfly calculation, local distances (LDs) are calculated per step 300; these LDs are loaded into registers associated with the Viterbi acceleration instructions. In the first cycle 302 of the update operation, two old metric values are read from the applicable buffer, and a pair of paths (i.e., Old_met(2*j)+LD and Old_met(2*j+1)−LD) is calculated. In the second cycle 304, the maximum of the two paths is identified and stored back in the lower new metric buffer, and a bit is shifted into a path transition register. Pointer “B” is also post-incremented. In the third cycle 306 of FIG. 3, the remaining two paths (i.e., Old_met(2*j)−LD and Old_met(2*j+1)+LD) are calculated using the values read in the first cycle 302. Pointer “A” is post-incremented. In the fourth cycle 308, the maximum of the two values calculated in the third cycle 306 is stored back to the upper new metric buffer, and another bit is shifted into the path transition register. Pointer “C” is post-incremented.
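For purposes of illustration, the four-cycle, three-pointer update described above may be modeled in software roughly as follows. This Python sketch is not taken from any of the cited patents; the function name `butterfly_update`, the use of one LD per butterfly, the advance of pointer “A” by two entries per butterfly, the left-shift direction of the path transition register, and the convention that a 1 bit marks the second path of each pair as the survivor are all assumptions made for the example.

```python
def butterfly_update(old_met, local_dists):
    """Model of a prior-art style four-cycle ACS butterfly metric update.

    old_met     -- "old" metric buffer (even length)
    local_dists -- one local distance (LD) per butterfly (assumed)
    Returns (lower new buffer, upper new buffer, path transition register).
    """
    half = len(old_met) // 2
    new_lower = [0] * half      # lower half "new" metric buffer
    new_upper = [0] * half      # upper half "new" metric buffer
    ptr_a = ptr_b = ptr_c = 0   # pointers A (old), B (lower), C (upper)
    transitions = 0             # path transition register

    for ld in local_dists:
        # Cycle 1: read two old metrics and form the first pair of paths.
        m0, m1 = old_met[ptr_a], old_met[ptr_a + 1]
        path0, path1 = m0 + ld, m1 - ld

        # Cycle 2: select the maximum, store it, shift in a decision bit.
        new_lower[ptr_b] = max(path0, path1)
        transitions = (transitions << 1) | (1 if path1 > path0 else 0)
        ptr_b += 1

        # Cycle 3: form the remaining pair of paths from the same reads.
        path2, path3 = m0 - ld, m1 + ld
        ptr_a += 2  # assumed: pointer A steps past both old metrics

        # Cycle 4: select the maximum, store it, shift in a decision bit.
        new_upper[ptr_c] = max(path2, path3)
        transitions = (transitions << 1) | (1 if path3 > path2 else 0)
        ptr_c += 1

    return new_lower, new_upper, transitions


if __name__ == "__main__":
    print(butterfly_update([10, 4, 7, 9], [3, -2]))
```

The model makes the cost visible: each butterfly involves two reads, two writes, and three separately maintained pointers, which is why a conventional instruction with one or two input operands and one result cannot complete it in a single cycle.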
Based on the foregoing, there is a need to provide an improved configuration adapted to reduce the computation time (and particularly, the number of cycles used) for executing the VA decode in software. Such reduced computation time would provide decode efficiency comparable to hardware implementations, yet with the attendant benefits of flexibility and the availability of compiler and debug tools associated with software solutions. This improved configuration would also be readily implemented in existing processor instruction set architectures (ISAs) so as to minimize the changes necessary thereto. Furthermore, this improved configuration would ideally be adapted to utilize silicon-efficient hardware (including memory), thereby keeping the size of the processor to a minimum.