Implementation of mobile radio based on processors or digital signal processors (DSPs) demands computationally intensive receiver processing techniques and a high level of programmability. There may be a need to cover a wide range of data rates to support a family of standards such as Global System for Mobile Communications (GSM), General Packet Radio Services (GPRS), Enhanced-GPRS (EGPRS), and Wide-Band Code Division Multiple Access (W-CDMA), which raises the performance requirements of wireless devices (e.g., handsets). Further, specific implementation of services such as Adaptive Multi-Rate (AMR) in a GSM handset radio places severe demands on the DSP power dissipation budget. Thus, reducing power consumption and increasing performance are general design goals for wireless handset devices.
Channel encoding and decoding functional blocks are prominent among the various functional blocks that demand a high level of DSP performance and consume significant power in a GSM radio device. The channel encoder adds redundancy to the transmitted data, and the decoder in the receiver may use this redundancy to repair any corrupted underlying information data.
To meet the performance requirements and power constraints of a processor, computationally intensive channel decoding techniques should be implemented efficiently.
A typical implementation of a Viterbi decoder generally consists of two stages: an add-compare-select (ACS) stage and a traceback stage. Though the ACS stage may be computationally intensive, techniques exist to speed up its operation by employing multiple execution units in parallel. On the other hand, due to its sequential nature, the traceback stage cannot easily be parallelized, and may consume an appreciable fraction of the DSP or hardware cycles.
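The two stages can be illustrated with a minimal sketch. It assumes a small rate-1/2 code with constraint length K=3 and generator polynomials 7 and 5 (octal), hard-decision inputs, and a Hamming-distance branch metric; none of these choices, nor the function names, come from the text above. The state is assumed to hold the K-1 most recent input bits with the newest bit in the least significant position.

```python
K = 3                       # constraint length (assumed for this sketch)
G = (0b111, 0b101)          # generator polynomials 7 and 5 octal (assumed)
N_STATES = 1 << (K - 1)
MASK = N_STATES - 1


def parity(x):
    return bin(x).count("1") & 1


def encode(bits):
    """Convolutional encoder, used here only to produce test input."""
    state, out = 0, []
    for u in bits:
        reg = (state << 1) | u          # K-bit shift register, newest bit in LSB
        out.extend(parity(reg & g) for g in G)
        state = reg & MASK
    return out


def acs(received):
    """ACS stage: one decision bit per state per trellis step."""
    inf = float("inf")
    metrics = [0] + [inf] * (N_STATES - 1)      # encoder starts in state 0
    decisions = []
    for t in range(0, len(received), len(G)):
        symbol = received[t:t + len(G)]
        new_metrics = [inf] * N_STATES
        row = [0] * N_STATES
        for nxt in range(N_STATES):             # each state is independent,
            u = nxt & 1                         # so this loop can run in parallel
            for d in (0, 1):                    # d selects one of two predecessors
                prev = (nxt >> 1) | (d << (K - 2))
                reg = (prev << 1) | u
                expected = [parity(reg & g) for g in G]
                branch = sum(a != b for a, b in zip(expected, symbol))
                cand = metrics[prev] + branch   # add
                if cand < new_metrics[nxt]:     # compare
                    new_metrics[nxt] = cand     # select
                    row[nxt] = d
        metrics = new_metrics
        decisions.append(row)
    return decisions, metrics


def traceback(decisions, metrics):
    """Traceback stage: each step needs the state found by the previous step."""
    state = min(range(N_STATES), key=metrics.__getitem__)
    bits = []
    for row in reversed(decisions):
        bits.append(state & 1)                  # newest input bit of this state
        state = (state >> 1) | (row[state] << (K - 2))
    bits.reverse()
    return bits


def viterbi_decode(received):
    return traceback(*acs(received))
```

In the ACS loop every state of a trellis step is computed from the previous step's metrics only, which is why it maps onto parallel execution units. In the traceback loop, the decision bit to read depends on the state produced by the preceding iteration, so the iterations form a serial chain.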
Presently, a typical DSP implementation of the traceback operation for a 16-state decoder takes five DSP cycles per decoded bit to perform the required shift, index addressing, and memory accesses. The DSP cycles per decoded bit increase substantially with larger constraint lengths. For example, a 64-state decoder (e.g., constraint length “K”=7) may require nine DSP cycles per decoded bit to move the 64 bits of the traceback vector from an on-chip memory to the appropriate registers, and then to search for the traceback bit within the 64 bits. Similarly, a 256-state decoder (e.g., K=9) may require 12 DSP cycles per decoded bit.
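The per-bit work described above (index addressing, a memory access, and a shift) can be sketched as follows. The sketch assumes that each trellis step's traceback vector is packed into 16-bit memory words with state 0 in the least significant bit of the first word, and that the state holds the newest input bit in its least significant position. The word width, the packing order, and all function names are illustrative assumptions, and the sketch makes no claim about cycle counts on any particular DSP.

```python
WORD_BITS = 16  # assumed memory word width


def pack_decisions(row):
    """Pack one trellis step's per-state decision bits into memory words."""
    words = [0] * ((len(row) + WORD_BITS - 1) // WORD_BITS)
    for state, d in enumerate(row):
        words[state // WORD_BITS] |= (d & 1) << (state % WORD_BITS)
    return words


def traceback_step(words, state, k):
    """One traceback step for a decoder with constraint length k."""
    word = words[state // WORD_BITS]            # index addressing + memory access
    d = (word >> (state % WORD_BITS)) & 1       # shift and mask out the traceback bit
    decoded = state & 1                         # newest input bit of this state
    prev = (state >> 1) | (d << (k - 2))        # rebuild the predecessor state
    return decoded, prev


def traceback_packed(stages, final_state, k):
    """Walk the packed traceback vectors from the last step to the first."""
    state, bits = final_state, []
    for words in reversed(stages):
        decoded, state = traceback_step(words, state, k)
        bits.append(decoded)
    bits.reverse()
    return bits
```

With this packing a 64-state decoder stores four words per trellis step and a 256-state decoder stores sixteen, so the amount of traceback data to address and search grows with the number of states even though only one bit per step is ultimately used.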
Although conventional Viterbi hardware accelerators perform the traceback computation in one cycle per decoded bit, they require a substantial amount of hardware support for large constraint lengths, resulting in increased memory accesses and power consumption. For example, a typical 256-state decoder with an 8-bit state requires a complex 256:1 multiplexer tree and multi-cycle loading of a 256-bit traceback vector register.