1. Field of the Invention
The present invention relates generally to the field of data processors, and specifically to an improved data processor and related methods for processing communications data such as, for example, Viterbi decoding.
2. Description of Related Technology
The need for increased efficiency and speed in communications data processing is now ubiquitous. Consumer and other communications applications demand increased performance in a smaller form factor and with less power consumption. This is especially true in consumer wireless handsets, where it is desired to have the most rapid data encoding and decoding possible within the smallest and most power efficient IC, thereby reducing handset size and increasing battery longevity.
Such encoding and decoding processes can be quite complex. The well-known Viterbi algorithm is an example of a decoding algorithm used for convolutional codes sent over a memoryless noisy channel. The Viterbi algorithm attempts to estimate the state sequence of the encoder finite state machine (FSM) from the corrupted received data. Since these complex algorithms are run in effect continuously during the communication process, even small gains in efficiency and performance on a per-operation or per-cycle basis can produce large benefits in efficiency and power consumption.
An idealized Viterbi channel encoder/decoder system is shown in FIG. 1. The encoder (FIG. 2) produces a code symbol consisting of two binary bits for every input bit. The code rate (r=k/n) is 1/2, where k=1 is the input rate and n=2 is the output rate. The number of bits that have an effect upon the output is 3. This parameter is known as the constraint length. The encoder is assumed to be a Mealy type FSM of the kind well known in the art, and so the outputs produced are a function of the current state and the current input.
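The encoder described above can be sketched in a few lines. This is a minimal illustration only: the text does not state the tap connections of FIG. 2, so the generator taps (octal 7 and 5) and the convention that the newest input bit enters the most significant state bit are assumptions, as are all function names.

```python
# Assumed generator taps (octal 7 and 5); the taps of FIG. 2 are not given in the text.
G = (0b111, 0b101)
K = 3  # constraint length: number of bits that affect the output


def parity(x):
    """Return the XOR of all bits of x."""
    return bin(x).count("1") & 1


def encode_step(state, bit):
    """One Mealy-machine step: the output depends on the current state and input."""
    reg = (bit << (K - 1)) | state               # newest bit in the MSB (assumed)
    symbol = tuple(parity(reg & g) for g in G)   # n = 2 output bits per input bit
    next_state = reg >> 1                        # the oldest bit drops out
    return next_state, symbol


def encode(bits, state=0):
    """Encode a list of input bits into a list of 2-bit code symbols."""
    symbols = []
    for b in bits:
        state, sym = encode_step(state, b)
        symbols.append(sym)
    return symbols
```

Each call to `encode_step` consumes k=1 input bit and emits n=2 output bits, which gives the code rate of 1/2.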
The encoder's outputs and state transitions are best visualized with the aid of a state transition diagram, as shown in FIG. 3. The dashed lines 302 represent an input of ‘0’, and the solid lines represent an input of ‘1’.
An extension of the state diagram is known as a trellis diagram. The trellis displays all the information in a state diagram, and also shows the transitions over time. The trellis diagram shown in FIG. 4 is for an encoder with code rate=1/2 and constraint length 3.
The Viterbi encoder will produce a unique set of state transitions for the information bits supplied as shown in FIG. 5. The sequence supplied to this example encoder is 111102 and the state sequence is [S0, S2, S1, S2, S3, S1]. The decoder attempts to determine the FSM's state sequence by finding the path (through the trellis of FIG. 4) that maximizes the probability of the state sequence the FSM has passed through, given the received data.
As each code symbol is received, it is supplied to all states in a stage (a stage is a time slot within the trellis). As can be seen in FIG. 5, each state has two branches leading into it from two separate states. Each state expects a known code symbol to be associated with each branch. Each branch terminates a path through the trellis, and each path has an accumulated error metric associated with it.
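The two incoming branches of each state, together with the code symbol expected on each, can be tabulated for one trellis stage. The sketch below is illustrative only; it assumes generator taps of octal 7 and 5 and a state numbering in which the newest input bit is the most significant state bit, neither of which is stated in the text.

```python
# Assumptions: taps 7 and 5 (octal); newest input bit is the MSB of the state.
G = (0b111, 0b101)
NUM_STATES = 4  # 2**(K-1) states for constraint length K = 3


def parity(x):
    """Return the XOR of all bits of x."""
    return bin(x).count("1") & 1


def incoming_branches(state):
    """Return [(predecessor, expected_symbol)] for the upper and lower branch."""
    bit = state >> 1                       # input bit that leads into this state
    branches = []
    for d in (0, 1):                       # d = oldest bit of the predecessor
        pred = ((state & 1) << 1) | d
        reg = (bit << 2) | pred            # 3-bit encoder contents on this branch
        symbol = tuple(parity(reg & g) for g in G)
        branches.append((pred, symbol))
    return branches


# One trellis stage: every state with its two incoming branches.
TRELLIS_STAGE = {s: incoming_branches(s) for s in range(NUM_STATES)}
```

Under these assumptions, `TRELLIS_STAGE[1]` lists states 2 and 3 as the two predecessors of state 1.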
The code symbols are received by each state (or add-compare-select (ACS) node). The ACS node calculates a branch metric for each branch, that is, the error between the received code symbol and the code symbol expected on that branch. The branch metric is added to the accumulated error metric for that path and the survivor branch is selected. The survivor is the branch with the lowest total accumulated error. The decisions for each state are stored in the traceback memory. The decision bit stored indicates which branch survived, ‘0’ for upper and ‘1’ for lower.
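The add-compare-select step for a single state can be sketched as follows. This is a hard-decision illustration using Hamming distance as the branch metric; the function names, the data layout, and the choice to resolve ties in favor of the upper branch are assumptions made for the example.

```python
def hamming(a, b):
    """Branch metric: number of bit positions in which two code symbols differ."""
    return sum(x != y for x, y in zip(a, b))


def acs(received, branches, path_metrics):
    """Add-compare-select for one state.

    branches:     [(predecessor_state, expected_symbol)], upper branch first.
    path_metrics: accumulated error metric of every state at the previous stage.
    Returns (survivor_metric, decision_bit), with 0 = upper and 1 = lower.
    """
    # Add: branch metric plus the accumulated metric of the predecessor path.
    candidates = [path_metrics[p] + hamming(received, exp) for p, exp in branches]
    # Compare and select: lowest total error survives (ties go to upper, assumed).
    decision = 0 if candidates[0] <= candidates[1] else 1
    return candidates[decision], decision
```

For example, `acs((1, 0), [(2, (1, 0)), (3, (0, 1))], [0, 0, 1, 0])` compares 1+0 against 0+2 and keeps the upper branch.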
Traceback can begin after 5×K code symbols have been processed by the ACS node network, where K is the constraint length. Traceback begins by finding the optimum starting state. The optimum state for hard-decision detection is the state with the smallest total accumulated error. Starting in the optimum state (OP), the next state to be traced back into is determined from the decision bit stored for the OP and a look-up table of predecessor states for that state. Referring back to FIGS. 2-5, it can be seen that if the OP were state 1, then a decision bit of ‘0’ would lead to state 2. The traceback continues until the start of the traceback memory is reached. Any code symbols decoded after 5×K can be output. The process of the ACS nodes providing decision bits and the traceback memory decoding the output continues until no more code symbols are available.
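The traceback procedure can be sketched with a predecessor look-up table. The table below is an assumption chosen to agree with the example in the text (state 1 with a decision bit of ‘0’ leads back to state 2); it also assumes that the most significant state bit is the input bit that led into the state. The function names and the layout of the decision memory are hypothetical.

```python
# Assumed predecessor table: PREDECESSORS[state][decision_bit] -> previous state.
PREDECESSORS = {0: (0, 1), 1: (2, 3), 2: (0, 1), 3: (2, 3)}


def optimum_state(path_metrics):
    """Return the state with the smallest total accumulated error."""
    return min(range(len(path_metrics)), key=lambda s: path_metrics[s])


def traceback(decisions, start_state):
    """Walk the traceback memory from the newest stage back to the oldest.

    decisions[t][s] is the decision bit stored for state s at stage t.
    Returns the decoded input bits, oldest first.
    """
    state = start_state
    bits = []
    for stage in reversed(decisions):
        bits.append(state >> 1)                      # input bit that led into state
        state = PREDECESSORS[state][stage[state]]    # step back one stage
    bits.reverse()
    return bits
```

In a continuous decoder only the oldest bits of each traceback would be released, since those lie more than 5×K stages behind the starting point.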
A variety of different techniques are known in the prior art for implementing complex algorithms using data processors. These techniques generally fall into one of three categories: (i) “fixed” hardware; (ii) software; and (iii) user-configurable.
So-called ‘fixed’ architecture processors of the prior art characteristically incorporate special instructions and/or hardware to accelerate particular functions. Because the architecture of processors in such cases is largely fixed beforehand, and the details of the end application unknown to the processor designer, the specialized instructions added to accelerate operations are not optimized in terms of performance. Furthermore, hardware implementations such as those present in prior art processors are inflexible, and the logic is typically not used by the device for other “general purpose” computing when not being actively used for coding, thereby making the processor larger in terms of die size, gate count, and power consumption than it needs to be. Moreover, no ability to subsequently add extensions to the instruction set architectures (ISAs) of such ‘fixed’ approaches exists.
Alternatively, software-based implementations have the advantage of flexibility; specifically, it is possible to change the functional operations by simply altering the software program. Decoding in software also has the advantages afforded by the sophisticated compiler and debug tools available to the programmer. Such flexibility and availability of tools, however, come at the cost of efficiency (e.g., cycle count), since it generally takes many more cycles to implement the software approach than would be needed for a comparable hardware solution.
So-called “user-configurable” extensible data processors, such as for example the ARCtangent™ processor produced by the Assignee hereof, allow the user to customize the processor configuration, so as to optimize one or more attributes of the resulting design. When employing a user-configurable and extensible data processor, the end application is known at the time of design/synthesis, and the user configuring the processor can produce the desired level of functionality and attributes. The user can also configure the processor appropriately so that only the hardware resources required to perform the function are included, resulting in an architecture that is significantly more silicon (and power) efficient than fixed architecture processors. Such configuration can include, for example, the addition of specialized extension instructions (extensions), selection of memory and cache configurations, register sets, ALU configurations, and the like.
The ARCtangent processor is a user-customizable 32-bit RISC core for ASIC, system-on-chip (SoC), and FPGA integration. It is synthesizable, configurable, and extendable, thus allowing developers to modify and extend the architecture to better suit specific applications. It comprises a 32-bit RISC architecture with a four-stage execution pipeline. The instruction set, register file, condition codes, caches, buses, and other architectural features are user-configurable and extendable. It has a 32×32-bit core register file, which can be doubled if required by the application. Additionally, it is possible to use a large number of auxiliary registers (up to 2^32). The functional elements of the core of this processor include the arithmetic logic unit (ALU), register file (e.g., 32×32), program counter (PC), instruction fetch (i-fetch) interface logic, as well as various stage latches.
A variety of different approaches to Viterbi decode using digital processors have been put forth in the prior art, the following being exemplary.
United States Patent Application 20020031195A1 to Honary published Mar. 14, 2002 and entitled “Method and apparatus for constellation decoder” discloses a method and apparatus for performing slicer and Viterbi decoding operations which are optimized for single-instruction/multiple-data (SIMD) parallel processor architectures. Some non-regular operations are eliminated and replaced with very regular repeatable tasks that can be efficiently parallelized. A first aspect of the invention provides a pre-slicer scheme where once eight input symbols for a Viterbi decoder are ascertained and their distances calculated, these distances are saved in an array. A second aspect of the invention provides a way of performing the path and branch metric calculations in parallel to minimize processor cycles. A third aspect of the invention provides a method to implement the Viterbi decoder without continually performing a trace back. Instead, the previous states along the maximum likelihood paths for each trellis state are stored. When the path with the shortest distance is later selected, determining the trace back state only requires a memory access.
U.K. Publication No. 2371953 entitled “Viterbi equalizer which compares received data with predicted data based on the channel response estimate to generate path metrics” published Aug. 7, 2002 to Sherratt discloses an equalizer for use in processing received serial data signals sent by a transmitter and which may have been distorted during their transmission. The equalizer includes a trellis generator which receives both the serial data signals and the output of a channel estimator so as to generate the most probable bit sequence sent by the transmitter. The trellis generator operates by allocating to each branch of the trellis entering a particular state an individual branch metric which is based on the space distance between the received signal and the predicted signal received from the predictor for that state, so that each branch metric is different from any other branch metric, and by calculating the two survivors of each Viterbi butterfly in the trellis at the same time.
Japanese Patent Publication No. 4369124 entitled “Soft Discrimination Viterbi Decode Method” published Dec. 21, 1992 discloses techniques to reduce a bit error rate of an original signal by calculating a margin for taking a bit string and applying soft discrimination Viterbi decoding thereto in the process of phase detection of a received carrier to obtain the bit string in the case of transmission of a convolution code. A soft discrimination Viterbi decoder is provided with a soft discrimination data calculation section to which a memory is built to calculate a soft discrimination data from a phase detected by a demodulation section. A de-interleave memory is connected to an output of the calculation section, and stores the soft discrimination data calculated by the calculation section. A Viterbi algorithm execution section is connected to the memory and a path memory storing an object path in the process of obtaining an optimum path. The data stored in the memory is read by the execution section while the bit sequence rearranged at the transmission is restored. Thus, the execution section uses the read soft discrimination data to obtain an optimum path on a trellis diagram thereby outputting a reproduction signal.
U.S. Pat. No. 5,796,756 to Choi, et al. issued Aug. 18, 1998 and entitled “Survivor memory device in Viterbi decoder using trace deletion method” discloses a memory device in a Viterbi decoder which determines a final survivor path using a trellis diagram and decision vectors, and outputs decoded data corresponding to the determined survivor path. The survivor memory device includes a path existence information generator for receiving a plurality of decision vectors, and for generating first branch path existence information representing whether a branch path exists between each state and the corresponding next states in the trellis diagram. A plurality of units are serially connected with respect to the outputs of the path existence information generator. Each unit comprises a path existence information store for receiving and storing the first branch path existence information, a path removal signal generator for generating corresponding path removal signals when the first branch path existence information corresponding to each current state represents that corresponding branch paths do not exist between each current state and the corresponding next states, and a path existence information updator for receiving the first branch path existence information stored in the path existence information store and the path removal signals generated by the path removal signal generator, and for updating values of second branch path existence information corresponding to each current state to represent that corresponding branch paths do not exist between each current state and the corresponding previous states.
Japanese Patent No. 10075185 entitled “Viterbi Decode Device” and published Mar. 17, 1998 discloses techniques for the Viterbi decoding of multilevel modulated data to which a redundant bit is applied by a convolution code by using a simple Viterbi decoder for binary modulation. Multilevel demodulated data obtained by receiving and demodulating a multilevel modulated signal are inputted and transmitted through circuits for converting the multilevel demodulated data into plural binary soft judgment data, so that data converted into binary data can be decoded by using a QPSK Viterbi decoder which is capable of soft judgment for binary modulation. Thus, the soft judgment of a multilevel modulated signal can be easily attained in digital ground broadcasting or the like, and at the same time, the sharing of a circuit with digital satellite broadcasting can be attained.
U.S. Pat. No. 6,448,910 to Lu issued Sep. 10, 2002 and entitled “Method and apparatus for convolution encoding and Viterbi decoding of data that utilize a configurable processor to configure a plurality of re-configurable processing elements” discloses a method and apparatus for convolution encoding and Viterbi decoding utilizing a flexible, digital signal processing architecture that comprises a core processor and a plurality of re-configurable processing elements arranged in a two-dimensional array. The core processor is operable to configure the re-configurable processing elements to perform data encoding and data decoding functions. A received data input is encoded by configuring one of the re-configurable processing elements to emulate a convolution encoding algorithm and applying the received data input to the convolution encoding algorithm. A received encoded data input is decoded by configuring the plurality of re-configurable processing elements to emulate a Viterbi decoding algorithm wherein the plurality of re-configurable processing elements is configured to accommodate every data state of the convolution encoding algorithm. The core processor initializes the re-configurable processing elements by assigning register values to registers that define parameters such as constraint length and code rate for the convolution encoding algorithm. See also United States Patent Application Publication No. 20020135502 published Sep. 26, 2002.
U.S. Pat. No. 6,424,685 to Messel, et al. issued Jul. 23, 2002 entitled “Polar computation of branch metrics for TCM” discloses a method and apparatus for decoding TCM signals including simplified polar computations and Viterbi decoding. The method includes converting the received signal from Cartesian to polar coordinates in order to provide a reduction in the number and complexity of the associated calculations. The branch metric computation for the Viterbi decoding algorithm is performed using polar samples of the demodulated signal.
U.S. Pat. No. 5,946,361 to Araki, et al. issued Aug. 31, 1999 and entitled “Viterbi decoding method and circuit with accelerated back-tracing and efficient path metric calculation” discloses a Viterbi decoding circuit which stores comparison result bits in a bit-accessible path memory unit. A back-trace is performed by setting a state value in a shift register, then shifting comparison result bits from the path memory unit into the shift register. A certain number of bits at the shift-in end of this register are supplied as read address bits to the path memory unit. The Viterbi decoding circuit has selectors that first select old path metric values and branch metric values, which are added or subtracted to produce candidate path metric values, then select the candidate path metric values, which are subtracted to produce a comparison result bit representing the sign of their difference. These additions and subtractions are performed by the same arithmetic unit.
U.S. Pat. No. 5,802,116 to Baker, et al. issued Sep. 1, 1998 and entitled “Soft decision Viterbi decoding with large constraint lengths” discloses a method and apparatus for obtaining a soft symbol decoded output of a received signal by a two pass Viterbi operation. The technique is applied where the signal is convolutionally encoded with large constraint lengths. During the first pass, the error-correction co-processor (ECCP) is programmed for hard decoded output alone. After all the received symbol sets are hard-bit decoded, a second pass Viterbi operation is performed. Using the previously decoded hard bit to identify the most likely next state at an initial time instant, and initializing the present states at that initial time instant with pre-saved accumulated costs from the first pass Viterbi operation, branch metrics are computed for those state transitions leading to the most likely next state at that time instant. The accumulated cost values of the present states leading to the most likely next state are updated, and the absolute value of their difference is coded as a reliability of the hard decoded output corresponding to that time instant. The combination of the hard decoded output and the reliability obtained from the second pass Viterbi operation results in a soft symbol decoded output. At this point, the symbol set received at this time instant during the first pass Viterbi operation is reloaded into the ECCP which updates the accumulated cost values of all possible next states. These steps are repeated until all desired soft symbols are obtained.
U.S. Pat. No. 5,742,621 to Amon, et al. issued Apr. 21, 1998 and entitled “Method for implementing an add-compare-select butterfly operation in a data processing system and instruction therefor” discloses a parallel data structure and a dedicated Viterbi shift left instruction to minimize the number of clock cycles required for decoding a convolutionally encoded signal in a data processing system in software. Specifically, the data structure and Viterbi shift left instruction ostensibly reduce the number of clock cycles required for performing an add-compare-select butterfly operation. The add-compare-select butterfly operation is included in a DO loop in a plurality of instructions for executing a Viterbi decoding algorithm, and is repeated a predetermined number of times, for choosing the best path through a trellis diagram.
U.S. Pat. No. 5,440,504 to Ishikawa, et al. issued Aug. 8, 1995 and entitled “Arithmetic apparatus for digital signal processor” discloses a digital signal processor arithmetic apparatus capable of performing Viterbi decoding processing at a high speed with minimum addition of hardware and overhead of memory. Pathmetric value and branchmetric value read out from first and second memories on two paths are simultaneously added by an adder at most significant bits and least significant bits thereof. A comparator compares values of the most significant bits and the least significant bits output from the adder to generate a path select signal indicating the value which is path-metrically smaller. The select signal is stored in a shift register on a bit-by-bit basis. Of the values of the most significant bits and the least significant bits of a register storing the output of the adder, the smaller one as decided by the path select signal is written in the memory at eight most significant bits or least significant bits thereof via distributor, a bus and a register.
U.S. Pat. No. 5,432,804 to Diamondstein, et al. issued Jul. 11, 1995 and entitled “Digital processor and Viterbi decoder having shared memory” discloses an integrated circuit with a digital signal processor (DSP) and an error correction co-processor (ECCP) that implements a Viterbi decoding function. The DSP and ECCP share a block of multi-port memory, typically by bus multiplexing a dual-port RAM. When the ECCP possesses the RAM, it inhibits the DSP from accessing that block of the RAM by asserting an EBUSY flag. This technique conserves and optimizes the RAM usage, allowing the DSP and ECCP to be formed on the same integrated circuit chip.
U.S. Pat. No. 5,633,897 to Fettweis, et al. issued May 27, 1997 and entitled “Digital signal processor optimized for decoding a signal encoded in accordance with a Viterbi algorithm” discloses a DSP having two internal data buses with two MAC units each receiving data from its respective data bus. A shifter is interposed between the multiply unit and the ALU and accumulate unit. The improved DSP also has a multiplexer interposed between one of the MAC units and the two data buses. The improved DSP is optimized to decode a received digital signal encoded in accordance with the Viterbi algorithm, wherein the DSP calculates a first pair of binary signals C_{2n} and C_{2n+1} of a Viterbi butterfly based upon a second pair of binary signals C_{n} and C_{n+m/2}, and a transitional signal a, in accordance with: C_{2n} = minimum(C_{n} + a, C_{n+m/2} − a); C_{2n+1} = minimum(C_{n} − a, C_{n+m/2} + a).
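The butterfly recurrence quoted above can be illustrated with a short sketch. It reads the flattened subscripts as C[2n], C[2n+1], C[n], and C[n+m/2], takes m to be the number of states, and uses a hypothetical function name; it is an interpretation of the quoted formula, not code from the cited patent.

```python
def butterfly(c, n, a):
    """Return (C[2n], C[2n+1]) from C[n], C[n + m/2] and the transitional signal a.

    c is the list of path metrics of the previous stage; m = len(c) is assumed
    to be the number of states.
    """
    m = len(c)
    first, second = c[n], c[n + m // 2]
    c_2n = min(first + a, second - a)
    c_2n_plus_1 = min(first - a, second + a)
    return c_2n, c_2n_plus_1
```

One call produces two new path metrics from two old ones, which is why a butterfly corresponds to two ACS operations.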
U.S. Pat. No. 5,068,859 to Dolinar, et al issued Nov. 26, 1991 and entitled “Large constraint length high speed Viterbi decoder based on a modular hierarchial decomposition of the deBruijn graph” discloses a method of formulating and packaging decision-making elements into a long constraint length Viterbi decoder which involves formulating the decision-making processors as individual Viterbi butterfly processors that are interconnected in a deBruijn graph configuration. A fully distributed architecture, which achieves high decoding speeds, is made feasible by wiring and partitioning of the state diagram. This partitioning defines universal modules, which can be used to build any size decoder, such that a large number of wires is contained inside each module, and a small number of wires is needed to connect modules. The total system is modular and hierarchical, and it implements a large proportion of the required wiring internally within modules and may include some external wiring to fully complete the deBruijn graph.
U.S. Pat. No. 5,151,904 to Reiner, et al. issued Sep. 29, 1992 and entitled “Reconfigurable, multi-user Viterbi decoder” discloses a decoding system for decoding a digital data stream that has been convolutionally encoded in accordance with a selected constraint length and selected polynomial codes. The system includes a processor, such as a Viterbi decoder, that is reconfigurable so that it can decode encoded digital data streams for a number of different user channels for which data streams have been convolutionally encoded in accordance with respectively different combinations of selected constraint length and selected polynomial codes. The decoding system includes a Viterbi decoder for processing the encoded data stream in accordance with said selected constraint length and in accordance with said selected polynomial codes to decode the encoded data stream; a RAM for storing data of said selected constraint length and data of said selected polynomial codes in accordance with which said data stream was encoded; and a RAM I/O interface circuit responsive to a user channel identification signal for retrieving said selected constraint length data and said selected polynomial code data from the RAM and configuring the Viterbi decoder in accordance with said selected constraint length and said selected polynomial codes. In order to accommodate concurrent multiple user channels, the RAM stores different sets of combinations of constraint length data and polynomial code data corresponding to different user channels, with said different sets being retrievable from the RAM in response to respectively different user channel identification signals. The polynomial code data and constraint length data in the RAM may be changed from time to time in response to software instructions, as user channel requirements change. The Viterbi decoder processes said encoded data stream over a plurality of decoding cycles and produces intermediate decoding results during different decoding cycles; and the RAM I/O interface circuit stores in the RAM said intermediate decoding results produced for each different user channel during the different decoding cycles.
Despite the foregoing variety of solutions, none is able to perform at least one complete butterfly operation (two ACS operations) in a single cycle. Furthermore, none of the foregoing solutions permits the designer of the processor to readily add such a high-performance Viterbi decode extension instruction to the ISA during the design phase, the resulting design being optimized according to one or more criteria such as power conservation, clock speed, and die size due to reduced memory overhead and limited hardware requirements to support the extension.