1. Field of the Invention
This invention relates generally to Viterbi decoders. More particularly, this invention relates to a branch metric unit duplication method for achieving a high speed decoder implementation in a field programmable gate array (FPGA).
2. Description of the Prior Art
A Viterbi decoder performs optimum decoding of convolutionally encoded digital sequences. It is widely used in digital communication systems with data rates ranging from a few kbps in narrowband applications to several hundred Mbps in broadband applications such as Wireless LAN.
As shown in FIG. 1, a Viterbi decoder 100 is comprised of three units: a branch-metric computation unit (BMU) 102, an add-compare-select unit (ACSU) 104 and a survivor path memory unit (SMU) 106. The input data is used in the BMU 102 to calculate the set of branch metrics for each new time step. These metrics are then fed to the ACSU 104, which accumulates the branch metrics recursively through the trace-back latch unit 108 as path metrics according to the trellis determined by the convolutional encoder polynomial. The SMU 106 processes the decisions made in the ACSU 104 and outputs an estimated path with a latency equal to the trace-back depth.
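The division of labor among the three units can be illustrated with a minimal software sketch. The sketch below is not the decoder of FIG. 1; it assumes, purely for illustration, a rate-1/2 code with constraint length 3 and generators (7, 5) octal, hard-decision Hamming branch metrics, and a trace-back over the whole received block. The function names bmu, acsu and smu are hypothetical labels chosen to mirror the units 102, 104 and 106.

```python
# Illustrative sketch only. The code parameters (rate 1/2, K = 3,
# generators 7 and 5 octal) and the Hamming metric are assumptions;
# the text above does not specify them.
G = (0b111, 0b101)
K = 3
N_STATES = 1 << (K - 1)


def parity(x):
    return bin(x).count("1") & 1


def encoder_output(state, bit):
    """Return (next_state, output pair) for one encoder step."""
    reg = (bit << (K - 1)) | state          # newest bit in the MSB
    out = tuple(parity(reg & g) for g in G)
    return reg >> 1, out


def encode(bits):
    """Convolutionally encode a bit list, starting from state 0."""
    state, out = 0, []
    for b in bits:
        state, pair = encoder_output(state, b)
        out.append(pair)
    return out


def bmu(received):
    """Branch metric unit: Hamming distance to each possible output pair."""
    return {(a, b): (a != received[0]) + (b != received[1])
            for a in (0, 1) for b in (0, 1)}


def acsu(path_metrics, branch_metrics):
    """Add-compare-select: one recursive update of all path metrics."""
    new_metrics = [float("inf")] * N_STATES
    decisions = [None] * N_STATES
    for state in range(N_STATES):
        for bit in (0, 1):
            nxt, out = encoder_output(state, bit)
            cand = path_metrics[state] + branch_metrics[out]   # add
            if cand < new_metrics[nxt]:                        # compare
                new_metrics[nxt] = cand                        # select
                decisions[nxt] = (state, bit)
    return new_metrics, decisions


def smu(decision_history, final_metrics):
    """Survivor memory unit: trace back from the best final state."""
    state = min(range(N_STATES), key=lambda s: final_metrics[s])
    bits = []
    for decisions in reversed(decision_history):
        state, bit = decisions[state]
        bits.append(bit)
    return bits[::-1]


def viterbi_decode(received_pairs):
    metrics = [0.0] + [float("inf")] * (N_STATES - 1)  # encoder starts in state 0
    history = []
    for pair in received_pairs:
        # The new path metrics depend on the previous ones: this is the
        # feedback loop that only the ACSU has.
        metrics, decisions = acsu(metrics, bmu(pair))
        history.append(decisions)
    return smu(history, metrics)
```

In this sketch, bmu and smu have no dependence on their own earlier outputs, whereas each call to acsu consumes the path metrics produced by the previous call.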
It is clear that the ACSU 104 and SMU 106 architectures depend only on the trellis, and hence these two units are independent of the application for which a Viterbi decoder is being used. The application specific computations are done in the BMU 102 according to the soft input definition, and the interpretation of the decoded path into data at the output of the SMU 106 likewise depends on the output format definition. Since the application specific parts of a Viterbi decoder are mainly found at the input and output, a high speed architecture of the ACSU 104 can be generally applicable.
If a high speed Viterbi decoder needs to be implemented for broadband applications with greater than 100 Mbps data rates, the critical path of a Viterbi decoder must be minimized. By looking at the block diagram of a Viterbi decoder 100 in FIG. 1, it is obvious that the BMU 102 as well as the SMU 106 are purely feedforward and the throughput can easily be increased by massive pipelining. However, this does not hold for the ACSU 104 because of the feedback loop through the trace-back latch unit 108.
One way to improve the throughput of the ACSU 104 is to apply a look-ahead scheme (radix-4 architecture) to the trellis 200 as shown in FIG. 2. A radix-4 architecture doubles the data rate without increasing the clock rate, because it processes two trellis steps per clock cycle while running at the clock rates employed by a radix-2 architecture. The circuit complexity associated with a conventional radix-4 architecture is greater, however, as can be seen with reference to FIG. 3 and FIG. 4: a conventional radix-4 ACSU 400 basically requires 2-stage comparison circuits 401, 402, including 4 more adders and 2 more multiplexers than are required by the conventional radix-2 ACSU 300 shown in FIG. 3.
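The look-ahead idea can be checked numerically with a short sketch. It assumes a hypothetical 4-state trellis (rate 1/2, generators (7, 5) octal) and Hamming branch metrics, none of which are taken from FIG. 2 to FIG. 4, and it models only the arithmetic of the path metric update, not the adder and multiplexer structure of the ACSU 300 or ACSU 400. The names radix2_acs and radix4_acs are illustrative.

```python
# Illustrative sketch only: the trellis and the metric are assumptions.
G = (0b111, 0b101)
K = 3
N_STATES = 1 << (K - 1)


def step(state, bit):
    """Return (next_state, output pair) for one trellis transition."""
    reg = (bit << (K - 1)) | state
    out = tuple(bin(reg & g).count("1") & 1 for g in G)
    return reg >> 1, out


def hamming_metrics(received):
    """Branch metrics for one received pair of hard-decision bits."""
    return {(a, b): (a != received[0]) + (b != received[1])
            for a in (0, 1) for b in (0, 1)}


def radix2_acs(pm, bm):
    """One trellis step: each state selects between 2 candidates."""
    new = [float("inf")] * N_STATES
    for s in range(N_STATES):
        for bit in (0, 1):
            nxt, out = step(s, bit)
            new[nxt] = min(new[nxt], pm[s] + bm[out])
    return new


def radix4_acs(pm, bm_a, bm_b):
    """Two trellis steps at once: each state selects among 4 candidates."""
    new = [float("inf")] * N_STATES
    for s in range(N_STATES):
        for b1 in (0, 1):
            mid, out_a = step(s, b1)
            for b2 in (0, 1):
                nxt, out_b = step(mid, b2)
                # Combined two-step branch metric added to the path metric.
                new[nxt] = min(new[nxt], pm[s] + bm_a[out_a] + bm_b[out_b])
    return new


def run_radix2(pm, received_pairs):
    for pair in received_pairs:
        pm = radix2_acs(pm, hamming_metrics(pair))
    return pm


def run_radix4(pm, received_pairs):
    # Half as many recursive updates for the same number of input pairs
    # (an even number of pairs is assumed).
    for i in range(0, len(received_pairs), 2):
        pm = radix4_acs(pm,
                        hamming_metrics(received_pairs[i]),
                        hamming_metrics(received_pairs[i + 1]))
    return pm
```

Under these assumptions, run_radix4 reaches the same path metrics as run_radix2 while iterating the recursion half as often, at the cost of a 4-way rather than 2-way selection per state.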
Further, the interconnections between the BMU 102 and the ACSU 104 cause longer routing delays, because the ACSU circuit 104 takes more area and hence the interconnections between the ACSU 104 and the BMU 102, as shown in FIG. 5, become complicated. In an FPGA implementation, the ACSU 104 is expected to be fitted into several slices or logic cells; hence, the routing delay becomes even more dominant and accounts for about 50% of the critical path delay.
In view of the foregoing, it is both advantageous and desirable to provide a branch metric duplication method that substantially reduces interconnection delays in order to implement a high speed radix-4 Viterbi decoder targeted for FPGA applications.