This invention relates to an asynchronous decoder circuit. More specifically, this invention relates to an asynchronous decoder circuit that operates on data coded using a variable-length coding technique, such as Huffman coding.
Huffman coding is a lossless coding technique that replaces fixed-length symbols with variable-length codes. Huffman codes are entropy-based, meaning that short codes are assigned to frequently occurring symbols and longer codes are assigned to less frequently occurring symbols. In addition, Huffman codes are prefix codes, meaning that no valid code is a prefix of another valid code: appending additional bits to the end of a code never produces another valid code. Because of these properties, Huffman coding has been widely used for the compression of data.
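The two properties described above can be illustrated with a minimal sketch. The function names `huffman_codes` and `is_prefix_free` and the symbol frequencies in the usage note are assumptions chosen for illustration; they do not come from the specification, and the construction shown is the textbook Huffman procedure rather than anything specific to the decoder described here.

```python
import heapq
from itertools import count


def huffman_codes(freqs):
    """Build a code table {symbol: bitstring} from {symbol: frequency}."""
    tie = count()  # tie-breaker so heapq never has to compare the dict payloads
    heap = [(f, next(tie), {sym: ""}) for sym, f in freqs.items()]
    heapq.heapify(heap)
    if len(heap) == 1:
        # A single symbol still needs a one-bit code.
        return {sym: "0" for sym in heap[0][2]}
    while len(heap) > 1:
        # Merge the two least frequent subtrees; their codes grow by one bit.
        f0, _, t0 = heapq.heappop(heap)
        f1, _, t1 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in t0.items()}
        merged.update({s: "1" + c for s, c in t1.items()})
        heapq.heappush(heap, (f0 + f1, next(tie), merged))
    return heap[0][2]


def is_prefix_free(codes):
    """True if no code word is a prefix of another code word."""
    words = sorted(codes.values())
    # After sorting, any prefix relation shows up between neighbours.
    return all(not b.startswith(a) for a, b in zip(words, words[1:]))
```

With the assumed frequencies `{"a": 45, "b": 13, "c": 12, "d": 16, "e": 9, "f": 5}`, the most frequent symbol `a` receives a one-bit code and the least frequent symbol `f` receives a four-bit code, and `is_prefix_free` returns `True` for the resulting table.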
In applications utilizing Huffman-coded data, the size and speed of a Huffman decoder are important. For example, small and fast Huffman decoders are necessary in compressed-code systems and MPEG-2 video systems. A compressed-code system is a microprocessor- or microcontroller-based system in which instructions are stored in compressed form in memory, then are decompressed when brought into a cache. As a result, a significant reduction in instruction memory size may be obtained. The design of a decompression circuit for these systems is highly constrained. In particular, the circuit must be very fast (since it is on the critical path between the processor and memory) and must also be very small (otherwise the savings in instruction memory will be lost to the area increase due to the decompression circuit). MPEG-2 is an international image coding standard promulgated by the International Organization for Standardization (ISO). MPEG-2 systems require data to be decoded at a rate of 100 Mbits/sec or greater to maintain a sufficient quality of audio and video output.
To date, there have been two commonly used approaches to the design of Huffman decoders, which have been commonly referred to as the constant-input-rate approach and the constant-output-rate approach. Both of these approaches are synchronous, i.e., the decoders are synchronized to an external system clock.
In the constant-input-rate approach, the input data stream is processed at a rate of one bit per clock cycle by traversing a Huffman code tree through the use of a finite state machine. Achieving high performance with this type of design requires a very fast clock, which introduces many very difficult high-speed circuit problems. In fact, it is unlikely that a state machine of adequate complexity can be designed to run at the speeds required by the constant-input-rate approach on a silicon wafer produced by certain semiconductor processes, such as those using 0.8μ or thicker CMOS wafers. To avoid the problems caused by the use of very high-speed clocks, multiple state transitions may be combined into a single cycle. As multiple state transitions are combined, however, the complexity and circuit area of the decoder increase approximately exponentially with respect to the increased performance per clock cycle.
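The one-bit-per-cycle tree traversal can be modeled in software as follows. This is only a behavioral sketch: the code table in the usage note, the nested-dictionary tree, and the names `build_tree` and `decode_bit_serial` are assumptions made for illustration, and each loop iteration stands in for one clock cycle of the finite state machine.

```python
def build_tree(codes):
    """Turn {symbol: bitstring} into a nested-dict code tree."""
    root = {}
    for sym, word in codes.items():
        node = root
        for bit in word[:-1]:
            node = node.setdefault(bit, {})
        node[word[-1]] = sym  # leaf: the decoded symbol
    return root


def decode_bit_serial(bits, codes):
    """Consume one input bit per modeled clock cycle."""
    root = build_tree(codes)
    state = root  # current state = current tree node
    out, cycles = [], 0
    for bit in bits:
        cycles += 1
        nxt = state[bit]
        if isinstance(nxt, dict):
            state = nxt        # interior node: keep traversing
        else:
            out.append(nxt)    # leaf: emit symbol, return to the root
            state = root
    return out, cycles
```

For the assumed table `{"a": "0", "b": "10", "c": "110", "d": "111"}`, decoding the ten-bit string `"0101100111"` yields five symbols in ten modeled cycles, which shows why throughput in this approach is tied directly to the clock rate.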
In the constant-output-rate approach, a portion of the input data stream, at least as long as the longest input symbol, is translated into an output symbol on each clock cycle. One disadvantage to this approach is that it requires more complex shifting and symbol detection circuitry than the constant-input-rate approach. Furthermore, the input data buffer and shifting circuitry must be wide enough to store and shift the longest of the input symbols, which is inefficient since the most frequently occurring input symbols will be shorter than the longest input symbol. Another significant disadvantage of the constant-output-rate approach is that the length of the critical path is dominated by the time to detect and decode the longest input symbol. Thus, the vast majority of cycles are limited by a very infrequent worst-case path.
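For comparison, the constant-output-rate approach can be sketched as a table lookup on a window as wide as the longest code. The names `build_lookup` and `decode_windowed`, the zero padding at the end of the stream, and the code table in the usage note are assumptions for illustration; the sketch also assumes a complete code, so that every possible window matches some entry.

```python
def build_lookup(codes):
    """Map every max-width bit window to (symbol, code length)."""
    width = max(len(w) for w in codes.values())
    table = {}
    for sym, word in codes.items():
        pad = width - len(word)
        for i in range(2 ** pad):
            # Enumerate every possible run of trailing "don't care" bits.
            suffix = format(i, "b").zfill(pad) if pad else ""
            table[word + suffix] = (sym, len(word))
    return width, table


def decode_windowed(bits, codes):
    """Emit one symbol per modeled clock cycle from a fixed-width window."""
    width, table = build_lookup(codes)
    padded = bits + "0" * width  # assumed padding so the last window is full
    pos, out, cycles = 0, [], 0
    while pos < len(bits):
        sym, length = table[padded[pos:pos + width]]
        out.append(sym)
        pos += length  # shift by the length of the code just decoded
        cycles += 1
    return out, cycles
```

With the assumed table `{"a": "0", "b": "10", "c": "110", "d": "111"}`, the string `"0101100111"` decodes to five symbols in five modeled cycles, but every cycle examines a full three-bit window even when the code is a single bit long.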
In sum, each of the two commonly-used approaches to the design of Huffman decoders requires a compromise between the performance and the complexity (circuit area) of the implementations. Accordingly, there exists a need for an improved Huffman decoder design that provides higher performance per circuit area than is possible with existing circuit designs.
The present invention solves the foregoing problems by employing an innovative asynchronous design, which produces a decoder that is significantly smaller than comparable synchronous decoders, and yet has a higher throughput rate than these decoders after normalizing for voltage and process differences between the decoders.
According to the present invention, there is provided a decoder circuit, which includes a logic circuit for decoding variable-length coded data and a timing circuit. The logic circuit includes a plurality of computational logic stages, each of the computational logic stages having a synchronization signal input and a completion signal output. Each completion signal output indicates the completion of the computation performed by a computational logic stage. The timing circuit includes a plurality of completion signal inputs, which are coupled to the completion signal outputs of the computational logic stages, and a synchronization signal output, which is coupled to the synchronization signal inputs of the computational logic stages. The synchronization signal output of the timing circuit is not a periodic signal with a fixed cycle period. Instead, the synchronization signal is an asynchronous output determined as a function of the completion signal inputs.
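The benefit of deriving the synchronization signal from completion signals, rather than from a fixed-period clock, can be illustrated with a simple timing model. The model is an assumption made for illustration only: it supposes that the timing circuit issues the next synchronization event once every stage has signaled completion, and the delay figures in the usage note are invented. The function names are likewise hypothetical.

```python
def self_timed_total(stage_delays_per_cycle):
    """Total time when each cycle ends as soon as all stages report completion.

    Each entry is a tuple of per-stage delays for one decoding cycle.
    Assumption: the next synchronization event follows the slowest stage
    of that particular cycle.
    """
    return sum(max(delays) for delays in stage_delays_per_cycle)


def clocked_total(stage_delays_per_cycle):
    """Total time when a fixed clock period must cover the worst case."""
    period = max(max(delays) for delays in stage_delays_per_cycle)
    return period * len(stage_delays_per_cycle)
```

For the invented delays `[(3, 5), (2, 2), (7, 4)]`, the self-timed model takes 5 + 2 + 7 = 14 time units, whereas a fixed clock sized for the slowest case takes 3 × 7 = 21 time units.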
In a preferred embodiment of the present invention, the decoder operates on data that has been coded according to a variable-length coding technique in which coded data words are classified according to their word length and the occurrence of common bits therein. The common bits are unique relative to at least a subset of the classes of the coded data words. In such an embodiment, the logic circuit of the decoder includes: an alignment circuit for shifting an input data word by an amount responsive to a control input and for outputting the shifted data word; a match logic circuit coupled to the output of the alignment circuit for decoding the class of a coded data word included in the shifted data word; a decode logic circuit coupled to the output of the alignment circuit for decoding the coded data word included in the shifted data word; a length logic circuit coupled to the output of the match logic circuit for determining the length of the coded data word included in the shifted data word; an offset register having a register data input and a register data output, the register data output coupled to the control input of the alignment circuit; and an adder circuit for adding first and second adder inputs, the first adder input coupled to the output of the length logic circuit and the second adder input coupled to the register data output, the output of the adder circuit coupled to the register data input.
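The data flow among the alignment, match, length, decode, offset register, and adder elements can be sketched behaviorally as follows. The class definitions in `CLASSES`, the code table in `SYMBOLS`, and all function names are hypothetical and chosen only for illustration; bit strings stand in for the signal buses, and the sketch does not model circuit timing.

```python
# Hypothetical classes: (common leading bits, code word length).
CLASSES = [("0", 1), ("10", 2), ("11", 3)]
# Hypothetical code table used by the decode logic.
SYMBOLS = {"0": "a", "10": "b", "110": "c", "111": "d"}


def align(word, offset):
    """Alignment circuit: shift the input word by the offset register value."""
    return word[offset:]


def match(shifted):
    """Match logic: identify the class from the common leading bits."""
    for index, (prefix, _) in enumerate(CLASSES):
        if shifted.startswith(prefix):
            return index
    raise ValueError("no class matches the shifted word")


def length_of(class_index):
    """Length logic: the class determines the code word length."""
    return CLASSES[class_index][1]


def decode(shifted):
    """Decode logic: works directly on the aligned word, beside match/length."""
    return next(sym for word, sym in SYMBOLS.items() if shifted.startswith(word))


def step(word, offset):
    """One decoding step; returns the symbol and the next offset value."""
    shifted = align(word, offset)
    length = length_of(match(shifted))
    symbol = decode(shifted)
    return symbol, offset + length  # adder: offset register + length
```

Starting from an offset of 0 and calling `step` repeatedly on the word `"0101100111"` produces the symbols `a`, `b`, `c`, `a`, `d` while the offset advances through 1, 3, 6, 7, and 10.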
The decoder circuit is preferably designed such that the alignment circuit, the match logic circuit, and the adder circuit comprise a computational logic stage, and the alignment circuit, the match logic circuit, and the decode logic circuit comprise another computational logic stage.
The adder circuit may include a carry output indicative of a carry resulting from the addition of the first and second adder inputs, and the logic circuit may further include an input buffer having a plurality of registers. The registers may be coupled together in series, and the data output of one or more of the registers may be coupled to the data input of the alignment circuit.
The logic circuit may further include a shift sequence circuit coupled to the carry output of the adder circuit and to the clock inputs of the input registers for shifting the input registers responsive to the carry output of the adder circuit.
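The interaction between the adder carry and the input buffer can be sketched as follows. The register width `REG_BITS`, the buffer depth of three registers, the zero fill at the end of the stream, and the names `add_offset` and `InputBuffer` are all assumptions introduced for illustration; the specification does not state these values here.

```python
from collections import deque

REG_BITS = 8  # assumed width of each input register


def add_offset(offset, length):
    """Adder: new offset within a register, plus a carry flag on wrap-around."""
    total = offset + length
    return total % REG_BITS, total >= REG_BITS


class InputBuffer:
    """Series-connected input registers feeding the alignment circuit."""

    def __init__(self, stream_words, depth=3):
        self.pending = deque(stream_words)
        self.regs = deque(self._next() for _ in range(depth))

    def _next(self):
        # Assumed behavior: feed zeros once the input stream is exhausted.
        return self.pending.popleft() if self.pending else "0" * REG_BITS

    def shift(self):
        """Shift sequence: drop the consumed register and load a new word."""
        self.regs.popleft()
        self.regs.append(self._next())

    def window(self):
        """Bits presented to the data input of the alignment circuit."""
        return "".join(self.regs)
```

In this model, `add_offset(6, 3)` returns `(1, True)`: the offset wraps to 1 and the carry indicates that the first register has been fully consumed, so `shift()` would be called before the next decoding step.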
The decoder circuit may further include input and output handshaking circuits for implementing an asynchronous handshake between the decoder circuit and external circuits coupled to the decoder circuit.