The present invention relates to the decompression of compressed datasets and, more particularly, to a device and method for decompressing datasets that have been compressed as class-based codewords.
Embedded microprocessors have become widely used in many products ranging from cellular telephones to digital video cameras to vehicular engine controllers. A typical system-on-a-chip (SOC) consists of a microprocessor core, on-chip memory and various peripherals according to the intended application. The on-chip program memory, usually ROM or flash memory, often occupies a substantial portion of the chip's area, sometimes close to half of the chip's area. As embedded software complexity increases to provide more functionality, the limited memory capacity of a SOC often is a restricting factor. Object code compression in a SOC offers the following tradeoff: investment in hardware (decompressor unit) helps to reduce the size of the software (application programs, real-time operating system) without reducing the functionality of the software.
In desktop systems or servers, text or binary files often are compressed to save both disk space and transfer time over a network from one system or server to another. Some popular file compression utilities use variants of the Lempel-Ziv window-based (J. Ziv and A. Lempel, “A universal algorithm for sequential data compression”, IEEE Transactions on Information Theory vol. 23 no. 3 pp. 337–343 (May 1977)) or dictionary-based (T. A. Welch, “A technique for high-performance data compression”, IEEE Computer vol. 17 no. 6 pp. 8–19 (June 1984)) algorithms. These methods are not suitable for use in embedded systems because these methods decode a compressed file from the beginning to the end, and do not support random reading and decoding of portions of the compressed file. Embedded systems must provide random access to compressed blocks of object code. Decompressing the entire program memory is not feasible because the size of the decompressed code exceeds the on-chip memory capacity.
The requirement of compressing short blocks that need to be accessed randomly limits the choice of compression methods. Huffman coding (D. A. Huffman, “A method for the construction of minimum redundancy codes”, Proc. IRE vol. 40 no. 9 pp. 1098–1101 (September 1952)) has been used to compress programs in embedded systems (A. Miretsky et al., “RISC code compression model”, Proc. Embedded Systems Conference, Chicago Ill., March 1999). Another variable-length-code compression method, class-based coding, also has been used in embedded systems, specifically, in IBM's 405 PowerPC core (T. M. Kemp et al., “A decompression core for PowerPC”, IBM Journal of Research and Development vol. 42 no. 6 pp. 807–812 (November 1998)). In both of these examples, a compression utility produces blocks of compressed object code and a symbol table. The blocks of compressed object code and the symbol table are stored in the embedded system's memory. Blocks of compressed instructions are fetched and decoded to reconstruct the uncompressed program at run time. Huffman coding and class-based coding are defined below.
If an object file is considered as a sequence of 8-bit bytes, the alphabet consists of 2^8 = 256 symbols. Alternatively, the same object file can be seen as a sequence of 16-bit symbols, in which case the alphabet size is 2^16 = 65,536. Although the choice of 16-bit symbols would give better compression, especially if the object file consists of fixed-length 32-bit RISC instructions as in Kemp et al. (1998), maintaining a full Huffman tree with 2^16 leaf nodes is expensive in terms of both storage space and coding speed.
Canonical coding (E. S. Schwartz and B. Kallick, “Generating a canonical prefix coding”, Communications of the ACM vol. 7 no. 3 pp. 166–169 (March 1964)) eliminates the need for maintaining an explicit Huffman tree. (Although canonical coding creates a tree for code assignment, the tree is not used for coding and decoding.) Canonical coding creates an array of the alphabet symbols sorted in the order of their frequency of occurrence and a small table that specifies the “breakpoints” in the array of symbols where the code length changes. Coding is done by a straightforward computation using the sorted array of symbols and the table of breakpoints.
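The computation described above can be sketched in Python. This is a minimal illustration, not the procedure of Schwartz and Kallick as published: the eight-symbol alphabet, its code lengths, and the names build_canonical, encode and decode are assumptions made for the example. Symbols are sorted by code length (equivalent to sorting by decreasing frequency when shorter codes go to more frequent symbols), and the first_code and first_index arrays play the role of the table of breakpoints.

```python
# Assumed code lengths for an illustrative eight-symbol alphabet.
LENGTHS = {"A": 1, "B": 3, "C": 3, "D": 3, "E": 4, "F": 5, "G": 6, "H": 6}

def build_canonical(lengths):
    """Return the sorted symbol array and the per-length breakpoint tables."""
    symbols = sorted(lengths, key=lambda s: (lengths[s], s))
    max_len = max(lengths.values())
    count = [0] * (max_len + 1)        # number of codewords of each length
    for s in symbols:
        count[lengths[s]] += 1
    first_code = [0] * (max_len + 1)   # numeric value of first codeword per length
    first_index = [0] * (max_len + 1)  # position of that codeword's symbol in symbols
    code = index = 0
    for n in range(1, max_len + 1):
        first_code[n], first_index[n] = code, index
        code = (code + count[n]) << 1
        index += count[n]
    return symbols, count, first_code, first_index

def encode(text, lengths, tables):
    symbols, count, first_code, first_index = tables
    rank = {s: i for i, s in enumerate(symbols)}
    out = []
    for s in text:
        n = lengths[s]
        value = first_code[n] + rank[s] - first_index[n]
        out.append(format(value, "0{}b".format(n)))
    return "".join(out)

def decode(bits, tables):
    symbols, count, first_code, first_index = tables
    out = []
    value = n = 0
    for b in bits:
        value = (value << 1) | int(b)
        n += 1
        offset = value - first_code[n]
        if 0 <= offset < count[n]:     # value falls inside the codes of length n
            out.append(symbols[first_index[n] + offset])
            value = n = 0
    return "".join(out)

tables = build_canonical(LENGTHS)
bits = encode("ABCD", LENGTHS, tables)
print(bits)                  # 0100101110
print(decode(bits, tables))  # ABCD
```

No tree is consulted at any point: both directions use only the sorted array and the small per-length tables.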
The use of canonical code simplifies coding and reduces space requirements; but if the alphabet is large relative to the size of the file to be coded, or if the file is broken up into blocks that are coded separately, then the amount of information that must be transferred for decoding still is a concern.
Another approach to address problems involving large alphabets is alphabet partitioning. Alphabet partitioning is a hierarchical decomposition strategy. The source alphabet is broken up into a number of “classes”, and coding is done in two phases. In the first phase, a “class code” is assigned to every class. In the second phase, a “symbol code” is assigned to every symbol in the class. This two-phase coding allows the use of different coding methods for classes and symbols. Classes are entropy-coded with the goal of providing good compression efficiency. Symbols are coded using a very simple method (for example, the symbol code is just an index), with the goal of reducing coding complexity.
A. Said and W. A. Pearlman, in “Low-complexity waveform coding via alphabet and sample-set partitioning”, Visual Communications and Image Processing '97, Proc. SPIE Vol. 3024, pp. 25–37 (February 1997), present an analysis that shows that a good design requires partitioning with the following properties:
1. the symbols in a class occur very infrequently, or
2. the frequency distribution within a class is close to uniform.
Such a design realizes the full power of alphabet partitioning, and coding complexity is reduced at the cost of only a small loss in compression efficiency.
Huffman coding assigns variable-length codes to the symbols of an alphabet based on the frequency of occurrence of a symbol in the text or object file, with frequent symbols being assigned short codes. The following table is an example of Huffman code assignment for an eight-symbol alphabet:
Symbol   Frequency   Codeword
A        0.50        0
B        0.15        110
C        0.11        100
D        0.09        101
E        0.07        1110
F        0.05        11110
G        0.02        111110
H        0.01        111111
The average code length of this example is 2.26 bits.
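The stated average can be checked directly from the table as the sum of each frequency multiplied by the length of its codeword. The following Python sketch does so; the names CODE, average_length and kraft_sum are chosen for illustration only. It also evaluates the Kraft sum, which equals 1 when a prefix code leaves no bit pattern unused.

```python
# Frequencies and codewords copied from the table above.
CODE = {
    "A": (0.50, "0"),
    "B": (0.15, "110"),
    "C": (0.11, "100"),
    "D": (0.09, "101"),
    "E": (0.07, "1110"),
    "F": (0.05, "11110"),
    "G": (0.02, "111110"),
    "H": (0.01, "111111"),
}

def average_length(code):
    """Expected codeword length in bits per symbol."""
    return sum(freq * len(word) for freq, word in code.values())

def kraft_sum(code):
    """Sum of 2^(-length) over all codewords."""
    return sum(2.0 ** -len(word) for _, word in code.values())

print(round(average_length(CODE), 2))  # 2.26
print(kraft_sum(CODE))                 # 1.0
```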
Huffman codes have the “prefix property”: no codeword is the prefix of another codeword. Conceptually, the decoding process begins from the root of the Huffman tree, and a branch of the tree is selected according to the next bit in the code. This process continues until a leaf node is reached. This leaf node contains or points to the decoded symbol. The prefix property guarantees uniquely decipherable codes.
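The conceptual decoding process can be sketched in Python using the example codewords. This is an illustrative model only: the tree is represented as nested dictionaries, and the names build_tree, decode and is_prefix_free are assumptions made for the example.

```python
# Codewords of the eight-symbol Huffman example.
CODEWORDS = {"A": "0", "B": "110", "C": "100", "D": "101",
             "E": "1110", "F": "11110", "G": "111110", "H": "111111"}

def is_prefix_free(codewords):
    """True if no codeword is a prefix of a different codeword."""
    words = list(codewords.values())
    return not any(a != b and b.startswith(a) for a in words for b in words)

def build_tree(codewords):
    """Interior nodes are dicts keyed by '0'/'1'; leaves are symbols."""
    root = {}
    for symbol, word in codewords.items():
        node = root
        for bit in word[:-1]:
            node = node.setdefault(bit, {})
        node[word[-1]] = symbol
    return root

def decode(bits, root):
    out = []
    node = root
    for bit in bits:
        node = node[bit]           # select a branch according to the next bit
        if isinstance(node, str):  # a leaf holds the decoded symbol
            out.append(node)
            node = root            # restart from the root
    return "".join(out)

tree = build_tree(CODEWORDS)
bits = "".join(CODEWORDS[s] for s in "ABCDEFGH")
print(is_prefix_free(CODEWORDS))  # True
print(decode(bits, tree))         # ABCDEFGH
```

Because no codeword is a prefix of another, the walk never passes through a leaf on its way to a longer codeword, so the bit stream needs no separators.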
S. M. Lei and M. T. Sun, in “An entropy coding system for digital HDTV applications”, IEEE Transactions on Circuits and Systems for Video Technology vol. 1 no. 1 pp. 147–155 (March 1991), which is incorporated by reference for all purposes as if fully set forth herein, describe the design of a constant-output-rate decoder for compression systems in advanced television applications. This decoder, which is illustrated in FIG. 1 as decoder 10, decodes variable-length code at a constant output rate of one symbol per clock cycle. The core of decoder 10 is a programmable logic array (PLA) 22. Assuming an alphabet size of 2^n symbols and the use of a bounded Huffman code (D. C. Van Voorhis, “Constructing codes with bounded codeword lengths”, IEEE Transactions on Information Theory vol. 20 no. 3 pp. 288–290 (March 1974)) such that the longest codeword is at most w bits long, PLA 22 implements a truth table with 2^n product terms, a w-bit wide input, and two outputs: the n-bit decoded symbol and the codeword length encoded in log2(w) bits. An accumulator 20 adds up the codeword length for each decoded symbol and controls a barrel shifter 18. When the sum in accumulator 20 exceeds the maximum codeword length w, accumulator 20 produces a carry that transfers the contents of a first latch 14 to a second latch 16, and also loads w bits from an input buffer 12 to first latch 14.
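The behavior of such a decoder can be modeled in software, under simplifying assumptions. In the Python sketch below, a table with 2^w entries stands in for the truth table of PLA 22, a running bit position stands in for accumulator 20, and slicing a w-bit window from the input stands in for barrel shifter 18; the latches and input buffer are not modeled. The eight-symbol Huffman example is used with w = 6, and the names build_table and decode are assumptions made for the example.

```python
CODEWORDS = {"A": "0", "B": "110", "C": "100", "D": "101",
             "E": "1110", "F": "11110", "G": "111110", "H": "111111"}
W = 6  # longest codeword length in this example

def build_table(codewords, w):
    """Map every w-bit window to (symbol, codeword length)."""
    table = [None] * (1 << w)
    for symbol, word in codewords.items():
        pad = w - len(word)
        base = int(word, 2) << pad
        for low in range(1 << pad):  # all windows that begin with this codeword
            table[base + low] = (symbol, len(word))
    return table

def decode(bits, n_symbols, table, w):
    padded = bits + "0" * w          # keep the last window full width
    pos = 0                          # accumulated codeword lengths
    out = []
    for _ in range(n_symbols):       # one symbol per iteration ("clock cycle")
        window = int(padded[pos:pos + w], 2)
        symbol, length = table[window]
        out.append(symbol)
        pos += length
    return "".join(out)

table = build_table(CODEWORDS, W)
bits = "".join(CODEWORDS[s] for s in "ABCDEFGH")
print(decode(bits, 8, table, W))     # ABCDEFGH
```

Each iteration yields exactly one symbol regardless of its codeword length, which is the constant-output-rate property described above.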
S. Chang and D. G. Messerschmitt, in “Designing high-throughput VLC decoder Part I: concurrent VLSI architectures”, IEEE Transactions on Circuits and Systems for Video Technology vol. 2 no. 2 pp. 187–196 (June 1992), present a VLSI architecture and a parallel decoding method for variable-length-code decoders. While the primary application that they envision, and that Lei and Sun (1991) envision, is high-throughput video compression systems, their work is generally applicable to compression systems that use a prefix code.
Resuming the discussion of alphabet partitioning, one useful special case of alphabet partitioning is “class-based coding”. In a class-based code, a “class” is a group of symbols that are assigned codes with the same length. Every symbol in the alphabet belongs to a single respective class. Every class is identified by a unique “class code”. If a class consists of 2^q symbols, a q-bit “symbol code” is appended to the class code to identify each symbol that belongs to that class. A “codeword” consists of a class code followed by a symbol code.
FIG. 2 and the following table illustrate class-based coding for the eight-symbol alphabet that is used above to illustrate Huffman coding. As shown in FIG. 2, this code includes three classes. In each class there is a sequence of zero or more bits b that are used to encode the symbols of that class.
Symbol    Frequency   Class   Codeword
A = 000   0.50        0       0
B = 001   0.15        10b     100
C = 010   0.11        10b     101
D = 011   0.09        11bbb   11011
E = 100   0.07        11bbb   11100
F = 101   0.05        11bbb   11101
G = 110   0.02        11bbb   11110
H = 111   0.01        11bbb   11111
The use of classes splits the decoding process into two phases. In the first phase, the code length is determined. In the second phase, the symbol code is decoded by accessing a lookup table. This simplifies decoding because class codes are short and the symbol code is just an index.
In this example, the last five symbols are “literals”, i.e., symbols whose contents are not changed by the coding process. A literal is coded by simply prepending the class code to the symbol. In other words, the symbol code of a literal is the literal itself. The class of literals contains symbols that have the lowest frequencies. Literals are useful in coding large alphabets, especially if only a relatively small number of symbols have significantly large frequencies. This relatively small number of symbols is stored in a lookup table, and the rest of the symbols are coded as literals. The symbol codes of the symbols that are not literals are referred to herein as “index codes” because these symbol codes are used as indices to the lookup table.
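The two-phase decoding of this example can be sketched in Python. This is an illustrative model only, not the circuit of any particular decoder: symbols are represented as 3-bit strings, and the names CLASSES, LITERAL_CLASS, encode and decode are assumptions made for the example.

```python
# Class code -> (symbol-code width q, lookup table).
# A lookup table of None marks the class of literals.
CLASSES = {
    "0":  (0, ["000"]),         # A
    "10": (1, ["001", "010"]),  # B, C
    "11": (3, None),            # D through H, coded as literals
}
LITERAL_CLASS = "11"

def encode(symbols):
    out = []
    for sym in symbols:
        for class_code, (q, table) in CLASSES.items():
            if table is not None and sym in table:
                index = format(table.index(sym), "0{}b".format(q)) if q else ""
                out.append(class_code + index)  # class code + index code
                break
        else:
            out.append(LITERAL_CLASS + sym)     # class code + the symbol itself
    return "".join(out)

def decode(bits):
    out = []
    pos = 0
    while pos < len(bits):
        # Phase 1: identify the class code, which fixes the codeword length.
        class_code = "0" if bits[pos] == "0" else bits[pos:pos + 2]
        q, table = CLASSES[class_code]
        pos += len(class_code)
        # Phase 2: read the q-bit symbol code.
        symbol_code = bits[pos:pos + q]
        pos += q
        if table is None:
            out.append(symbol_code)             # a literal is its own symbol code
        else:
            out.append(table[int(symbol_code, 2) if q else 0])
    return out

bits = encode(["000", "001", "011"])
print(bits)          # 010011011
print(decode(bits))  # ['000', '001', '011']
```

With the frequencies in the table, the average codeword length of this class-based code works out to 0.50 × 1 + 0.26 × 3 + 0.24 × 5 = 2.48 bits, compared with 2.26 bits for the Huffman code of the same alphabet. The difference is the price paid for the simpler decoding.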
Examples of embedded microprocessors that use code compression include the IBM PowerPC 405 core of Kemp et al. (1998) and the Motorola MPC 555 of Miretsky et al. (1999). The Motorola chip implements Huffman code. IBM's CodePack is a class-based implementation that is discussed in more detail below. To locate variable-length blocks in compressed memory, the IBM design implements an address table similar to the one proposed by A. Wolfe and A. Chanin in “Executing compressed programs on an embedded RISC architecture”, Proc. Int'l Symp. on Microarchitecture, pp. 81–91 (1992). This approach has the advantage that compression is transparent to the processor, which produces addresses to uncompressed memory. The Motorola design involves changes in the PowerPC core in order to directly address bit-aligned instructions in compressed memory.
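The general idea of an address table can be sketched in Python. This is a hypothetical illustration of the concept only; it does not reproduce the table format of CodePack or of Wolfe and Chanin. The block size, the compressed block sizes, and the names build_address_table and locate are all assumptions made for the example.

```python
BLOCK_SIZE = 64  # uncompressed bytes per block (hypothetical value)

def build_address_table(compressed_blocks):
    """Record the starting offset of each compressed block."""
    table = []
    offset = 0
    for block in compressed_blocks:
        table.append(offset)
        offset += len(block)
    return table

def locate(address, table):
    """Map an uncompressed address to (compressed block offset, byte within block)."""
    block = address // BLOCK_SIZE
    return table[block], address % BLOCK_SIZE

# Three compressed blocks of 40, 25 and 52 bytes (made-up sizes).
blocks = [bytes(40), bytes(25), bytes(52)]
table = build_address_table(blocks)
print(table)               # [0, 40, 65]
print(locate(130, table))  # (65, 2)
```

The processor continues to issue uncompressed addresses; only the lookup step needs to know where each variable-length block begins.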
Prior art decoder 10 is intended for decoding 8-bit symbols and a maximum codeword length of sixteen bits. The corresponding size of PLA 22 is reasonable: 16-bit input, 12-bit output (8-bit symbol and 4-bit codeword length) and 256 product terms. This design is not suitable for an alphabet size of 2^16 symbols because PLA 22 would require 65,536 product terms. There is thus a widely recognized need for, and it would be highly advantageous to have, a decoder capable of decoding 16-bit symbols, for use, for example, in an embedded processor with 32-bit RISC instructions.