The present invention relates to a method of compressing a dataset and, more particularly, to an approximate method of compressing and decompressing fixed-length executable code and to a computer that executes such code.
The Huffman code (D. A. Huffman, “A method for the construction of minimum redundancy codes”, Proc. IRE vol. 40 no. 9 pp. 1098-1101, September 1952), also called variable-length code or prefix code because of two of its main properties, has been the subject of active research since its invention almost fifty years ago.
The Huffman algorithm sorts the symbols in probability order. The two nodes with the lowest probabilities (initially, two symbols) are then joined by a parent node, whose probability is the sum of the probabilities of its two children. The parent node replaces its two children, and the procedure is repeated until a single tree containing all the symbols has been built. Each left branch of the tree is assigned a zero bit, and each right branch of the tree is assigned a one bit. The code of a symbol is the sequence of bits obtained by descending from the root of the tree to that symbol.
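The tree-building procedure just described can be sketched in Python as follows. This is an illustrative sketch, not the implementation used by the invention: the function name `huffman_codes` is hypothetical, integer frequencies are used as weights in place of probabilities (which yields the same tree), and leaf symbols are assumed not to be tuples, since tuples represent internal nodes here.

```python
import heapq
import itertools


def huffman_codes(weights):
    """Build a Huffman code from a {symbol: weight} mapping.

    Returns {symbol: bit string}; left branches get "0", right branches "1".
    Symbols must not be tuples, because tuples represent internal nodes.
    """
    tie = itertools.count()  # tie-breaker so heapq never compares two trees
    heap = [(w, next(tie), sym) for sym, w in weights.items()]
    heapq.heapify(heap)
    if len(heap) == 1:
        # Degenerate case: a lone symbol still needs a one-bit code.
        return {heap[0][2]: "0"}
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)   # lowest weight
        w2, _, right = heapq.heappop(heap)  # second-lowest weight
        # The parent's weight is the sum of its children's weights.
        heapq.heappush(heap, (w1 + w2, next(tie), (left, right)))

    codes = {}

    def walk(node, prefix):
        # Descend from the root, appending 0 for left and 1 for right.
        if isinstance(node, tuple):
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:
            codes[node] = prefix

    walk(heap[0][2], "")
    return codes
```

Applied to the symbol counts of the example string discussed in this section, the sketch yields code lengths of 1, 2, 4, 4, 4, 5 and 5 bits for A through G; the exact bit patterns depend on tie-breaking and need not match FIG. 1.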
The average code length $L_{av}$ is defined as

$$L_{av} = \sum_{i=1}^{n} p_i \, l_i \qquad (1)$$

where $n$ is the number of distinct symbols, $p_i$ is the probability of symbol $i$, and $l_i$ is the code length (in bits) assigned to symbol $i$. Consider, for example, the following string:
xe2x80x9cABCDEFGABCDEFGABCDEABCDEABCDABCABCABABABA BABABABABABABABABABABABABABABABABABABABAA AAAAAAAAAAAAAAAAAAxe2x80x9d
This string has seven distinct symbols: “A”, “B”, “C”, “D”, “E”, “F” and “G”. There are 50 “A”s, 30 “B”s, 7 “C”s, 5 “D”s, 4 “E”s, 2 “F”s and 2 “G”s in the string, 100 symbols in all. The respective probabilities of the seven distinct symbols therefore are 0.5, 0.3, 0.07, 0.05, 0.04, 0.02 and 0.02. With seven distinct symbols, a fixed-length (i.e., uncompressed) encoding of this string would need three bits per symbol. FIG. 1 shows the Huffman tree for this string and the code assigned to each symbol. The code has the prefix property: no code is the prefix of another code. From equation (1), the average code length is 1.94 bits.
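For reference, the 1.94-bit figure can be checked term by term. Building a Huffman tree for these probabilities gives code lengths of 1, 2, 4, 4, 4, 5 and 5 bits for A through G (the lengths are forced here, although the bit patterns may differ from FIG. 1 depending on how branches are labeled), so equation (1) evaluates to:

```latex
L_{av} = \sum_{i=1}^{7} p_i \, l_i
       = 0.5(1) + 0.3(2) + 0.07(4) + 0.05(4) + 0.04(4) + 0.02(5) + 0.02(5)
       = 0.5 + 0.6 + 0.28 + 0.20 + 0.16 + 0.10 + 0.10
       = 1.94 \text{ bits}
```

Compared with the 3 bits per symbol of the fixed-length encoding, this is a saving of roughly 35%.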
A theoretical aspect of the Huffman code that has been investigated extensively is the redundancy of prefix coding. Another topic that has received considerable attention is the efficient implementation of prefix coding and decoding. Compression based on prefix coding is implemented in software in several popular utilities. For example, the DEFLATE specification (P. Deutsch, “DEFLATE compressed data format specification version 1.3”, Request for Comments No. 1951, Network Working Group, May 1996), which is used by programs such as gzip, defines a format for data compression using a combination of the LZ77 algorithm (J. Ziv and A. Lempel, “A universal algorithm for sequential data compression”, IEEE Transactions on Information Theory vol. 23 no. 3 pp. 337-343, May 1977) and Huffman coding. DEFLATE uses canonical coding (E. S. Schwartz and B. Kallick, “Generating a canonical prefix encoding”, Communications of the ACM vol. 7 no. 3 pp. 166-169, March 1964), which helps in two ways. First, the codebook used for compression need not be transmitted to the decompressor: it is completely defined by the sequence of bit lengths of the codes for the symbols, in alphabet order. Second, canonical coding improves decompression performance by allowing a set of decoding tables to be used instead of a Huffman tree.
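To illustrate why a list of bit lengths is enough to define a canonical codebook, the following Python sketch follows the code-assignment procedure described in RFC 1951, section 3.2.2. The function name `canonical_codes` is hypothetical, and "alphabet order" is assumed to be the natural sort order of the symbols.

```python
def canonical_codes(lengths):
    """Assign canonical prefix codes from per-symbol bit lengths.

    `lengths` maps symbol -> code length in bits (0 means the symbol is
    unused). Follows the three steps of RFC 1951, section 3.2.2.
    """
    max_len = max(lengths.values())

    # Step 1: count the number of codes of each length (length 0 ignored).
    bl_count = [0] * (max_len + 1)
    for n in lengths.values():
        if n:
            bl_count[n] += 1

    # Step 2: compute the numerically smallest code for each length.
    next_code = [0] * (max_len + 1)
    code = 0
    for bits in range(1, max_len + 1):
        code = (code + bl_count[bits - 1]) << 1
        next_code[bits] = code

    # Step 3: give consecutive codes to symbols of equal length, in alphabet order.
    codes = {}
    for sym in sorted(lengths):
        n = lengths[sym]
        if n:
            codes[sym] = format(next_code[n], f"0{n}b")
            next_code[n] += 1
    return codes
```

With the code lengths of the seven-symbol example above (1, 2, 4, 4, 4, 5, 5), the sketch yields A = 0, B = 10, C = 1100, D = 1101, E = 1110, F = 11110 and G = 11111: a valid prefix code with the same average length, reconstructed solely from the lengths.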
Hardware implementations of Huffman coding (S. Chang and D. G. Messerschmitt, “Designing high-throughput VLC decoders Part I: concurrent VLSI architectures”, IEEE Transactions on Circuits and Systems for Video Technology vol. 2 no. 2 pp. 187-196, June 1992; S. M. Lei and M. T. Sun, “An entropy coding system for digital HDTV applications”, IEEE Transactions on Circuits and Systems for Video Technology vol. 1 no. 1 pp. 147-155, March 1991) are used in real-time applications such as high-definition television (HDTV). J. H. Jeon et al. (“A fast variable-length decoder using plane separation”, IEEE Transactions on Circuits and Systems for Video Technology vol. 10 no. 5 pp. 806-812, August 2000) proposed a variant of the Lei and Sun decoder that considerably shortens processing time. B. J. Shieh et al. (“A high-throughput memory-based VLC decoder with codeword boundary prediction”, IEEE Transactions on Circuits and Systems for Video Technology vol. 10 no. 8 pp. 1514-1521, December 2000) described the design of a prefix decoder with codeword boundary prediction. The decompressor predicts the codeword length before the codeword has been fully decoded, and uses the predicted length to enhance parallel decoding.
Approximations of Huffman coding also are known. These approximations run faster than true Huffman coding, at the expense of somewhat less efficient compression. The key idea behind approximate coding is to partition the symbols into groups such that all the symbols in the same group are assigned codes of the same length. Different investigators have called these groups sets, packages or classes. An approximate Huffman-style coding method (T. M. Kemp et al., “A decompression core for PowerPC”, IBM Journal of Research and Development vol. 42 no. 6 pp. 807-812, November 1998) has been implemented in IBM's PowerPC 405. A high-performance PLA-based decompressor architecture for class-based code has been proposed by S. Weiss and S. Beren in “HW/SW partitioning of an embedded instruction memory decompressor”, Proc. Int'l Symposium on Hardware/Software Codesign, pp. 36-41, Copenhagen, Denmark, April 2001.
Early computer architectures (e.g., IBM 360/370, VAX, Intel x86) were designed with variable-length instructions to minimize program space, because of the expense of program memory. By contrast, many RISC architectures designed during the 1980s (e.g., Alpha, PowerPC, Sparc) have fixed-length 32-bit instructions. At the expense of reduced object code density, the use of fixed-length instructions simplifies instruction-level parallel processing and streamlines pipelined hardware design. In embedded system-on-a-chip devices, however, in which the program memory takes up a substantial portion of the chip resources and cost, the tradeoff between object code density and execution efficiency is closer to the pre-RISC situation, and it is advantageous to save resources by compressing the instructions.
The primary requirements for a compression/decompression method and the associated embedded instruction memory decompressor for a system-on-a-chip device are:
1. Efficient compression
2. Coding that facilitates high-performance decompression hardware
3. A small codebook
Efficient compression depends on the choice of the alphabet. Splitting 32-bit instructions into instruction halves rather than bytes produces a large alphabet of $2^{16}$ (65,536) symbols, but creates an opportunity for more efficient compression. With a large alphabet, the second and third requirements listed above can be met by using a form of approximate prefix coding that simplifies decompression and reduces the codebook size.
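As a small illustration of the half-word alphabet, the sketch below splits 32-bit instruction words into two 16-bit symbol streams and counts symbol frequencies in each. The function name `half_word_alphabets` is hypothetical, and taking the "first half" to be the high-order 16 bits is an assumption made only for this example.

```python
from collections import Counter


def half_word_alphabets(instructions):
    """Split 32-bit instruction words into 16-bit halves and count each stream.

    Assumption: the first half is the high-order 16 bits of each word.
    Each half is a symbol drawn from an alphabet of 2**16 values.
    """
    first = Counter((w >> 16) & 0xFFFF for w in instructions)
    second = Counter(w & 0xFFFF for w in instructions)
    return first, second
```

Treating the two halves as separate streams lets each stream be ranked and coded on its own statistics.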
The approximations of Huffman coding that have been implemented heretofore use ad hoc methods for class partitioning. There is thus a widely recognized need for, and it would be highly advantageous to have, an approximate Huffman coding method that is based on a systematic way of constructing classes.
According to the present invention there is provided a method of compressing a dataset that includes a number $N$ of distinct symbols, all of the symbols having a common length, including the steps of: (a) ranking the symbols by frequency, thereby assigning to each symbol a respective rank $i$; (b) selecting a number $Q$ of classes; (c) selecting $Q$ distinct class codes $c_j$ indexed by an index $j$ such that $1 \le j \le Q$; and (d) for each symbol: if the rank $i$ of the each symbol is such that $2^{q-1} \le i \le 2^q - 1$ for an integer $q \le Q$: (i) assigning the class code $c_q$ to the each symbol, (ii) assigning a respective symbol code to the each symbol, and (iii) replacing at least one occurrence of the each symbol in the dataset with a concatenation of $c_q$ and the symbol code of the each symbol, thereby providing a compressed dataset.
According to the present invention there is provided a method of operating a processor, including the steps of: (a) providing a program that includes a plurality of distinct instructions, all of the instructions having a common length; (b) ranking the instructions by frequency, thereby assigning to each instruction a respective rank $i$; (c) selecting a number $Q$ of classes; (d) selecting $Q$ distinct class codes $c_j$ indexed by an index $j$ such that $1 \le j \le Q$; (e) for each instruction: if the rank $i$ of the each instruction is such that $2^{q-1} \le i \le 2^q - 1$ for an integer $q \le Q$: (i) assigning the class code $c_q$ to the each instruction, (ii) assigning a respective instruction code to the each instruction, and (iii) replacing at least one occurrence of the each instruction in the program with a concatenation of $c_q$ and the instruction code of the each instruction, thereby providing a compressed program; (f) storing the compressed program in a program memory; (g) for at least one concatenation: (i) retrieving the concatenation from the program memory, (ii) decompressing the concatenation, thereby providing a decompressed instruction, and (iii) executing the decompressed instruction, using the processor.
According to the present invention there is provided a computer, including: (a) a processor; (b) at least one program memory for storing a plurality of compressed instructions, each compressed instruction including an instruction code; (c) for each at least one program memory: (i) a code memory for storing a plurality of distinct instances of instructions, the instruction codes serving as bases for computing respective indices of the instances in the code memory, and (ii) a decompression mechanism for (A) extracting the instruction codes from the compressed instructions, (B) retrieving the instances from the code memory in accordance with the instruction codes, and (C) providing the instances to the processor for execution.
According to the present invention there is provided a computer readable storage medium having computer readable code embodied on the computer readable storage medium, the computer readable code for compressing a dataset that includes a plurality of distinct symbols having a common length, the computer readable code including: (a) program code for ranking the symbols by frequency, thereby assigning to each symbol a respective rank $i$; (b) program code for selecting a number $Q$ of classes; (c) program code for selecting $Q$ distinct class codes $c_j$ indexed by an index $j$ such that $1 \le j \le Q$; and (d) program code for: for each symbol: if the rank $i$ of the each symbol is such that $2^{q-1} \le i \le 2^q - 1$ for an integer $q \le Q$: (i) assigning the class code $c_q$ to the each symbol, (ii) assigning a respective symbol code to the each symbol, and (iii) replacing at least one occurrence of the each symbol in the dataset with a concatenation of $c_q$ and the symbol code of the each symbol, thereby providing a compressed dataset.
The method of the present invention is directed at compressing a dataset that includes $N$ distinct symbols, all of equal length. First, the symbols are ranked by frequency, and each symbol is assigned a respective rank $i$, with $i = 1$ being the rank of the most common symbol and $i = N$ being the rank of the least common symbol. A number $Q$ of classes to use is selected. The symbols ranked 1 through $2^Q - 1$ are encoded using a two-part code that is a concatenation of a class code and a symbol code. There are $Q$ class codes $\{c_j\}$; the symbols ranked from $i = 2^{q-1}$ through $i = 2^q - 1$ are assigned to the $q$-th class and so receive the class code $c_q$. Within each class, the symbol codes of the various symbols are unique. In one embodiment of the present invention, the class code of the $j$-th class is exactly $j$ bits long. Preferably, however, the class codes are obtained by applying Huffman coding to the classes. Most preferably, the symbol codes of the $j$-th class, i.e., the symbol codes of the symbols that belong to the $j$-th class, are at most $j - 1$ bits long; because the $j$-th class contains at most $2^{j-1}$ symbols, $j - 1$ bits suffice to give each of them a distinct symbol code. In the case of the first class, which contains only one symbol, this means that the symbol code of this symbol is a null code, i.e., that this symbol is encoded as only $c_1$.
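The rank-to-class mapping and the $(j-1)$-bit symbol codes can be sketched as follows. This is a minimal illustration, not the patented encoder: the function names are hypothetical, the symbol code is taken to be the symbol's offset within its class written in $q - 1$ bits (one choice satisfying the "at most $j - 1$ bits" preference), and the class codes are supplied by the caller.

```python
def class_of_rank(i: int) -> int:
    """Return the class q of rank i, i.e. the q with 2**(q-1) <= i <= 2**q - 1."""
    if i < 1:
        raise ValueError("ranks start at 1")
    # For i >= 1, bit_length() equals floor(log2(i)) + 1, which is exactly q.
    return i.bit_length()


def symbol_code(i: int) -> str:
    """Return the (q-1)-bit symbol code of rank i: its offset within class q."""
    q = class_of_rank(i)
    if q == 1:
        return ""  # class 1 holds only rank 1, so its symbol code is null
    offset = i - (1 << (q - 1))  # ranges over 0 .. 2**(q-1) - 1
    return format(offset, f"0{q - 1}b")


def codeword(i: int, class_codes: dict) -> str:
    """Concatenate the class code c_q with the symbol code of rank i."""
    return class_codes[class_of_rank(i)] + symbol_code(i)
```

For example, with the hypothetical class codes {1: "0", 2: "10", 3: "110"} ($Q = 3$), rank 6 falls in class 3 at offset 2 and is encoded as "110" followed by "10".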
If $N \ge 2^Q$, then each of the (unclassified) symbols whose rank exceeds $2^Q - 1$ is encoded as the concatenation of a literal class code and the uncompressed symbol itself. Such symbols are termed “literals” herein.
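Combining the classified case with the literal escape, one symbol might be encoded as in the sketch below. The helper `encode_symbol`, its parameters, and the 16-bit symbol width are illustrative assumptions; the literal class code must simply be distinct from (and prefix-free with) the $Q$ class codes.

```python
def encode_symbol(sym, rank_of, class_codes, literal_code, Q, width=16):
    """Encode one symbol as a bit string (hypothetical helper).

    rank_of      -- dict: symbol -> rank (1 = most frequent)
    class_codes  -- dict: q -> bit string of class code c_q, for q = 1..Q
    literal_code -- bit string of the class code reserved for literals
    width        -- length of an uncompressed symbol in bits (assumed 16)
    """
    i = rank_of[sym]
    if i <= (1 << Q) - 1:
        # Classified symbol: c_q followed by its (q-1)-bit offset in class q.
        q = i.bit_length()
        offset = i - (1 << (q - 1))
        return class_codes[q] + (format(offset, f"0{q - 1}b") if q > 1 else "")
    # Literal: the literal class code followed by the raw symbol bits.
    return literal_code + format(sym, f"0{width}b")
```

With $Q = 2$ and the hypothetical codes "0", "10" for the two classes and "11" for literals, the fourth-ranked symbol falls outside the classes and costs 2 + 16 bits.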
A symbol codebook is constructed by storing one instance of each of the distinct uncompressed classified symbols, in rank order, in the first $2^Q - 1$ locations of a memory. The symbol codes of the classified symbols then serve as bases for computing respective indices of the instances within the memory. The compressed dataset obtained by encoding the dataset as described above is decompressed with reference to this symbol codebook.
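One way the symbol codes can serve as bases for codebook indices is sketched below, assuming the offset-within-class symbol codes used in the earlier examples (an assumption, not the only possible choice). Because rank $i = 2^{q-1} + \text{offset}$ is stored at index $i - 1$, the index is a per-class base $2^{q-1} - 1$ plus the offset. The function names are hypothetical.

```python
def build_codebook(ranked_symbols, Q):
    """Code memory: classified symbols in rank order, rank i at index i - 1."""
    return list(ranked_symbols[: (1 << Q) - 1])


def codebook_index(q, symbol_code):
    """Index of the symbol with class q and the given symbol-code bit string.

    Rank i = 2**(q-1) + offset sits at index i - 1, so the index is the class
    base 2**(q-1) - 1 plus the offset encoded by the symbol code.
    """
    offset = int(symbol_code, 2) if symbol_code else 0
    return (1 << (q - 1)) - 1 + offset
```

Decompressing a classified symbol thus needs only an addition and a memory read once the class has been identified.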
A particular application of the method of the present invention is to the operation of a processor, in particular the processor of an embedded system-on-a-chip device, that is driven by instructions that are all of equal length. The dataset in this case is the program that the processor runs. The symbols are the instructions, so that in this application of the method of the present invention, the symbol codes are instruction codes, the symbol codebook is an instruction codebook, and the compressed instructions are concatenations of the class codes and the instruction codes. The program is stored in its compressed form. As the instructions are needed, they are decompressed individually with reference to the instruction codebook, and executed by the processor.
In the preferred case of 32-bit instructions split into instruction halves, the first halves and the second halves are treated as two different datasets and are compressed separately, with separate instruction codebooks.
A computer of the present invention includes a processor and at least one program memory for storing instructions that have been compressed using the method of the present invention. Associated with each program memory is a corresponding code memory, for storing distinct instances of the uncompressed instructions, that serves as an instruction codebook, and a corresponding decompression mechanism for decompressing the compressed instructions and providing the decompressed instructions to the processor for execution. In the preferred case of 32-bit instructions split into instruction halves, there are two such program memories, one for the first halves of the instructions and the other for the second halves of the instructions; and the computer also includes a concatenation mechanism for concatenating the decompressed first halves and the decompressed second halves to form complete decompressed instructions that then are sent to the processor to be executed.
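The concatenation mechanism for split instructions can be modeled as in the following sketch. It assumes, for illustration only, that each code memory is a list of 16-bit instances, that each compressed half has already been turned into a codebook index, and that the first half supplies the high-order 16 bits; `fetch_instruction` is a hypothetical name.

```python
def fetch_instruction(first_codebook, second_codebook, first_index, second_index):
    """Look up each decompressed half in its own code memory and join them.

    Assumption: the first half forms the high-order 16 bits of the word.
    """
    hi = first_codebook[first_index]
    lo = second_codebook[second_index]
    if not (0 <= hi <= 0xFFFF and 0 <= lo <= 0xFFFF):
        raise ValueError("code memory entries must be 16-bit values")
    return (hi << 16) | lo
```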
Preferably, the decompression mechanism includes Q class code mask registers for comparing with the class codes of the compressed instructions. With each class code mask register is associated a class code interpretation register that includes a class base, a shift control and an instruction mask.
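A software model of the mask-and-compare decoding might look like the sketch below. Everything about the register layout here is a hypothetical assumption: the compressed stream is viewed through a 16-bit window with the next codeword left-aligned, the class code sits in the top bits, the symbol code follows it, and a `length` field (not among the registers named above) is added so that the model can report how many bits were consumed. The class base is taken as $2^{q-1} - 1$, matching a codebook stored in rank order.

```python
from dataclasses import dataclass

WINDOW = 16  # hypothetical width of the bit window examined per decode step


@dataclass
class ClassRegisters:
    code_mask: int   # selects the class-code bits at the top of the window
    code_value: int  # expected class-code bits under that mask
    class_base: int  # codebook index of the first symbol in the class
    shift: int       # right shift that moves the symbol code down to bit 0
    instr_mask: int  # keeps only the symbol-code bits after the shift
    length: int      # total codeword length (extra field, assumed)


def make_registers(class_code: str, q: int) -> ClassRegisters:
    """Derive register contents for class q with the given class-code bits."""
    L, s = len(class_code), q - 1  # class-code length, symbol-code length
    return ClassRegisters(
        code_mask=((1 << L) - 1) << (WINDOW - L),
        code_value=int(class_code, 2) << (WINDOW - L),
        class_base=(1 << (q - 1)) - 1,
        shift=WINDOW - L - s,
        instr_mask=(1 << s) - 1,
        length=L + s,
    )


def decode(window: int, registers) -> tuple:
    """Return (codebook index, bits consumed) for a left-aligned bit window."""
    for r in registers:  # hardware would compare all classes in parallel
        if (window & r.code_mask) == r.code_value:
            return r.class_base + ((window >> r.shift) & r.instr_mask), r.length
    raise ValueError("no class code matches (literal or invalid input)")
```

For instance, with the hypothetical class codes "0", "10" and "110", the codeword "11010" matches class 3, yields index 3 + 2 = 5 (the sixth-ranked symbol), and consumes 5 bits.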
The scope of the present invention also includes a computer readable storage medium in which is embodied computer readable code for implementing the method of the present invention.