The present invention relates to a method and apparatus for compressing and decompressing information organized in short blocks, such as computer program instructions.
With computer processors getting smaller and cheaper, and computer programs getting larger and more complex, the size and cost of a computer's memory for storing program information has become a significant portion of the cost of a computer solution. While memory cost is important in general purpose computer systems, such as personal computers, it becomes critical in embedded special-purpose computer devices, especially those used in low-cost products. Significant cost reductions in computer-based products may be realized by reducing the memory required by a particular program. One possible technique involves compressing the program instructions in memory.
Prior art data coding and compression schemes have most widely been used to compress data and code for storage on DASD or tape backup systems. Typically, such methods are directed toward achieving a high degree of compression on large blocks of data. Lossless data compression is used extensively in connection with storage of data on disk file systems, backup and archiving systems, and storage of data on tape. Such systems are typically implemented in software. Well known examples include disk compression products, such as the UNIX Compress program, or the DOS and OS/2 Stacker and pkZIP programs. Typically, these programs employ adaptive data compression techniques, such as LZ1 or LZ2.
The requirements for effective compression of data in the high speed memory of a computer system are very different from the requirements for compression of large blocks of data. Compression/decompression techniques for this application must be able to effectively handle short blocks of data. The data will be compressed once and decompressed repeatedly. Thus, the decompression scheme must be quick and efficient, while the compression scheme can be relatively slow and complex. What is needed is a compression/decompression technique that effectively handles short data blocks, while providing quick and efficient decompression.
The present invention is a method and apparatus for compression and decompression of information, such as computer program instructions, which provides quick and efficient decompression.
In order to compress information, the present invention encodes information comprising a plurality of units by receiving the information to be encoded, splitting the information into a plurality of subsets, each subset comprising a plurality of symbols and each symbol comprising at least a portion of a unit of information, and, for each subset, assigning a codeword to each symbol. Preferably, the assignment is performed by determining, for each subset, the frequency of occurrence of each symbol, and assigning a codeword to each symbol based on that frequency of occurrence.
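The splitting and frequency-determination steps described above may be illustrated by the following minimal sketch. It assumes, purely for illustration, that each unit is a 32-bit instruction word and that each symbol is one 8-bit byte of that word, so that subset k collects byte k of every unit. The function names and parameters are hypothetical and are not taken from the text above.

```python
from collections import Counter

def split_into_subsets(units, symbols_per_unit=4, symbol_bits=8):
    """Split each unit into symbols; subset k holds symbol k of every unit.

    Assumption: units are unsigned integers of symbols_per_unit * symbol_bits
    bits, and symbol 0 is the most significant field.
    """
    mask = (1 << symbol_bits) - 1
    subsets = [[] for _ in range(symbols_per_unit)]
    for unit in units:
        for k in range(symbols_per_unit):
            shift = symbol_bits * (symbols_per_unit - 1 - k)
            subsets[k].append((unit >> shift) & mask)
    return subsets

def rank_symbols(subset):
    """Return the distinct symbols of one subset, most frequent first.

    Ties are broken by symbol value so that the ordering is deterministic.
    """
    counts = Counter(subset)
    return sorted(counts, key=lambda s: (-counts[s], s))

units = [0x8C010004, 0x8C020008, 0x8C01000C]
subsets = split_into_subsets(units)
tables = [rank_symbols(subset) for subset in subsets]
```

Each ranked list may then serve as the basis for assigning codewords within its subset. One natural choice, which the text above does not itself spell out, is to give entries nearer the front of the list shorter codewords.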
In one embodiment of the present invention, a codeword-symbol assignment table is generated, for each subset. Preferably each codeword includes an index indicating a location in the codeword-symbol assignment table and a prefix indicating a length of the index.
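A codeword of this form may be sketched as follows. The sketch assumes a 2-bit prefix that selects one of four index lengths (2, 4, 6 or 8 bits), with the index holding the absolute location of the symbol in the table. These widths, and the names used, are hypothetical choices made for illustration and are not specified in the text above.

```python
# Hypothetical layout: a 2-bit prefix selects the length of the index.
PREFIX_BITS = 2
INDEX_LENGTHS = (2, 4, 6, 8)

def encode_position(position):
    """Return the codeword, as a bit string, for a table location.

    The shortest index length able to hold the location is chosen, so
    locations near the start of the table receive shorter codewords.
    """
    for prefix, length in enumerate(INDEX_LENGTHS):
        if position < (1 << length):
            return format(prefix, f"0{PREFIX_BITS}b") + format(position, f"0{length}b")
    raise ValueError("location does not fit in the largest index")

def build_codewords(table):
    """Map each symbol of a codeword-symbol assignment table to its codeword."""
    return {symbol: encode_position(i) for i, symbol in enumerate(table)}

codewords = build_codewords([0x8C, 0x00, 0x01, 0x02, 0x10, 0x24])
```

Under these assumptions, the first four table entries receive 4-bit codewords and the next twelve receive 6-bit codewords, so placing frequent symbols early in the table shortens the encoded output.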
In another embodiment of the present invention, a plurality of symbol groups are generated for each codeword-symbol assignment table. Preferably each codeword includes a prefix indicating one of the plurality of symbol groups and an index indicating a location in the indicated symbol group.
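The symbol-group arrangement may be sketched as follows. The sketch assumes a frequency-ordered table that is cut into four consecutive groups holding up to 4, 16, 64 and 256 symbols, a 2-bit prefix naming the group, and an index giving the location within that group. The group sizes and all names are hypothetical and chosen only for illustration.

```python
# Hypothetical layout: group g is addressed by an index of GROUP_INDEX_BITS[g] bits.
PREFIX_BITS = 2
GROUP_INDEX_BITS = (2, 4, 6, 8)

def make_groups(table):
    """Cut a frequency-ordered table into consecutive symbol groups."""
    groups, start = [], 0
    for bits in GROUP_INDEX_BITS:
        size = 1 << bits
        groups.append(table[start:start + size])
        start += size
    if start < len(table):
        raise ValueError("table has more symbols than the groups can hold")
    return groups

def encode_symbol(groups, symbol):
    """Return prefix (group number) followed by index (location in the group)."""
    for prefix, group in enumerate(groups):
        if symbol in group:
            index = group.index(symbol)
            bits = GROUP_INDEX_BITS[prefix]
            return format(prefix, f"0{PREFIX_BITS}b") + format(index, f"0{bits}b")
    raise KeyError(symbol)

groups = make_groups(list(range(100, 130)))
```

Because the index is relative to the start of its group, no index value is wasted on locations that a shorter codeword already covers, in contrast to an index that holds an absolute table location.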
In order to decompress information encoded according to the present invention, where the information comprises a plurality of codewords, each codeword is decoded to form a symbol, each symbol is grouped into one of a plurality of subsets, and the plurality of subsets is merged to form the decoded information.
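The decode, group and merge steps may be sketched as follows. The sketch assumes that the codewords arrive interleaved in round-robin order, so that codeword i belongs to subset i modulo the number of subsets, and that each subset has its own lookup table from codeword to symbol. Both the interleaving and the dictionary-based tables are assumptions made for illustration and are not stated in the text above.

```python
def decode(codewords, tables, symbol_bits=8):
    """Decode codewords to symbols, group them into subsets, merge into units.

    Assumptions: tables[k] maps a codeword string to a symbol for subset k,
    and codeword i belongs to subset i % len(tables).
    """
    n = len(tables)
    subsets = [[] for _ in range(n)]
    for i, codeword in enumerate(codewords):
        k = i % n
        subsets[k].append(tables[k][codeword])  # decode and group

    units = []
    for symbols in zip(*subsets):               # one symbol from each subset
        unit = 0
        for symbol in symbols:
            unit = (unit << symbol_bits) | symbol
        units.append(unit)
    return units

tables = [{"0": 0x8C}, {"0": 0x01, "1": 0x02}]
units = decode(["0", "0", "0", "1"], tables)
```

In this example the two subsets decode to the symbol sequences 0x8C, 0x8C and 0x01, 0x02, which merge into the 16-bit units 0x8C01 and 0x8C02.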
Preferably, each codeword comprises an index indicating a location in a symbol group in a codeword-symbol assignment table and a prefix indicating a symbol group and a length of the index. An index may represent a literal symbol, and the prefix may further indicate whether the index represents a literal symbol.
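A decoder for codewords of this form may be sketched as follows. The sketch assumes a 2-bit prefix in which the values 0 to 2 select a symbol group with a 2-, 4- or 6-bit index, and the value 3 marks a literal whose 8-bit index is the symbol itself. The bit widths, the choice of 3 as the literal prefix, and the names are hypothetical and serve only as an illustration.

```python
# Hypothetical layout: prefix 0..2 selects a symbol group, prefix 3 marks a literal.
PREFIX_BITS = 2
GROUP_INDEX_BITS = (2, 4, 6)
LITERAL_PREFIX = 3
LITERAL_BITS = 8

def decode_one(bits, pos, groups):
    """Decode one codeword starting at bit offset pos; return (symbol, new pos)."""
    prefix = int(bits[pos:pos + PREFIX_BITS], 2)
    pos += PREFIX_BITS
    if prefix == LITERAL_PREFIX:
        # The index is the symbol itself, so no table lookup is needed.
        return int(bits[pos:pos + LITERAL_BITS], 2), pos + LITERAL_BITS
    length = GROUP_INDEX_BITS[prefix]           # prefix gives the index length
    index = int(bits[pos:pos + length], 2)
    return groups[prefix][index], pos + length  # prefix also gives the group

def decode_stream(bits, groups):
    """Decode a whole bit string made of concatenated codewords."""
    symbols, pos = [], 0
    while pos < len(bits):
        symbol, pos = decode_one(bits, pos, groups)
        symbols.append(symbol)
    return symbols

groups = [[0x00, 0x8C, 0x01, 0x02], [0x10], []]
symbols = decode_stream("0001" + "010000" + "1111111111", groups)
```

Each codeword is decoded by reading a fixed-width prefix, a single table lookup at most, and no search, which is consistent with the stated goal of quick decompression. A literal allows rarely occurring symbols to be encoded without occupying a table entry.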