The present invention relates in general to data compression schemes and, more particularly, to bit-packing and bit-unpacking using a barrel shifter in a data compressor and data decompressor.
Data compression schemes are well known in the art for encoding a stream of digital data signals into compressed digital data. Data compression generally refers to a process of inputting the data stream in a standard format, say 8-bit ASCII characters, and outputting the same information in a compressed format with fewer bits than the original format.
Compression is advantageous for both data storage and data transmission. If the data is compressed into fewer total bits that represent the same information, then less space is required in the mass storage device. Likewise, data transmission occurs more rapidly when fewer bits are transmitted. In general, reducing the total number of ones and zeroes makes the data more efficient to handle. When the time comes to use the data, it must be decompressed back into its original format for use by the end device.
One common compression technique is described in U.S. Pat. No. 5,003,307. The compression system includes a data compressor, a data decompressor, and an interconnecting medium such as a transmission link or a mass storage device. Uncompressed data words are serially processed through the data compressor, which builds a compressor vocabulary table comprising a history of incoming data and sends a sequence of codewords across the transmission link, or to the mass storage device, to the data decompressor. The codewords are serially processed through the data decompressor to build a corresponding decompressor vocabulary table and provide uncompressed data words to the end device.
In the data compressor, each incoming data word is compared to the existing vocabulary table. If no match is found, the data compressor sends the data word as part of a codeword across the transmission link, or to the mass storage device, and further places the data word at the end of the vocabulary table. No actual data compression occurs if no match is found. The transmission capacity needed to send an uncompressed data word may be ten bits: eight bits for the uncompressed data word and two bits, say "00", to represent the "length" of the matched string of data words, in this case zero.
If, on the other hand, one or more matches are found in the vocabulary table, the data compressor notes the locations of the matches in the vocabulary table. No data is sent initially, but the incoming data is still added to the end of the vocabulary table. The next incoming data word is checked for a match to the contents of the next locations in the vocabulary table following the first matches, effectively searching for length-two string matches in the vocabulary table. If the second incoming data word fails to match the contents of the next locations, the length of the longest matched string is determined to be one. The first match may be conveyed as a codeword that contains the uncompressed data word, as in the case when no match is found. The transmission capacity needed to send a codeword that conveys a length-one matched string may be ten bits: eight bits for the uncompressed data word and two bits, say "01", to represent the length of the matched string of data words, in this case one. Alternatively, the "location" of a length-one match in the vocabulary table may be sent. Since typical implementations use vocabulary tables containing at least 1024 locations, which require at least 10 bits to represent, it is often preferable to include the 8-bit length-one match data word as the codeword.
If the second incoming data word matches the contents of at least one of the next locations, the process continues until a subsequent data word fails to match any of the next locations in the vocabulary table. The data compressor notes the number of such matches in the vocabulary table. A codeword is sent identifying the location of the first match and the length of the matched string of data words. Thus, if successive incoming data words "A", "B", "C" happen to match the same previously stored data string, the resulting codeword would contain the starting location of the match for "A" and a length of three.
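The string-matching search described above can be sketched in software as follows. This is a minimal illustration only: the function name `longest_match` and its arguments are hypothetical, the vocabulary table is treated as a fixed list during the search (the compressor described above also appends incoming data as it goes), and a sequential scan stands in for whatever matching hardware an implementation might use.

```python
def longest_match(vocab, data, start):
    """Return (location, length) of the longest string in vocab that
    matches the incoming data beginning at data[start].

    A length of 0 means no match was found (location is then meaningless).
    """
    best_loc, best_len = 0, 0
    for loc in range(len(vocab)):
        length = 0
        # Extend the match one data word at a time, as the text describes.
        while (loc + length < len(vocab)
               and start + length < len(data)
               and vocab[loc + length] == data[start + length]):
            length += 1
        if length > best_len:
            best_loc, best_len = loc, length
    return best_loc, best_len


# Example: "A", "B", "C" were previously stored starting at location 1.
vocab = list(b"XABCY")
location, length = longest_match(vocab, b"ABCZ", 0)
```

In this example the search reports location 1 and length 3, corresponding to a single codeword for the string "ABC".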
The transmission capacity needed to send the codeword depends on the number of bits required to represent the length and location fields. As is well known in the art, the size of the location field is typically determined by either the current number of entries in the vocabulary table or by the maximum size of the vocabulary table. The size of the length field is typically chosen to vary according to a prefix code wherein more probable length values are uniquely encoded using fewer bits than less probable length values. For example, the size of the codeword that represents the length-three string "ABC" may also be ten bits: seven bits to convey the location in the vocabulary table (which contains fewer than 128 locations) and three bits which encode the length of the match, say "101". The data compressor releases one 10-bit codeword representative of the entire character string for transmission and/or storage. One 10-bit codeword requires less space to store and less time to transmit than three individual uncompressed data words (24 bits). Thus, when string matches of length greater than one are found, the data compressor offers the feature of transmitting or storing fewer total bits to represent the same information as compared to uncompressed formats.
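The codeword sizes in the preceding examples can be checked with a short calculation. The sketch below is illustrative only: the length codes "00", "01" and "101" are taken from the examples above, while the code "100" for a length-two match and the names `LENGTH_CODE` and `codeword_bits` are assumptions introduced here.

```python
DATA_BITS = 8  # width of one uncompressed data word

# Length-field prefix code. "00", "01" and "101" follow the examples in
# the text; "100" for length two is an assumed value for illustration.
LENGTH_CODE = {0: "00", 1: "01", 2: "100", 3: "101"}


def codeword_bits(match_len, location_bits):
    """Total codeword size in bits for a match of the given length."""
    prefix = LENGTH_CODE[match_len]
    if match_len <= 1:
        # No match or length-one match: the data word itself is sent.
        return len(prefix) + DATA_BITS
    # Longer match: the location in the vocabulary table is sent instead.
    return len(prefix) + location_bits


uncompressed_bits = 3 * DATA_BITS         # "A", "B", "C" sent as-is
compressed_bits = codeword_bits(3, 7)     # one codeword, 7-bit location
```

Under these assumptions the three data words occupy 24 bits uncompressed and 10 bits as a single codeword.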
On the decompression side, the data decompressor receives the sequence of codewords from the data compressor by transmission link or from a mass storage device. The data decompressor begins to build its own vocabulary table from the incoming compressed data. Codewords beginning with "00" are taken as containing uncompressed data words which are provided directly to the end device and are added to the end of the decompressor vocabulary table. Other codewords containing location and length fields are converted to standard format by reading the designated string from the vocabulary table. These data words are further added to the end of the vocabulary table and sent to the end device.
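The decompression steps above can be sketched as follows. This is a simplified illustration, not the referenced patent's implementation: codewords are assumed to have already been parsed into hypothetical tuples, either `("literal", word)` or `("match", location, length)`, the vocabulary table is unbounded, and each matched string is assumed to lie wholly within the existing table.

```python
def decompress(codewords):
    """Rebuild the original data words from a sequence of parsed codewords."""
    vocab = []          # decompressor vocabulary table
    out = bytearray()   # data words delivered to the end device
    for cw in codewords:
        if cw[0] == "literal":
            # Codeword carries the uncompressed data word itself.
            words = [cw[1]]
        else:
            # Codeword carries a location and length: read the string
            # from the vocabulary table.
            _, location, length = cw
            words = vocab[location:location + length]
        vocab.extend(words)   # add to the end of the vocabulary table
        out.extend(words)     # and send to the end device
    return bytes(out)


restored = decompress([
    ("literal", ord("A")),
    ("literal", ord("B")),
    ("literal", ord("C")),
    ("match", 0, 3),
])
```

Here the final codeword refers back to the three data words already in the table, so the output is "ABCABC".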
The aforedescribed data compressor may include a content addressable memory (CAM) to hold its vocabulary table. Each CAM array memory cell is individually addressable with read/write capability. Each incoming data word is compared in parallel to the existing contents of the CAM array and is sequentially placed in the next available CAM array memory cell. Once the CAM array reaches capacity, the addressing wraps around to the beginning of the array, thereafter overwriting the contents of the oldest CAM array memory cell.
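The wrap-around addressing described above behaves like a circular buffer. The sketch below models only that behavior in software; the class name `VocabularyTable` and its methods are hypothetical, and the sequential scan in `find` merely stands in for the parallel comparison that a CAM performs in hardware.

```python
class VocabularyTable:
    """Software model of a fixed-size vocabulary table with wrap-around."""

    def __init__(self, size):
        self.cells = [None] * size   # None marks a cell not yet written
        self.next = 0                # next cell to be written

    def find(self, word):
        """Return every location whose contents equal the data word."""
        return [i for i, cell in enumerate(self.cells) if cell == word]

    def write(self, word):
        """Store the data word in the next cell, overwriting the oldest
        entry once the table is full. Returns the location written."""
        loc = self.next
        self.cells[loc] = word
        self.next = (loc + 1) % len(self.cells)
        return loc


table = VocabularyTable(4)
for word in b"ABCDE":
    table.write(word)
```

After five writes into a four-cell table, the fifth data word has overwritten the first, so "A" is no longer found and "E" is found at location 0.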
Many compression algorithms use variable-length codewords to represent compressed data. Alternatively, many compression algorithms use fixed-length codewords. When these lengths are not well suited for devices using byte-oriented busses, e.g., microprocessors and RAMs, it is necessary to pack the sequence of codewords to create a byte-oriented data stream. A variable-length representation of the location of the matched string is often used (a gear shift technique). For example, when a vocabulary table contains 1024 locations and is only partially full, fewer than ten bits may be needed to address any occupied location. If only four characters are written into the vocabulary table, only two bits are needed to represent the location. Likewise, if the vocabulary table contains sixteen characters, the location field need be only four bits wide. For efficient data transmission, the codeword should be made a fixed-width field. However, the transmitted data should not have unused bits.
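The gear shift sizing in the examples above amounts to taking the number of bits needed to address the entries currently in the table. A minimal sketch, in which the function name `location_bits` is hypothetical and a minimum width of one bit is assumed:

```python
def location_bits(entries):
    """Bits needed to address any of `entries` vocabulary table locations.

    Locations are numbered 0 .. entries - 1; at least one bit is used.
    """
    return max(1, (entries - 1).bit_length())


widths = {n: location_bits(n) for n in (4, 16, 128, 1024)}
```

For 4, 16, 128 and 1024 entries this gives 2, 4, 7 and 10 bits respectively, matching the figures in the text.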
Hence, a need exists to take variable-length codewords and generate fixed-width data words for transmission while using all available bits.
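To make the packing requirement concrete, the following software sketch concatenates variable-length codewords into 8-bit data words with no unused bits between them. It is an illustration of the requirement only, not of any particular hardware arrangement: the function name `pack_codewords`, the most-significant-bit-first ordering, and the zero padding of the final byte are all assumptions.

```python
def pack_codewords(codewords):
    """Pack (value, width) pairs into bytes, most significant bit first.

    Only the final byte is padded (with zeros) when the total number of
    bits is not a multiple of eight.
    """
    acc = 0      # bit accumulator holding bits not yet emitted
    nbits = 0    # number of valid bits currently in acc
    out = bytearray()
    for value, width in codewords:
        # Append the codeword below the bits already accumulated.
        acc = (acc << width) | (value & ((1 << width) - 1))
        nbits += width
        # Emit every complete byte.
        while nbits >= 8:
            nbits -= 8
            out.append((acc >> nbits) & 0xFF)
        acc &= (1 << nbits) - 1   # keep only the leftover bits
    if nbits:
        out.append((acc << (8 - nbits)) & 0xFF)   # zero-pad the last byte
    return bytes(out)


# A 3-bit length field, a 7-bit location field, then a 2-bit length
# field followed by an 8-bit data word: 20 bits in total.
packed = pack_codewords([(0b101, 3), (0b0000001, 7), (0b00, 2), (0x41, 8)])
```

The 20 input bits occupy three bytes, with only the last four bits of the third byte used as padding.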