The present invention relates in general to data compression schemes and, more particularly, to the compression and decompression of data following the resetting of vocabulary tables.
Data compression schemes are well known in the art for encoding a stream of digital data signals into compressed digital data. Data compression generally refers to a process of inputting the data stream in a standard format, say 8-bit ASCII characters, and outputting the same information in a compressed format with fewer bits than the original format.
The compression process is advantageous for both data storage and data transmission. If the data is compressed into fewer total bits that represent the same information, then less space is required in the mass storage device. Likewise, data transmission occurs more rapidly when fewer bits are transmitted. In general, reducing the total number of ones and zeroes makes the data more efficient to handle. When the time comes to use the data, it must be decompressed back into its original format for use by the end device.
One common compression technique is described in U.S. Pat. No. 5,003,307. The compression system includes a data compressor, a data decompressor and an interconnecting medium such as a transmission link or a mass storage device. Uncompressed data words are serially processed through the data compressor which builds a compressor vocabulary table comprising a history of incoming data and which sends a sequence of codewords across the transmission link, or to the mass storage device, to the data decompressor. The codewords are serially processed through the data decompressor to build a corresponding decompressor vocabulary table and provide uncompressed data words to the end device.
In the data compressor, each incoming data word is compared to the existing vocabulary table. If no match is found, the data compressor sends the data word as part of a codeword across the transmission link, or to the mass storage device, and further places the data word at the end of the vocabulary table. No actual data compression occurs if no match is found. The transmission capacity needed to send an uncompressed data word may be ten bits: eight bits for the uncompressed data word and two bits, say "00", to represent the "length" of the matched string of data words, in this case zero.
If, on the other hand, one or more matches are found in the vocabulary table, the data compressor notes the locations of the matches in the vocabulary table. No data is sent initially, but the incoming data is still added to the end of the vocabulary table. The next incoming data word is checked for a match to the contents of the next locations in the vocabulary table following the first matches, effectively searching for length-two string matches in the vocabulary table. If the second incoming data word fails to match the contents of the next locations, the length of the longest matched string is determined to be one. The first match may be conveyed as a codeword that contains the uncompressed data word, as in the case when no match is found. The transmission capacity needed to send a codeword that conveys a length-one matched string may be ten bits: eight bits for the uncompressed data word and two bits, say "01", to represent the length of the matched string of data words, in this case one. Alternatively, the "location" of a length-one match in the vocabulary table may be sent. Since typical implementations use vocabulary tables containing at least 1024 locations, which require at least 10 bits to represent, it is often preferable to include the 8-bit length-one match data word as the codeword.
If the second incoming data word matches the contents of at least one of the next locations, the process continues until a subsequent data word fails to match any of the next locations in the vocabulary table. The data compressor notes the number of such matches in the vocabulary table. A codeword is sent identifying the location of the first match and the length of the matched string of data words. Thus, if successive incoming data words "A", "B", "C" happen to match the same previously stored data string, the resulting codeword would have the starting location to the match of "A" and a length of three.
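The matching procedure described above can be sketched in software. The following is only an illustrative model under stated assumptions, not the circuit of the referenced patent: the vocabulary table is an unbounded byte array rather than a fixed-size CAM, codewords are returned as Python tuples instead of packed bits, and the names `compress`, `"literal"` and `"match"` are hypothetical.

```python
def compress(data: bytes):
    """Model of the compressor: returns a list of codeword tuples.

    ("literal", length, word)  for length 0 (no match) or 1 (single match)
    ("match", location, length) for matched strings of length two or more
    """
    table = bytearray()  # vocabulary table: history of incoming data words
    tokens = []
    i = 0
    while i < len(data):
        first_word = data[i]
        # Locations whose contents equal the incoming data word.
        candidates = [p for p in range(len(table)) if table[p] == first_word]
        table.append(first_word)  # incoming data is always added to the table
        i += 1
        length = 1 if candidates else 0
        # Try to extend the match using the locations that follow each hit.
        while candidates and i < len(data):
            nxt = [p + 1 for p in candidates
                   if p + 1 < len(table) and table[p + 1] == data[i]]
            if not nxt:
                break
            candidates = nxt
            table.append(data[i])
            i += 1
            length += 1
        if length <= 1:
            tokens.append(("literal", length, first_word))
        else:
            # candidates hold the location of the last matched word.
            tokens.append(("match", candidates[0] - (length - 1), length))
    return tokens


tokens = compress(b"ABCABC")
```

For the input `b"ABCABC"`, the first three words produce length-zero codewords and the repeated string "A", "B", "C" produces a single codeword carrying the location of "A" and a length of three, mirroring the example in the text.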
The transmission capacity needed to send the codeword depends on the number of bits required to represent the length and location fields. As is well known in the art, the size of the location field is typically determined by either the current number of entries in the vocabulary table or by the maximum size of the vocabulary table. The size of the length field is typically chosen to vary according to a prefix code wherein more probable length values are uniquely encoded using fewer bits than less probable length values. For example, the size of the codeword that represents the length-three string "ABC" may also be ten bits: seven bits to convey the location in the vocabulary table (that contains fewer than 128 locations) and three bits which encode the length of the match, say "101". The data compressor releases one 10-bit codeword representative of the entire character string for transmission and/or storage. One 10-bit codeword requires less space to store and less time to transmit as compared to three individual uncompressed data words (24 bits). Thus, when string matches of length greater than one are found, the data compressor offers the feature of transmitting or storing fewer total bits to represent the same information as compared to uncompressed formats.
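The bit counts quoted above can be checked with a short calculation. This sketch assumes the location field is sized from the current number of table entries, and it includes only the three length codes that the text mentions ("00", "01", "101"); codes for other lengths are not given and are left out. The function names are hypothetical.

```python
WORD_BITS = 8  # size of one uncompressed data word

# Length prefix codes quoted in the text; other lengths are not specified there.
LENGTH_CODES = {0: "00", 1: "01", 3: "101"}


def location_bits(table_entries: int) -> int:
    """Bits needed to address any one of table_entries locations (at least 1)."""
    return max(1, (table_entries - 1).bit_length())


def codeword_bits(length: int, table_entries: int) -> int:
    """Total codeword size for a match of the given length."""
    prefix = len(LENGTH_CODES[length])
    if length <= 1:
        # Lengths 0 and 1 carry the uncompressed data word itself.
        return prefix + WORD_BITS
    return prefix + location_bits(table_entries)


abc_bits = codeword_bits(3, 127)   # table with fewer than 128 locations
raw_bits = 3 * WORD_BITS           # three uncompressed data words
```

Under these assumptions `abc_bits` is 10 and `raw_bits` is 24, which agrees with the figures in the text, and a 1024-location table needs a 10-bit location field.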
On the decompression side, the data decompressor receives the sequence of codewords from the data compressor by transmission link or from a mass storage device. The data decompressor begins to build its own vocabulary table from the incoming compressed data. Codewords beginning with "00" are taken as containing uncompressed data words which are provided directly to the end device and are added to the end of the decompressor vocabulary table. Other codewords containing location and length fields are converted to standard format by reading the designated string from the vocabulary table. These data words are further added to the end of the vocabulary table and sent to the end device.
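The decoding steps can be sketched in the same spirit. This is a minimal model, assuming codewords have already been unpacked into hypothetical tuples of the form `("literal", length, word)` or `("match", location, length)` and that the decompressor vocabulary table is an unbounded byte array rather than a fixed-size RAM.

```python
def decompress(tokens) -> bytes:
    """Model of the decompressor: rebuilds the data and its own vocabulary table."""
    table = bytearray()  # decompressor vocabulary table
    out = bytearray()    # data words delivered to the end device
    for token in tokens:
        if token[0] == "literal":
            _, _length, word = token
            table.append(word)
            out.append(word)
        else:
            _, location, length = token
            # Copy one word at a time so that a string may run into
            # words appended earlier in this same copy.
            for k in range(length):
                word = table[location + k]
                table.append(word)
                out.append(word)
    return bytes(out)


restored = decompress([
    ("literal", 0, ord("A")),
    ("literal", 0, ord("B")),
    ("literal", 0, ord("C")),
    ("match", 0, 3),
])
```

Here `restored` equals `b"ABCABC"`: the three literal codewords seed the table, and the final codeword reads the three-word string starting at location 0.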
The aforedescribed data compressor may include a content addressable memory (CAM) to hold its vocabulary table. Each CAM array memory cell is individually addressable with read/write capability. Each incoming data word is compared in parallel to the existing contents of the CAM array and is sequentially placed in the next available CAM array memory cell. Once the CAM array reaches capacity, the addressing wraps around to the beginning of the array, thereafter overwriting the contents of the oldest CAM array memory cell.
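The wrap-around addressing can be illustrated with a small software model. This is a sketch only: a real CAM compares all cells in parallel in hardware, whereas the hypothetical `CamModel` class below scans a list, and the initial cell value of zero is a placeholder assumption.

```python
class CamModel:
    """Software stand-in for a CAM array with wrap-around write addressing."""

    def __init__(self, size: int, initial: int = 0x00):
        self.cells = [initial] * size
        self.next_cell = 0  # next available memory cell

    def search(self, word: int):
        """Return every location whose contents equal the word."""
        return [i for i, cell in enumerate(self.cells) if cell == word]

    def write(self, word: int):
        """Store the word, wrapping to the start once capacity is reached."""
        self.cells[self.next_cell] = word
        self.next_cell = (self.next_cell + 1) % len(self.cells)


cam = CamModel(size=4)
for word in (1, 2, 3, 4, 5):
    cam.write(word)  # the fifth write overwrites the oldest cell
```

After the five writes the oldest word, 1, has been overwritten by 5 in cell 0, so `cam.search(1)` finds nothing while `cam.search(5)` returns location 0.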
When the CAM array is initially powered on, or reset due to some error condition, each CAM cell typically contains the reset character, e.g. the binary value "00000000", as described in U.S. Pat. No. 5,130,993. The CAM array may be initialized simultaneously during one or a few clock cycles, or may be loaded from a separate RAM in several cycles. The incoming data words begin filling the CAM array as described. A problem may occur when an incoming data word happens to be the same as the reset character. The CAM array will find a match in its unused portion and encode the data word accordingly. The data decompressor typically uses vocabulary RAM 36 to build its own local vocabulary table. Thus, unless the vocabulary RAM 36 is initialized to the same reset character as the CAM array, the decompression will yield an error when the codeword attempts to look up the character in the unused and uninitialized portion of the decompressor vocabulary table.
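The failure mode can be reproduced with a toy model. The sketch below assumes an 8-location table, a reset character of 0x00, and an uninitialized RAM modeled as holding 0xFF in every location (real uninitialized contents are arbitrary). The helper names and the choice of location 4 for the reported match are hypothetical.

```python
RESET_CHAR = 0x00
SIZE = 8


def cam_search(cells, word):
    """Return every location whose contents equal the word."""
    return [i for i, cell in enumerate(cells) if cell == word]


def decode_match(ram, write_ptr, location, length):
    """Read a matched string from the decompressor RAM and append it to the table."""
    out = []
    for k in range(length):
        word = ram[(location + k) % len(ram)]
        ram[(write_ptr + k) % len(ram)] = word
        out.append(word)
    return out


# Compressor side: a freshly reset CAM matches the reset character everywhere.
cam = [RESET_CHAR] * SIZE
hits = cam_search(cam, 0x00)

# Suppose two incoming 0x00 words are encoded as a length-two match at
# location 4, which lies in the unused portion of the table.
location, length = 4, 2

# Decompressor side, with and without initializing the RAM to the reset character.
good = decode_match([RESET_CHAR] * SIZE, 0, location, length)
bad = decode_match([0xFF] * SIZE, 0, location, length)
```

With the RAM preset to the reset character the decoder recovers the two 0x00 words; with the modeled uninitialized RAM it returns whatever happened to be stored there, so the decoded data no longer matches the input.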
One option is to allow matches to occur in the unused portion of the CAM array storing the compressor vocabulary table and to initialize the decompressor vocabulary table in the vocabulary RAM 36 to match the CAM array preset. However, when the vocabulary RAM 36 is external to the IC chip, it can typically be written only one addressable location at a time. Therefore, the initialization process of the decompressor vocabulary table in external RAM is slow compared to the CAM array preset.
A second option is to inhibit matches in the unused portion of the CAM array storing the compressor vocabulary table. The logic necessary to inhibit matches in the unused portion of the compressor vocabulary table complicates the design and increases the size of the CAM array. As the compression progresses and more of the CAM array is filled, such logic would need to adapt to inhibit a decreasing number of matches, which requires extensive circuitry.
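In software, the inhibit behaviour of this second option amounts to restricting the search to locations that have already been written. The sketch below assumes a hypothetical `filled` counter tracking how many words have been stored since reset; it illustrates the behaviour only and says nothing about the circuit cost noted above.

```python
def search_inhibited(cells, filled, word):
    """Return match locations, ignoring cells not yet written since reset."""
    limit = min(filled, len(cells))  # after wrap-around every cell is valid
    return [i for i in range(limit) if cells[i] == word]


cells = [0x00] * 8  # CAM array preset to the reset character
filled = 0

before = search_inhibited(cells, filled, 0x00)  # no written cells yet

cells[0] = 0x00     # the first incoming word happens to be the reset character
filled += 1
after = search_inhibited(cells, filled, 0x00)
```

Before any word is written the search reports no matches even though every cell holds 0x00; once the first word is stored, only location 0 is eligible, and the inhibited region shrinks by one location with each further write.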
Hence, a need exists to allow matches of incoming data words in the unused portion of the CAM array while ensuring proper decoding in the decompressor.