Portions of this document contain material that is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction of the patent document or the patent application, as it appears in the Patent and Trademark Office file or records, but otherwise reserves all rights whatsoever.
1. Field of the Invention
The present invention relates primarily to the field of data compression, and in particular, to a method for coding at least one designated frequently occurring value.
2. Background of the Invention
Computer systems are increasingly being required to call up, process, and display data, especially multimedia (audio and video) data. However, many computer systems are unable to transfer data quickly and efficiently. This is particularly true for video data. Consequently, the transfer of video data from a multimedia file in storage can be slow, inefficient, unreliable and often inadequate for acceptably immediate and continuous playback.
One reason the transfer of video data is particularly problematic is that video data processing is very memory intensive; i.e., video data requires large amounts of memory for storage and use by a computer system.
Since delay in the transfer of data is directly proportional to the amount of data to be transferred, one solution to the problem of transmitting a large amount of data is to compress the data for transmission and decompress it at the destination. Some examples of prior art compression schemes are Motion JPEG, MPEG-4 and QuickTime. Many of these prior art coding schemes employ variable length coding.
Variable length coding is a method of compressing data that produces a unique bit sequence, or code, for each entity of text, audio, video, or graphics data in a file. Variable length encoding methods achieve compression by mapping the data (or data sequences) to codes whose average length is less than that of the original representation of the data. Consequently, there is a significant savings of memory space used to store the data.
Using variable length coding, the length of a code is typically based on the probability of the occurrence of the particular data represented by the code, with higher probability data typically having shorter length codes and lower probability data typically having longer length codes. This embodiment of variable length coding is sometimes called entropy coding.
Some prior art coding schemes are not variable but are instead “fixed length” schemes. To show the advantages and operation of variable length coding, as opposed to fixed length coding, assume, as a simplified example, that 10,000 characters in a file are a combination of only five unique characters, “a”, “b”, “c”, “d” and “e”. Further assume that the five unique characters occur with the following frequency: “a” occurs most frequently at 4100 times; “b” occurs 2600 times; “c” occurs 1700 times; “d” occurs 1200 times and “e” occurs least frequently at 400 times.
An exemplary fixed length coding scheme for the example above is shown in Code Mapping Table 1. An exemplary variable length coding scheme, in this example a Huffman coding scheme, could be implemented as also shown in Code Mapping Table 1.
As seen in Code Mapping Table 1, a typical fixed length coding scheme assigns each character a unique fixed bit code, in this example a fixed, three bit code. Consequently, the 10,000 characters of the present example would require 30,000 bits to encode using the fixed length coding scheme of the Code Mapping Table 1.
As also seen in Code Mapping Table 1, a typical variable length coding scheme assigns each of the five unique characters a unique variable length code with frequently occurring characters, such as “a”, assigned short codes and infrequently occurring characters, such as “e”, assigned longer codes. In particular, “a” has been encoded with a single bit since it is the most frequently occurring character (4,100 occurrences). As discussed in more detail below, the most frequently occurring character is often referred to herein as the Most Probable Value (MPV). The next most frequently occurring characters, “b” and “c”, are represented by two bits and three bits, respectively, and “d” and “e” are represented by four bits since they are the least frequently occurring characters.
Using variable length coding, the decoding process involves receiving only a first bit and then checking this first bit with each code entry in Code Mapping Table 1 to see if the first bit comprises an allowed code. If the first bit does comprise an allowed code, the mapping is completed and the decoding process is over. However, if the first bit does not comprise an allowed code, a second bit is received, and the two bits in combination are checked with each code entry in Code Mapping Table 1 to determine if the combination of bits comprises an allowed code. This process is repeated for each new bit, with the combination of each new bit and all previously received bits being checked, with each code entry in Code Mapping Table 1 to determine if the combination of bits comprises an allowed code, until an allowed code is detected.
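The bit-by-bit decoding procedure described above can be sketched as follows. This is a minimal illustration only: Code Mapping Table 1 is not reproduced here, so the bit patterns in `CODE_TABLE` are assumed values chosen to match the stated code lengths (one bit for “a”, two for “b”, three for “c”, four for “d” and “e”), and the function name is hypothetical. A dictionary lookup stands in for the comparison against each table entry.

```python
# Assumed prefix code: only the code lengths come from the text,
# the actual bit patterns are hypothetical.
CODE_TABLE = {"0": "a", "10": "b", "110": "c", "1110": "d", "1111": "e"}


def decode_bit_by_bit(bits):
    """Decode a string of '0'/'1' characters, receiving one bit at a time."""
    decoded = []
    pending = ""  # bits received so far that do not yet form an allowed code
    for bit in bits:
        pending += bit
        symbol = CODE_TABLE.get(pending)  # is the ensemble an allowed code?
        if symbol is not None:
            decoded.append(symbol)
            pending = ""  # start collecting the next code
    if pending:
        raise ValueError("input ended in the middle of a code")
    return "".join(decoded)


# Codes for a, b, c, d, e concatenated in that order.
print(decode_bit_by_bit("0" + "10" + "110" + "1110" + "1111"))
```

Note that every received bit triggers a table check, which is the per-bit cost the surrounding discussion is concerned with.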
In the example above, using prior art variable length coding, 20,800 bits (4100*1+2600*2+1700*3+1200*4+400*4) were used to encode the entire file. Recall that using fixed length coding 30,000 bits were used. Consequently, despite its cumbersome nature, prior art variable length coding provides a significant savings of approximately 30.7% (9,200 of 30,000 bits) over fixed length coding. This is a significant improvement to say the least. However, there are several significant drawbacks to prior art variable length encoding schemes.
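The bit counts in this example can be checked with a few lines of arithmetic. The sketch below uses only the character frequencies and code lengths stated in the text; the variable names are illustrative.

```python
# Character frequencies and code lengths taken from the example in the text.
counts = {"a": 4100, "b": 2600, "c": 1700, "d": 1200, "e": 400}
variable_lengths = {"a": 1, "b": 2, "c": 3, "d": 4, "e": 4}
FIXED_LENGTH = 3  # bits per character in the fixed length scheme

fixed_bits = FIXED_LENGTH * sum(counts.values())
variable_bits = sum(counts[ch] * variable_lengths[ch] for ch in counts)
savings = 1 - variable_bits / fixed_bits

print(fixed_bits, variable_bits, f"{savings:.1%}")
```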
One significant drawback of variable length coding schemes is the cumbersome and time consuming nature of the decoding process. As discussed with respect to the example provided above, decoding involves receiving new bits one at a time and then, with each new bit received, checking the ensemble of received bits with each code entry of the code mapping table until a unique allowed code is received. Consequently, even for our simplified example, including only five unique characters, the variable length decoding process is long and cumbersome.
To make matters worse, in real systems the number of possible unique characters is typically much larger than our assumed five from above. Consequently, the code mapping table used, such as Code Mapping Table 1 discussed above, can become prohibitively large as the number of unique characters increases. Further, the average length of the codewords is also larger. As explained above, since each bit has to be checked with all the code entries in the code mapping table until a unique allowed code is received, this can become a significant problem. Once again, this is clearly a time consuming, inefficient and cumbersome process.
Other types of variable length coding schemes use variable-bit codes that start with a unary-encoded number (i.e., a sequence of zeroes where the value encoded is the number of zeroes). Golomb, Golomb-Rice, and adaptive prefix coding are all examples of variable-bit codes that start with a unary-encoded number. As discussed in more detail below, for certain implementations that employ a temporary buffer to assist in the extraction of bit sequences from whole-byte data read in from the input device, the decoder can count the zeroes with less work than actually extracting each zero bit one at a time. For these variable-bit codes, the number of zeroes can be used to deduce exactly how many bits must be read to get the next codeword. This provides a somewhat more efficient process than the Huffman type variable length decoding process described above; however, the process is still cumbersome and inefficient.
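As an illustration of a code that starts with a unary-encoded number, the following is a minimal sketch of Golomb-Rice coding with parameter `k`. The conventions used are assumptions made for the example, not details taken from this document: the quotient is written as a run of zeroes terminated by a single “1”, followed by a `k`-bit remainder, and bits are held in a text string for readability. The decoder assumes well-formed input.

```python
def rice_encode(values, k):
    """Encode non-negative integers; each code is unary quotient + k-bit remainder."""
    pieces = []
    for v in values:
        quotient, remainder = v >> k, v & ((1 << k) - 1)
        tail = format(remainder, "0{}b".format(k)) if k else ""
        pieces.append("0" * quotient + "1" + tail)
    return "".join(pieces)


def rice_decode(bits, k):
    """Decode a well-formed bit string produced by rice_encode."""
    values = []
    pos = 0
    while pos < len(bits):
        quotient = 0
        while bits[pos] == "0":  # count the leading zeroes (unary part)
            quotient += 1
            pos += 1
        pos += 1  # skip the terminating '1'
        # The zero count fixes how many more bits belong to this codeword: k.
        remainder = int(bits[pos:pos + k], 2) if k else 0
        pos += k
        values.append((quotient << k) | remainder)
    return values


print(rice_decode(rice_encode([0, 1, 5, 9], 2), 2))
```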
The problems discussed above with respect to prior art coding schemes were particularly pronounced since, using prior art schemes, each and every character was typically subjected to the coding and decoding process. As a result, many prior art coding schemes employing variable length coding were not well suited to environments where there was little processing power available, e.g., thin clients. Since a current emphasis in the electronics industry is on portable devices and thin clients, the inability of prior art coding schemes to operate efficiently in this environment was a significant handicap.
What is needed is a method of decreasing the number of times the prior art coding schemes are invoked so that time and processing energy are conserved while, at the same time, the benefits of prior art coding schemes, such as variable length coding, are still obtained.
The present invention provides a method for significantly decreasing the number of times prior art coding schemes, such as variable length coding, are implemented in the course of encoding/decoding a given data block. One embodiment of the invention involves a method of coding frequently occurring values, such as the Most Probable Value (MPV), that includes cataloging all the occurrences, or locations, of the MPV in a data segment, or block, of the data stream. According to one embodiment of the invention, the locations containing the MPV, i.e., the MPV locations, are denoted with a first bitcode value, such as “1” and the locations containing values other than the MPV, i.e., the non-MPV locations, are denoted with a second bitcode value, such as “0”. According to one embodiment of the invention, the bitcodes for the MPV and non-MPV locations are then combined into a bitcode sequence and the bitcode sequence is included in an encoded block of data, herein also referred to as an E-data block, which is created according to the method of the invention.
In another embodiment of the invention, methods of identifying and denoting the MPV locations other than row bitcode sequences can be utilized with the invention including, but not limited to, run-length encoding or various other encoding methods known in the art.
According to one embodiment of the invention, the values in the non-MPV locations are then encoded by any one of several prior art coding schemes well known in the field such as Huffman coding. According to one embodiment of the invention, these coded non-MPV values are then included in the E-data block as well.
According to the invention, the decoding process is simply the reverse of the coding process. When decoding, the bitcode sequence is used to determine the locations of the MPV and the MPV value is inserted at the specified locations, without invoking any of the decoding routines of the prior art coding schemes. Then, according to one embodiment of the invention, the prior art coding scheme is used to decode the encoded non-MPV values and these non-MPV values are inserted in the non-MPV locations. Consequently, according to the invention, the cumbersome prior art decoding processes are only used on the non-MPV values.
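The encoding and decoding steps described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the claimed implementation: the function names are hypothetical, the bitcode sequence is held as a list of integers rather than packed bits, and the non-MPV values are left as a plain list where an actual embodiment would pass them through a prior art coder such as Huffman coding.

```python
from collections import Counter


def mpv_encode(block, mpv):
    """Split a block into a bitcode sequence (1 = MPV location) and non-MPV values."""
    bitcodes = [1 if value == mpv else 0 for value in block]
    non_mpv_values = [value for value in block if value != mpv]
    return bitcodes, non_mpv_values


def mpv_decode(bitcodes, non_mpv_values, mpv):
    """Rebuild the block: insert the MPV where marked, else take the next non-MPV value."""
    remaining = iter(non_mpv_values)
    return [mpv if bit else next(remaining) for bit in bitcodes]


block = list("aabacada")
mpv = Counter(block).most_common(1)[0][0]  # most frequently occurring value
bitcodes, non_mpv_values = mpv_encode(block, mpv)
print(mpv, bitcodes, non_mpv_values)
print(mpv_decode(bitcodes, non_mpv_values, mpv) == block)
```

In this sketch only the three values in `non_mpv_values` would ever reach the prior art decoder; the five MPV locations are filled in directly from the bitcode sequence.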
Recall that, as discussed above, prior art coding schemes, such as variable length coding, are cumbersome and time consuming processes and that the problems associated with prior art coding schemes, such as variable length coding, were particularly pronounced because, using prior art methods, each and every value was typically subjected to the coding and decoding process.
In contrast, using the method of the invention, the MPV, or any designated frequently occurring value, is left out of the prior art coding process. Consequently, according to the invention, fewer values need to be encoded and decoded using prior art coding schemes and therefore these cumbersome processes and routines are used less frequently. The variable length decoding process including receiving new bits, one at a time, then, with each new bit received, checking the ensemble of received bits with each code entry of the code mapping table until an allowed code is received, is not invoked for the most frequently occurring values. Therefore, the cumbersome variable length decoding process is frequently avoided and the processing of the data is significantly sped up and simplified.
Further, the MPV can be excluded from representation in the prior art coding scheme, resulting in shorter code words for the non-MPV values. Thus, the bit by bit decoding process is shortened even for non-MPV values.
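The shortening of the code words can be illustrated with the five-character example from the background discussion. The sketch below builds Huffman code lengths with and without the MPV “a”; the `huffman_lengths` helper is a hypothetical, textbook-style construction written for this illustration and is not taken from this document.

```python
import heapq
from itertools import count


def huffman_lengths(freqs):
    """Return {symbol: code length in bits} for a Huffman code over freqs."""
    if len(freqs) == 1:
        return {next(iter(freqs)): 1}
    tie = count()  # unique tie-breaker so the dicts are never compared
    heap = [(f, next(tie), {sym: 0}) for sym, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, depths1 = heapq.heappop(heap)
        f2, _, depths2 = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {sym: d + 1 for sym, d in {**depths1, **depths2}.items()}
        heapq.heappush(heap, (f1 + f2, next(tie), merged))
    return heap[0][2]


freqs = {"a": 4100, "b": 2600, "c": 1700, "d": 1200, "e": 400}
with_mpv = huffman_lengths(freqs)
without_mpv = huffman_lengths({s: f for s, f in freqs.items() if s != "a"})
print(sorted(with_mpv.items()))
print(sorted(without_mpv.items()))
```

With “a” removed from the code, each of the remaining four characters receives a code one bit shorter than before, so fewer bits have to be examined per non-MPV value.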
In one embodiment of the invention, the MPV locations are specified before sending the variable length codes for the non-MPVs. This is done to maximize decoding efficiency. In this one embodiment of the invention, the non-MPVs are only then encoded and the MPV locations are excluded from the encoding process.
In one embodiment of the invention, bits representing the locations of the MPVs, i.e., the bits in the bitcode sequence, are written or read in a single group of bits, during encoding or decoding of the data, instead of in a sequence of separate writes or reads. In contrast, in the prior art, a separate variable bit read or write was required for each occurrence of the MPV. This was typically very costly since reads from, and writes to, physical devices must be made in multiples of 8 bits. Consequently, to perform I/O of a non-integer number of bytes, additional processing is required.
For instance, consider the input case. First, an integral number M of bytes is typically read into a temporary buffer. The desired bits must then be extracted using bit shift (>>) and bitwise AND (&) operations. When all of the bits in the temporary buffer have been used, a new set of M bytes is read into the temporary buffer. For fixed-width I/O (including the one-bit-at-a-time reads used when reading in Huffman codes), one knows beforehand when the buffer will be exhausted. However, for variable-width reads such as those used in some implementations of adaptive prefix coding, Golomb coding and Golomb-Rice coding, the decoder must check after every read whether the buffer is empty or not.
The output case is similar: one uses bit shift (<<) and OR (|) operations to combine the bit sequences into an integral number of M bytes, then writes the M bytes to the output. Again, variable-width writes require checking to see if the buffer is full after every variable-bit sequence is added.
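The input case can be sketched with a small bit reader. This is a minimal sketch under stated assumptions: the `BitReader` class and its method names are hypothetical, the temporary buffer is refilled one byte at a time (M = 1), and bits are consumed most significant bit first.

```python
class BitReader:
    """Extract bit fields from whole-byte input using shift and AND operations."""

    def __init__(self, data):
        self.data = data   # bytes as read from the input device
        self.pos = 0       # index of the next byte to load
        self.buffer = 0    # temporary buffer holding the unread bits
        self.nbits = 0     # number of unread bits in the buffer

    def read(self, n):
        """Return the next n bits as an integer."""
        while self.nbits < n:  # buffer check needed on every read
            if self.pos >= len(self.data):
                raise EOFError("not enough input bits")
            self.buffer = (self.buffer << 8) | self.data[self.pos]
            self.pos += 1
            self.nbits += 8
        self.nbits -= n
        value = (self.buffer >> self.nbits) & ((1 << n) - 1)
        self.buffer &= (1 << self.nbits) - 1  # drop the bits just consumed
        return value


reader = BitReader(bytes([0b10110010, 0b11110000]))
first = reader.read(1)    # a single-bit read, as in bit-by-bit decoding
group = reader.read(3)    # several bits fetched in one call
```

Reading a run of bits with one `read(n)` call performs the buffer check and the shift and mask once, whereas `n` separate single-bit reads repeat that work `n` times.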
Since, according to one embodiment of the invention, the bits in the bitcode sequence are written or read in a single group of bits, during encoding or decoding of the data, the method of the invention avoids the costly process described above. Therefore, the method of the invention significantly reduces the overhead involved in invoking the encoder or decoder routine and in performing the bit I/O.
One embodiment of the invention works particularly well with the prior art technique of creating and sending a row bit mask to represent any rows of data, in a matrix of data, where every location in the row holds the MPV. In this case, according to the invention, the row bit mask for the entire matrix is also included in the resulting E-data block. In this embodiment of the invention, the bit code sequences and non-MPV codes are sent for only those rows with one or more non-MPV values. This further reduces the overhead involved in invoking the encoder or decoder routines of the prior art coding schemes.
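The combination with a row bit mask can be sketched as follows. The details are assumptions made for illustration: a mask bit of 1 is taken to mean that every location in the row holds the MPV, all rows are taken to have the same width, the function names are hypothetical, and the non-MPV values are kept as a plain list where an actual embodiment would apply a prior art coder.

```python
def encode_matrix(rows, mpv):
    """Return (row_mask, bitcode_rows, non_mpv_values) for a matrix of values."""
    row_mask = [1 if all(v == mpv for v in row) else 0 for row in rows]
    bitcode_rows = []      # one bitcode sequence per row that is not all-MPV
    non_mpv_values = []
    for row, all_mpv in zip(rows, row_mask):
        if all_mpv:
            continue       # nothing further is sent for an all-MPV row
        bitcode_rows.append([1 if v == mpv else 0 for v in row])
        non_mpv_values.extend(v for v in row if v != mpv)
    return row_mask, bitcode_rows, non_mpv_values


def decode_matrix(row_mask, bitcode_rows, non_mpv_values, mpv, width):
    """Rebuild the matrix from the three parts produced by encode_matrix."""
    sequences = iter(bitcode_rows)
    remaining = iter(non_mpv_values)
    rows = []
    for all_mpv in row_mask:
        if all_mpv:
            rows.append([mpv] * width)
        else:
            rows.append([mpv if bit else next(remaining) for bit in next(sequences)])
    return rows


matrix = [[0, 0, 0, 0], [0, 5, 0, 2], [0, 0, 0, 0], [3, 0, 0, 0]]
row_mask, bitcode_rows, non_mpv_values = encode_matrix(matrix, 0)
restored = decode_matrix(row_mask, bitcode_rows, non_mpv_values, 0, 4)
```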
The present invention can be used in conjunction with any prior art coding schemes, including but not limited to: Huffman coding, Golomb coding and variable length coding using adaptive prefix codes. When the present invention is employed with one of these prior art coding schemes, or with any other coding scheme, the present invention provides an encoding and decoding process that is extremely easy to perform and is used efficiently and effectively by any computing system. The present invention is also particularly advantageous when used by a thin client system that does not have high processing power or extensive memory.
It is to be understood that both the foregoing general description and following detailed description are intended only to exemplify and explain the invention as claimed.