1. Field of the Invention
The invention is generally in the field of data and information coding for transmission and storage. More specifically, it relates to lossless coding of video streams, in particular lossless entropy coding, to achieve very high compression and throughput during video transmission and storage.
2. Prior Art
The aim of lossless coding is to represent a data stream using the smallest possible number of bits, without loss of any information contained in the data stream. Similarly, the aim of lossless image coding is to represent an image signal using the fewest bits with no loss of information. This is a requirement for high-speed transmission and storage of data. In the case of a video stream the result is expressed as an average number of bits per second. Lossy compression differs from lossless compression in that its aim is to achieve an acceptable, predefined level of compression of the data stream while retaining the best possible information content. Such compression or coding sustains some information loss, which has to be evaluated for acceptance.
In all data streams it is possible to achieve a level of lossless compression. This is because there is redundancy present in the data streams. In the case of any image signal the redundancy can be significant. This redundancy is proportional to the correlation among the data samples. Typically between neighboring image samples, the difference will be small and the correlation very high. There is also a large correlation between adjacent frames of a video that can be used effectively to compress the digital stream.
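The link between neighbor correlation and compressibility can be illustrated with a minimal sketch. The scan line below is hypothetical (a slowly rising ramp, not data from the figures), and the helper names are illustrative assumptions. The sketch compares the first-order entropy of the raw samples with that of the differences between neighboring samples.

```python
import math
from collections import Counter
from itertools import accumulate

def first_order_entropy(samples):
    """Shannon entropy, in bits per sample, of the empirical distribution."""
    counts = Counter(samples)
    total = len(samples)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def neighbor_differences(samples):
    """Keep the first sample, then store each sample minus its predecessor."""
    return [samples[0]] + [b - a for a, b in zip(samples, samples[1:])]

# Hypothetical scan line: 32 gray levels, each repeated twice.
row = [100 + i // 2 for i in range(64)]
diffs = neighbor_differences(row)

raw_entropy = first_order_entropy(row)     # spread over 32 distinct values
diff_entropy = first_order_entropy(diffs)  # almost entirely 0s and 1s

# The mapping is lossless: a running sum restores the original samples.
restored = list(accumulate(diffs))
```

For this ramp the raw samples carry 5 bits per sample, while the differences need roughly 1.1 bits per sample, and the running sum recovers the row exactly.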
Fully lossless coding requires that the decoded image be visually and numerically identical to the image that was coded. This requirement severely limits the achievable compression ratio. In a video stream the aim is to provide a visually lossless video stream. Hence, removing redundant data, as well as irrelevant data that does not affect the visual quality, yields higher compression. Even though lossy compression provides a much higher compression ratio, there are applications, such as medical imaging, that require lossless transmission of information. In any application where the bandwidth is fixed and compression needs are high, it is better to achieve as large a lossless component of the final compression as possible, and to use a smaller range of lossy compression, in order to improve the reliability of the transmitted data and the image quality.
FIG. 1 shows a prior art lossless encoder 100 for image symbol coding. FIG. 2 shows a prior art decoder 200 that is used to regenerate the image from the encoded bit stream. The encoder 100 takes the input image 101 and transforms it in a transformation unit 102. The transformation unit 102 converts the input image 101 into a compressible stream of digital data. Typically the transformation unit 102 manipulates the image by reducing redundancy and altering the statistical distribution for maximum compression. It also enables packing of information into a few sub-regions or data samples. The aim of the transformation is to eliminate the inter-dependencies between the components of the input image or data. A typical encoder 100 will use one of differential predictive mapping (DPM), a unitary transform such as the discrete cosine transform (DCT), a sub-band decomposition such as the discrete wavelet transform (DWT), or a combination of these transform methods.
Within the encoder 100 a data-to-symbol mapping unit 103 maps the digital data generated by the transformation unit 102 into data symbols. Here the data is converted into a set of symbols that can be coded by the final coding stage 104. The symbol mapping unit 103 looks at the correlations between data blocks to improve the possible compression. Run length coding is one commonly used mapping scheme of this kind. FIG. 3 shows a typical run length coding of a data stream. It converts the input data 301 into a map of symbol pairs 302 of run and value. The value is the value of the data and the run is the number of times that value is repeated sequentially. Alternatively, coding schemes such as JPEG use code values to code only the non-zero data values, and the run is used to code the number of zeros preceding each value. Most of these schemes depend on having reasonably repetitive digital data to enable efficient coding and compression.
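The two mapping schemes described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function names and sample data are hypothetical, and the zero-run variant is a simplification that omits details of the actual JPEG format such as its end-of-block handling.

```python
from itertools import groupby

def run_length_encode(data):
    """Map a sequence to (run, value) pairs, one pair per run of equal values."""
    return [(len(list(group)), value) for value, group in groupby(data)]

def run_length_decode(pairs):
    """Inverse mapping: expand each (run, value) pair back into repeated values."""
    out = []
    for run, value in pairs:
        out.extend([value] * run)
    return out

def zero_run_encode(data):
    """Simplified zero-run variant: code only non-zero values, each paired
    with the number of zeros preceding it. Trailing zeros are returned
    separately because no non-zero value follows them."""
    pairs, zeros = [], 0
    for value in data:
        if value == 0:
            zeros += 1
        else:
            pairs.append((zeros, value))
            zeros = 0
    return pairs, zeros

# Hypothetical input data.
data = [5, 5, 5, 0, 0, 7, 7, 7, 7, 2]
pairs = run_length_encode(data)   # [(3, 5), (2, 0), (4, 7), (1, 2)]

coefficients = [0, 0, 3, 0, 0, 0, -1, 2, 0, 0]
zero_pairs, trailing_zeros = zero_run_encode(coefficients)
# zero_pairs == [(2, 3), (3, -1), (0, 2)], trailing_zeros == 2
```

Both mappings pay off only when runs are long; on data with few repeats the pair list can be longer than the input.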
A lossless symbol encoding unit 104 then encodes these symbols to produce a bit stream 105 that is the encoded output. The encoding schemes for converting the data symbols to a symbol code stream commonly in use today include the well-known Huffman lossless symbol coding, the arithmetic lossless symbol coding, and the dictionary based symbol coding methods. These lossless coding schemes are typically referred to as entropy coding schemes.
The Huffman and arithmetic encoding schemes are based on the statistical probability distribution of the generated data symbol set. Shorter codes are assigned to data symbols with a higher probability of occurrence; that is, they are variable length coding methods. The alternative, the dictionary based coding scheme, dynamically constructs encoding and decoding tables of variable length symbols and uses them to generate codes by lookup. In all these cases the code itself is variable and depends on the nature and repetitiveness of the data symbol set being coded.
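As an illustration of dictionary based coding, the sketch below uses LZW, one well-known member of that family. The text does not name a specific dictionary method, so the choice of LZW, the function names, and the sample message are assumptions made for illustration. Encoder and decoder each build the same table dynamically, so no table needs to be transmitted.

```python
def lzw_encode(data: bytes) -> list[int]:
    """Emit table indices while growing a table of ever longer byte strings."""
    table = {bytes([i]): i for i in range(256)}  # start with all single bytes
    current = b""
    codes = []
    for byte in data:
        candidate = current + bytes([byte])
        if candidate in table:
            current = candidate                  # keep extending the match
        else:
            codes.append(table[current])         # emit the longest known match
            table[candidate] = len(table)        # learn the new string
            current = bytes([byte])
    if current:
        codes.append(table[current])
    return codes

def lzw_decode(codes: list[int]) -> bytes:
    """Rebuild the same table from the code stream and recover the bytes."""
    if not codes:
        return b""
    table = {i: bytes([i]) for i in range(256)}
    previous = table[codes[0]]
    out = [previous]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == len(table):
            # Code refers to the entry that is about to be created.
            entry = previous + previous[:1]
        else:
            raise ValueError("invalid LZW code")
        out.append(entry)
        table[len(table)] = previous + entry[:1]
        previous = entry
    return b"".join(out)

# Hypothetical, highly repetitive message.
message = b"ABABABABABABABAB"
codes = lzw_encode(message)
decoded = lzw_decode(codes)
```

The more repetitive the input, the longer the learned table entries become and the fewer codes are emitted, which matches the dependence on repetitiveness noted above.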
FIGS. 4a through 4d show a typical encoding using the Huffman coding scheme. A typical segment of the video data 400 is shown in FIG. 4a. The transformed data symbol set 401, which is 4 bits wide (N=4), is also shown in FIG. 4a. Since there are four bits there can be as many as 16 individual symbols, though in the specific example shown not all the symbols are generated or used. The respective weights of the symbols in the data symbol set are compiled and combined as shown at 402, also in FIG. 4a. The total weight equals the total number of occurrences, and the weight of each symbol reflects its ratio of occurrence in the distribution. The construction 403 of the code tree, together with the resulting code tree 404, is shown in FIG. 4b. The symbols 402, generated from the data symbol set 401, and the symbol codes (codes) 405 are shown in FIG. 4c. The resultant coded data symbol set 406, with the total bits to be sent, is shown in FIG. 4d. The sent information 407, in this case, requires the sending of the symbols 402, the symbol codes 405 and the coded symbol set 406 to enable extraction of the data set at the decoder. Since only 16 bytes are coded in the example, the repetitive nature is not substantial, and in this small example the number of bits sent, 93 bits, is larger than the number of bits in the symbol set, which is 64 bits.
The process of coding comprises the steps below:
1. The data set 400 is transformed into a data symbol set 401;
2. From the transformed data symbol set the individual symbols 402 and their weights are estimated as shown in FIG. 4a, where the total estimated weight of all symbols equals the total number of occurrences;
3. The symbols 402 are arranged in order of weight, as shown in FIG. 4a, before the least frequently occurring symbols are combined;
4. A weighted structure 403 is built by connecting the least weights onward as shown in FIG. 4b, and repeating with other weights to achieve the final structure of the code tree 404;
5. From this code tree 404 the symbol codes 405 are extracted as indicated in FIG. 4c;
6. These symbol codes 405 are used to code the data symbol set 401 as shown in FIG. 4d, thereby producing the transmittable coded data 406;
7. The symbol codes 405, the symbols 402 and the coded data set 406 are transmitted 407 to the receiver for decoding, where the inverse process is used to recover the data symbol set;
8. Huffman coding requires an additional termination code to separate the “Symbols” and the “Symbol Code”, because their length varies with the number of codes generated; and,
9. The maximum number of codes generated is also variable depending on the inter-dependencies.
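The steps above can be sketched in code. This is a minimal illustration, not the encoder of FIGS. 4a through 4d: the 16-entry symbol set is hypothetical, the function names are assumptions, and the sketch omits the transmission of the symbols, the code table and the termination code described in steps seven and eight.

```python
import heapq
from collections import Counter

def build_huffman_codes(symbols):
    """Steps 2 to 5: estimate weights, build the code tree, extract the codes.
    Symbols are assumed to be integers (a tuple marks an internal tree node)."""
    weights = Counter(symbols)
    if len(weights) == 1:                    # degenerate single-symbol case
        return {next(iter(weights)): "0"}
    # Heap entries are (weight, tie_breaker, tree); the unique tie breaker
    # guarantees that two trees are never compared directly.
    heap = [(w, i, sym) for i, (sym, w) in enumerate(sorted(weights.items()))]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:                     # combine the two least weights
        w1, _, t1 = heapq.heappop(heap)
        w2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, next_id, (t1, t2)))
        next_id += 1
    codes = {}
    def walk(tree, prefix):
        if isinstance(tree, tuple):          # internal node: descend both ways
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
        else:                                # leaf: record the symbol's code
            codes[tree] = prefix
    walk(heap[0][2], "")
    return codes

def huffman_encode(symbols, codes):
    """Step 6: replace each symbol by its code."""
    return "".join(codes[s] for s in symbols)

def huffman_decode(bits, codes):
    """Inverse process at the receiver; works because the codes are prefix-free."""
    by_code = {code: sym for sym, code in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in by_code:
            out.append(by_code[current])
            current = ""
    return out

# Hypothetical set of sixteen 4-bit symbols (64 bits uncoded).
symbol_set = [3] * 8 + [7] * 4 + [1] * 2 + [12, 0]
codes = build_huffman_codes(symbol_set)
bits = huffman_encode(symbol_set, codes)
recovered = huffman_decode(bits, codes)
```

For this hypothetical set the coded payload is 30 bits against 64 bits uncoded, but that figure excludes the symbols and symbol codes that must also be sent, which is the overhead that makes the small example in the text larger after coding.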
As described in step seven above, the encoded output is transmitted with the symbol codes, symbols and coded data to the decoder 200. Once the decoder 200 receives the encoded bit stream 105 with the symbol codes, symbols and coded data, it passes the bit stream through a lossless symbol decoding unit 202 to extract and decode the symbols. The extracted symbols are then passed to a symbol-to-data mapping unit 203, which maps the symbols to data and regenerates the transformed data stream. This data stream is passed through an inverse transformation unit 204 to regenerate the image 205, which is the same as the input image 101.
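The way the decoder stages mirror the encoder stages in reverse order can be shown with a small round-trip sketch. All stage implementations here are stand-ins chosen for brevity and are assumptions, not the units of FIGS. 1 and 2: neighbor differencing stands in for the transformation unit, run length pairs for the symbol mapping unit, and Python's zlib module (DEFLATE) for the lossless symbol coding unit. The sample values are hypothetical.

```python
import struct
import zlib
from itertools import accumulate, groupby

def transform(samples):                 # stand-in for transformation unit 102
    return [samples[0]] + [b - a for a, b in zip(samples, samples[1:])]

def inverse_transform(diffs):           # stand-in for inverse transformation unit 204
    return list(accumulate(diffs))

def data_to_symbols(data):              # stand-in for mapping unit 103
    return [(len(list(group)), value) for value, group in groupby(data)]

def symbols_to_data(pairs):             # stand-in for mapping unit 203
    return [value for run, value in pairs for _ in range(run)]

def encode_symbols(pairs):              # stand-in for symbol encoding unit 104
    # Each pair is packed as an unsigned 32-bit run and a signed 16-bit value.
    raw = b"".join(struct.pack("<Ih", run, value) for run, value in pairs)
    return zlib.compress(raw)

def decode_symbols(blob):               # stand-in for symbol decoding unit 202
    raw = zlib.decompress(blob)
    return [struct.unpack_from("<Ih", raw, off) for off in range(0, len(raw), 6)]

def encode(samples):
    return encode_symbols(data_to_symbols(transform(samples)))

def decode(blob):
    return inverse_transform(symbols_to_data(decode_symbols(blob)))

# Hypothetical input: a flat region, a ramp, then another flat region.
image = [10] * 20 + list(range(10, 40)) + [40] * 14
bit_stream = encode(image)
regenerated = decode(bit_stream)
```

Because every stage has an exact inverse, the regenerated samples equal the input samples, which is the defining property of the lossless path.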
Due to the inability of transformation schemes such as DCT and DWT to completely eliminate interdependencies from the transformed image or data stream, the full potential of compression is never achieved in entropy coding. It would therefore be valuable to develop a new method that reduces the interdependency, improves the run length and provides for a maximum number of fixed codes. It would be further advantageous if such a scheme improved the compression performance, i.e., provided better lossless compression. It would be further advantageous if such a scheme were compatible with the current prior art entropy coding schemes.