The following pertains generally to digital data compression as it may be employed in many applications. Since it would be odious and redundant to present examples of both loss-less and lossy type compression across the entire range of potential applications, the examples presented herein relate primarily to lossy type compression applied to video type data. Video type data is especially challenging and of particular concern to industry presently, and lossy type compression techniques are well suited for compressing it and typically permit more notable efficiency increases. The choice of these as examples, however, should not be construed as implying limitations in the technical principles being discussed or in the scope of the present invention.
FIG. 1 (background art) is a block diagram showing the major elements of a typical end-to-end video system 10. The video system 10 includes a video encoder 12 and a video decoder 14, and it will also often have an optional intermediate channel 16 (e.g., for data storage or transmission). The video encoder 12 accepts a video sequence of raw video data 18 that includes a time indexed collection of raw frames 20 (also called images) and produces a compressed data bit stream 22. Conversely, the video decoder 14 accepts the data bit stream 22 and converts it back into a video sequence (now processed video data 24) that includes a collection of processed frames 26. The intermediate channel 16, if present, can efficiently store the data bit stream 22 or transmit it to another location. The term “channel” is used here as it is widely used in electrical engineering, to denote a system that displaces data with respect to time, space, or both. While important in many applications, the intermediate channel 16 is not especially relevant here and is therefore not discussed further.
The raw video data 18 and the processed video data 24, and similarly the raw frames 20 and the processed frames 26, are rarely the same. Since the raw video data 18 is typically associated with high bandwidth, a lossy type of compression is desirably employed to better facilitate handling of the data bit stream 22 and ultimately also of the processed video data 24. While lossy compression, as its name implies, loses some of the original information content of the raw video data 18, this is often an acceptable compromise because of one or more of the benefits that it can provide over loss-less type compression. For example, lossy compression usually results in the data bit stream 22 being much more compact, and it frequently also permits performing compression and/or decompression operations faster and with fewer processing resources.
In FIG. 1 the video encoder 12 of the video system 10 has four major stages: a prediction stage 28, a transformation stage 30, a quantization stage 32, and an entropy coding stage 34. The first two of these stages exploit inherent redundancy in the raw video data 18 to compactly represent it in the data bit stream 22. This works well in many applications because the raw video data 18 is frequently characterized by a high degree of correlation between the successive raw frames 20 that it contains, as well as between the neighboring data in each particular raw frame 20.
If each of the raw frames 20 in a sequence is viewed as being partitioned into a grid of rectangular blocks of data (e.g., of size ranging from 4×4 to 16×16 pixels), a very simple block-motion model can then be applied wherein the blocks in a current frame can be viewed as arising from data in the previous raw frame that has been shifted in location. This usually offers a compact and reasonably accurate description (also called a predictor) of a video process.
The prediction stage 28 of the video encoder 12 therefore employs a shift component and a difference component. The shift component (also called the motion vector) represents a change in the location of a block from where it was located in the prior frame (if any) and the difference component represents a change in the information in the block now versus the information in the block as it existed in the previous frame (i.e., at its previous location there).
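By way of illustration only, the shift component and the difference component can be sketched in Python as follows. This is a minimal sketch and not the method of any particular standard: the function names predict_block and residual_block, the 8×8 frame contents, and the convention that the motion vector (dy, dx) points from the block's current location back to its location in the previous frame are all assumptions made for this example.

```python
# Hypothetical sketch of a block-motion predictor; names and data are illustrative only.

def predict_block(prev_frame, top, left, motion_vector, size=4):
    """Return the block of prev_frame that the motion vector points to."""
    dy, dx = motion_vector
    return [
        [prev_frame[top + dy + r][left + dx + c] for c in range(size)]
        for r in range(size)
    ]

def residual_block(cur_frame, prev_frame, top, left, motion_vector, size=4):
    """Difference component: current block minus its shifted predictor."""
    pred = predict_block(prev_frame, top, left, motion_vector, size)
    return [
        [cur_frame[top + r][left + c] - pred[r][c] for c in range(size)]
        for r in range(size)
    ]

# An 8x8 previous frame with distinct values, and a current frame whose
# content is the previous frame shifted one pixel to the right.
prev_frame = [[10 * r + c for c in range(8)] for r in range(8)]
cur_frame = [[prev_frame[r][max(c - 1, 0)] for c in range(8)] for r in range(8)]

# The block at row 2, column 2 of the current frame came from row 2, column 1
# of the previous frame, so the motion vector (0, -1) predicts it exactly.
residual = residual_block(cur_frame, prev_frame, 2, 2, (0, -1))
for row in residual:
    print(row)
```

When the motion vector matches the actual shift, every entry of the difference component is zero; with a zero motion vector on the same data, each entry would instead be non-zero, leaving more information for the later stages to encode.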
The transformation stage 30 takes the output of the prediction stage 28 and transforms it into the frequency domain to achieve more compaction. When the block-motion model provides a good description of a given set of raw video data 18, the corresponding residue information has small energy and corresponds to a low-frequency characteristic in the domain produced by the transformation stage 30. As will be seen presently, this particularly affects how the following stages contribute to the efficiency of the video system 10.
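The effect of the transformation can be sketched as follows. The particular transform is not specified above; this example assumes the 4×4 integer core transform matrix commonly associated with H.264, omits the scaling that would normally accompany it, and uses hypothetical residue blocks and function names.

```python
# Assumed 4x4 forward core transform matrix (H.264-style integer transform).
# Scaling factors are omitted; this is an illustration, not a codec.
C = [
    [1,  1,  1,  1],
    [2,  1, -1, -2],
    [1, -1, -1,  1],
    [1, -2,  2, -1],
]

def matmul(a, b):
    """Multiply two 4x4 matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def transpose(m):
    return [list(row) for row in zip(*m)]

def forward_transform(block):
    """Compute C * block * C^T."""
    return matmul(matmul(C, block), transpose(C))

# A smooth (constant) residue: all energy lands in the top-left coefficient.
flat = [[3] * 4 for _ in range(4)]
flat_coeffs = forward_transform(flat)

# A rapidly alternating residue: energy lands toward the bottom-right instead.
checker = [[1 if (r + c) % 2 == 0 else -1 for c in range(4)] for r in range(4)]
checker_coeffs = forward_transform(checker)

for row in flat_coeffs:
    print(row)
for row in checker_coeffs:
    print(row)
```

The constant block yields a single non-zero coefficient at the lowest-frequency position, whereas the alternating block yields its largest coefficient at the highest-frequency position.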
The quantization stage 32 takes the output of the transformation stage 30 and applies a lossy compression to it, wherein individual transform coefficients are scaled down and truncated to the nearest integer. This lossy compression is usually a major contributor to the overall efficiency of the video system 10.
The entropy coding stage 34 takes the output of the quantization stage 32 and applies a loss-less compression to it, wherein quantization symbols are mapped into bits. Usually this entropy coding is implemented with a variable-length scheme such as Huffman coding.
The video decoder 14, in a straightforward manner, employs the stages discussed above in reverse, reversing the actions performed by the video encoder 12 so that the compressed data bit stream 22 is converted back into a usable video sequence (i.e., the processed video data 24).
At their core, essentially all commercial grade video compression systems today employ these stages and techniques. For example, H.261, H.263, and H.264 (collectively H.26x) and MPEG-1, MPEG-2, and MPEG-4 (collectively MPEG-x) are current well-known standards that employ these stages and that are widely used in video compression today. H.264 type video compression is used in the examples herein, although the following is applicable to compressing any bandwidth limited data (e.g., JPEG and other still image standards, or MP3 and other audio standards, to list just two well known examples of two common subject matter types).
The actual compression in a video system 10 takes place in the lossy compression quantization stage 32 and in the loss-less compression entropy coding stage 34, and these are now considered in more detail.
FIGS. 2a-e are a series of depictions of a data block 40 undergoing processing by the quantization stage 32. H.264 standard type video compression is used in the example here, where processing is performed on 4×4 blocks.
FIG. 2a shows a hypothetical input to the quantization stage 32 of a 4×4 data block (raw block 42) that includes data called transform coefficients (since this “input” is output from the transformation stage 30).
FIG. 2b shows a low-frequency block 44 (termed such here for reasons discussed presently) that is an intermediate result in the processing of the raw block 42 of FIG. 2a. The individual coefficients now have been scaled down and truncated to the nearest integer. For example, say the value of a transform coefficient is 55 and the quantization scale being applied is 18. This transform coefficient is then quantized to a quantization level of 3 (since 55/18≈3.06, which truncates to 3). Similarly, a transform coefficient of 5 is quantized to 0.
Digressing briefly, it can be appreciated that this is a lossy operation, since in the video decoder 14 a quantization level of 3 will be multiplied by the same quantization scale (18) giving a reconstruction value of 3*18=54 (not 55), and a quantization level of 0 will give a reconstruction value of 0*18=0 (not 5).
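The arithmetic above can be checked with a short sketch. The function names quantize and dequantize are hypothetical, and truncation toward zero for negative coefficients is an assumption, since only positive values are discussed above.

```python
def quantize(coefficient, scale):
    """Scale down and truncate (toward zero, by assumption) to an integer level."""
    return int(coefficient / scale)

def dequantize(level, scale):
    """Reconstruct an approximate coefficient from a quantization level."""
    return level * scale

scale = 18
for coefficient in (55, 5):
    level = quantize(coefficient, scale)
    reconstructed = dequantize(level, scale)
    error = coefficient - reconstructed
    print(coefficient, level, reconstructed, error)
```

The difference between each original coefficient and its reconstruction value is the information that the lossy operation discards.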
FIG. 2c illustrates the conventional linear zigzag forward scan order 46 used next in the quantization stage 32, and FIG. 2d shows the one-dimensional low-frequency array 48 (also termed such here for reasons discussed presently) that this produces.
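A minimal sketch of such a scan follows. The scan table is the zigzag order commonly used for 4×4 blocks and is assumed here to correspond to FIG. 2c; the names ZIGZAG_4X4 and zigzag_scan are hypothetical, and the example block is likewise hypothetical, chosen only to be consistent with the values this description quotes for the low-frequency example rather than copied from FIG. 2b.

```python
# Assumed zigzag order for a 4x4 block, as (row, column) pairs.
ZIGZAG_4X4 = [
    (0, 0), (0, 1), (1, 0), (2, 0),
    (1, 1), (0, 2), (0, 3), (1, 2),
    (2, 1), (3, 0), (3, 1), (2, 2),
    (1, 3), (2, 3), (3, 2), (3, 3),
]

def zigzag_scan(block):
    """Flatten a 4x4 block into a 16-entry array in zigzag order."""
    return [block[r][c] for r, c in ZIGZAG_4X4]

# Hypothetical quantized block with its non-zero levels near the top-left.
low_frequency_block = [
    [-7, 3, 0, 0],
    [ 0, 2, 0, 0],
    [ 0, 1, 0, 0],
    [ 0, 0, 0, 0],
]

low_frequency_array = zigzag_scan(low_frequency_block)
print(low_frequency_array)
```

Because the scan visits the top-left positions first, the non-zero levels of such a block appear early in the one-dimensional array and the tail consists entirely of zeros.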
It can be appreciated from FIGS. 2b-d that a large percentage of the transform coefficients in a data block 40 become quantization levels equal to zero for typical H.264 type video data. Furthermore, when the conventional forward scan order 46 is used, there is a very high likelihood that the resulting low-frequency array 48 will be characterized by initial non-zero values, followed by strings of zeros interspersed with occasional non-zero values (i.e., by values occurring predominantly above the diagonal 49 in FIG. 2b). This observation holds true for generic video data and forms the basis of the entropy coding mechanism used in most video compression systems today (e.g., H.26x and MPEG-x), as well as in many compression schemes used on other types of data.
It is common to think of such zigzag forward scanned coefficient data as a succession of (run, level, sign, last) quadruplets where the run-part corresponds to the number of zeros before a non-zero value, the level-part corresponds to the magnitude of the non-zero value, the sign-part is a binary indicator of the sign of the non-zero value, and the last-part is a binary value that indicates whether the current (run, level, sign) triplet is the last one in the block.
FIG. 2e shows a quadruplet sequence of entropy coding symbols 50 that describes the low-frequency block 44 and the low-frequency array 48. The sixteen original transform coefficients are now efficiently represented by just four entropy coding symbols 50, which are the output of the quantization stage 32 and which become the input to the entropy coding stage 34.
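The conversion from a scanned array to quadruplets can be sketched as follows. The function name to_quadruplets is hypothetical, the encoding of the sign-part as 1 for negative and 0 for positive is an assumption, and the input array is reconstructed from the values this description quotes for the low-frequency example rather than read from FIG. 2d.

```python
def to_quadruplets(levels):
    """Convert a scanned array into (run, level, sign, last) quadruplets."""
    quads = []
    run = 0
    for value in levels:
        if value == 0:
            run += 1          # count zeros preceding the next non-zero value
            continue
        sign = 1 if value < 0 else 0   # assumed convention: 1 means negative
        quads.append([run, abs(value), sign, 0])
        run = 0
    if quads:
        quads[-1][3] = 1      # mark the final quadruplet in the block
    return [tuple(q) for q in quads]

low_frequency_array = [-7, 3, 0, 0, 2, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0]
symbols = to_quadruplets(low_frequency_array)
for symbol in symbols:
    print(symbol)
```

Trailing zeros after the last non-zero value need no symbol of their own, which is why sixteen entries reduce to four quadruplets here.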
Summarizing, FIGS. 2a-e depict what occurs in the quantization stage 32, from inputting raw blocks 42 of transform coefficients to outputting entropy coding symbols 50.
In the entropy coding stage 34 of a video encoder 12 these entropy coding symbols 50 are converted into the data bit stream 22 using variable length coding (VLC). For the sake of example we continue with the H.264 video compression standard, and particularly with the variant of the generic (run, level, sign, last) scheme it uses for 4×4 blocks.
For H.264 the data bit stream 22 will have VLC encoded values corresponding to: a syntax element “coeff_token”; the values of all non-zero quantization levels; a syntax element “total_zeros”; and syntax elements “run_before.”
The syntax element “coeff_token” describes the number of non-zero coefficients in the 4×4 block (e.g., in the example in FIGS. 2a-e the quantity of non-zero coefficients is 4). The non-zero quantization level values along with the signs are presented in the data bit stream 22 in reverse order, with the last non-zero level indicated first and the first non-zero level indicated last (e.g., 1, 2, 3, and −7 in our example). The syntax element “total_zeros” describes the total number of zeros before the last non-zero level (e.g., 5 in our example). And the syntax elements “run_before” indicate the length of zero-runs before each non-zero level value. Just like the quantization levels, these are indicated in the reverse order with the zero-run before the last non-zero level indicated first followed by the zero-run before the second-last non-zero level and so on (e.g., 3, 2, 0 for our example; 3 for the run between the values 2 and 1, 2 for the run between values 3 and 2, and 0 for the run between values −7 and 3).
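As a hedged illustration, the values of these syntax elements can be derived from a scanned array as sketched below. This computes only the values that would be encoded, not the VLC codewords themselves; the function name syntax_element_values is hypothetical, and the array is reconstructed from the example values given above.

```python
def syntax_element_values(levels):
    """Return (non-zero count, reversed levels, total_zeros, run_before list)."""
    positions = [i for i, v in enumerate(levels) if v != 0]
    count = len(positions)
    # Non-zero levels, last one first.
    reversed_levels = [levels[i] for i in reversed(positions)]
    # Zeros located before the last non-zero level.
    total_zeros = positions[-1] + 1 - count if positions else 0
    # Zero-runs between consecutive non-zero levels, last gap first.
    run_before = [
        positions[k] - positions[k - 1] - 1
        for k in range(count - 1, 0, -1)
    ]
    return count, reversed_levels, total_zeros, run_before

low_frequency_array = [-7, 3, 0, 0, 2, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0]
count, reversed_levels, total_zeros, run_before = syntax_element_values(
    low_frequency_array
)
print(count, reversed_levels, total_zeros, run_before)
```

For the example array this reproduces the values stated above: a count of 4, levels 1, 2, 3, and −7, a “total_zeros” value of 5, and “run_before” values of 3, 2, and 0.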
As mentioned above, in H.264 the data at the 4×4 block level is characterized by a large number of zeros occasionally interspersed with non-zero values. Furthermore, because of the low-frequency characteristic associated with typical video data, most of these non-zero values will occur at early positions in the zigzag forward scan order. Thus, the “total_zeros” syntax element, which counts the total number of zeros before the last non-zero level, is more likely to take smaller values than larger values. Using the principles of Huffman coding, this bias in favor of smaller values is exploited by assigning shorter codewords to smaller values and longer codewords to larger values. FIG. 3 shows one such Huffman table for the syntax element “total_zeros” used in the H.264 standard.
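The principle can be sketched with a generic Huffman construction. This is not the table of FIG. 3, which is fixed by the standard rather than computed; the frequencies below are hypothetical values skewed toward small “total_zeros” values, and the function name huffman_code_lengths is likewise an assumption.

```python
import heapq
from itertools import count

def huffman_code_lengths(weights):
    """Return {symbol: codeword length} for a Huffman code built from weights."""
    tie = count()  # tie-breaker so heap entries never compare the dicts
    heap = [(w, next(tie), {sym: 0}) for sym, w in weights.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two least frequent subtrees; every symbol in them
        # moves one level deeper, i.e., gains one bit.
        w1, _, a = heapq.heappop(heap)
        w2, _, b = heapq.heappop(heap)
        merged = {sym: depth + 1 for sym, depth in {**a, **b}.items()}
        heapq.heappush(heap, (w1 + w2, next(tie), merged))
    return heap[0][2]

# Hypothetical relative frequencies for values 0 through 5.
weights = {0: 40, 1: 25, 2: 15, 3: 10, 4: 6, 5: 4}
lengths = huffman_code_lengths(weights)
for value in sorted(lengths):
    print(value, lengths[value])
```

With these assumed frequencies the most common value receives a one-bit codeword and the rarest values receive five-bit codewords, so the average codeword length is small as long as the data actually follows the assumed bias.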
Returning briefly again to FIG. 1, from the perspective of the video decoder 14, variable length decoding (VLD) is used to convert the data bit stream 22 back into (run, level, sign, last) quadruplets, which are then further converted to the linear zigzag scanned values, which are then converted to the two-dimensional blocks, and which are then de-quantized.
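The decoder-side conversions following VLD can be sketched as below. The zigzag table is the commonly used 4×4 order and is assumed to match the forward scan order 46; the function names, the sign convention (1 for negative), the example quadruplets, and the use of a single quantization scale of 18 for every coefficient are assumptions made for illustration.

```python
# Assumed zigzag order for a 4x4 block, as (row, column) pairs.
ZIGZAG_4X4 = [
    (0, 0), (0, 1), (1, 0), (2, 0),
    (1, 1), (0, 2), (0, 3), (1, 2),
    (2, 1), (3, 0), (3, 1), (2, 2),
    (1, 3), (2, 3), (3, 2), (3, 3),
]

def quadruplets_to_array(quads, size=16):
    """Expand (run, level, sign, last) quadruplets into a scanned array."""
    array = [0] * size
    pos = 0
    for run, level, sign, last in quads:
        pos += run                          # skip the zero-run
        array[pos] = -level if sign else level
        pos += 1
        if last:
            break
    return array

def inverse_zigzag(array):
    """Place scanned values back into their two-dimensional positions."""
    block = [[0] * 4 for _ in range(4)]
    for value, (r, c) in zip(array, ZIGZAG_4X4):
        block[r][c] = value
    return block

def dequantize_block(block, scale):
    """Multiply every quantization level by the quantization scale."""
    return [[value * scale for value in row] for row in block]

quads = [(0, 7, 1, 0), (0, 3, 0, 0), (2, 2, 0, 0), (3, 1, 0, 1)]
array = quadruplets_to_array(quads)
block = inverse_zigzag(array)
reconstructed = dequantize_block(block, 18)
for row in reconstructed:
    print(row)
```

The reconstructed values are multiples of the quantization scale, so they only approximate the transform coefficients that entered the quantization stage 32.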
In summary, while different compression standards use different variants of this scheme to efficiently represent the sequence of quantized levels as bits, the above discussion covers the core principles.
Unfortunately, the data characteristics discussed above only hold true when the block-motion model offers a good description of the underlying video process, resulting in a residue with small energy and a low-frequency characteristic. This is what leads to non-zero coefficient levels occurring early on in the zigzag scan, and then being followed by zeros. However, there are many occasions where a natural video phenomenon will contain motion that is far more complex than can be captured by the block-motion model. For instance, common scene motions such as rotation and zoom are not described well by the block-motion model and the residue resulting from the block-motion predictor for such subject matter tends to have high energy and to be associated with a high-frequency characteristic. As a result, the syntax elements for the data bit stream that are tuned for the more commonly occurring low-frequency residue case instead offer a poor description of the residue information, resulting in higher bit-rates and poor compression.
FIGS. 4a-e are also a series of depictions of a data block 40 undergoing processing by the quantization stage 32, only here the data block 40 includes high-frequency data (i.e., values occurring predominantly below the diagonal 49 in FIG. 4b). For consistency, H.264 standard type video compression is again used in this example. FIG. 4a shows a hypothetical raw block 52 of high-frequency transform coefficients that is input to the quantization stage 32 (i.e., output from the transformation stage 30). FIG. 4b shows a high-frequency block 54 that is an intermediate result in the processing of the raw block 52 of FIG. 4a (with individual coefficients scaled down and truncated to the nearest integer). FIG. 4c illustrates application of the conventional linear zigzag forward scan order 46 to produce a one-dimensional high-frequency array 56 shown in FIG. 4d. And FIG. 4e shows a quadruplet sequence of entropy coding symbols 58 that describes the high-frequency block 54 and the high-frequency array 56.
For H.264 type compression the data bit stream 22 produced by the entropy coding stage 34 here will provide VLC encoded values wherein “coeff_token” is 4, the non-zero quantization level values along with the signs are −2, 3, 1, and 1 (since they appear in reverse order to that in the high-frequency array 56), the “total_zeros” value is 11, and the “run_before” values are 0, 0, and 0.
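To make these values concrete, the following sketch rebuilds the one-dimensional array that they imply. The function name rebuild_array is hypothetical, and the resulting array is an inference from the values stated above rather than a copy of FIG. 4d.

```python
def rebuild_array(reversed_levels, total_zeros, run_before, size=16):
    """Rebuild a scanned array from levels (last first), total_zeros, and runs."""
    array = [0] * size
    # The last non-zero level sits after all counted zeros and all
    # earlier non-zero levels.
    pos = total_zeros + len(reversed_levels) - 1
    for i, level in enumerate(reversed_levels):
        array[pos] = level
        if i < len(run_before):
            pos -= run_before[i] + 1   # step back over the zero-run
    return array

high_frequency_array = rebuild_array([-2, 3, 1, 1], 11, [0, 0, 0])
print(high_frequency_array)
```

The non-zero levels are seen to cluster at the very end of the scan, preceded by eleven zeros, which is the reverse of the pattern that the syntax elements are tuned for.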
A key point to appreciate here is that the “total_zeros” value is markedly different for low-frequency versus high-frequency data (5 for the case in FIGS. 2a-e versus 11 for the case here in FIGS. 4a-e), and when VLC coding is performed on these based on the Huffman table of FIG. 3 the low-frequency “total_zeros” value produces a 5-bit codeword whereas the high-frequency “total_zeros” value produces an 8-bit codeword (i.e., the latter has 60% more bits than the former).
Generalizing now beyond H.264 and video to compression of all types of bandwidth limited data, when high-frequency data is encountered it is less efficiently handled and represented than low-frequency data. Those few skilled in the art who have appreciated this previously have generally dismissed it as too inconsequential to merit remedial efforts, or as requiring efforts too burdensome to result in a net improvement. However, as discussed extensively below, it has been the present inventors' observation that such inefficiency is frequently consequential, and it has been their labor to devise an elegant remedy for it.