This invention relates to selective compression of digital video images.
Compression of digital video images using lossless schemes is a new area of research for video applications. Recent advances in digital electronics and electromechanics are helping to promote use of digital video images. The algorithms for lossy compression (or coding) of video images have become sophisticated, spurred by the applications and standardization activities for moving pictures or images, such as the ISO MPEG-1 and MPEG-2 standards and the ITU H.261/H.263 standards. The corresponding lossless compression approaches have received relatively little attention thus far, due to the higher computation requirements and to the generally lower compression efficiency for lossless compression of video image sequences.
Almost all known video coders have inherited some techniques from static or still image coders. For example, MPEG-1 and H.261, the first moving picture standards to be developed, used techniques such as Huffman coding, the discrete cosine transform, run-length coding and other techniques that are similar to those developed for JPEG intra-frame coding of independent frames. The compression performance of lossless static image coders was not sufficient to form a basis for a lossless, video image sequence coder.
Recently, several lossless image coders have been proposed that have relatively good compression performance. The majority of these new techniques use sophisticated entropy-coding and statistical modeling of the source data in a pixel-by-pixel approach. These approaches are very cumbersome to implement and are much less efficient when encoded as software implemented on a digital signal processor (DSP) or on a general purpose microprocessor.
What is needed is a block-based video image compression approach that reduces the computational complexity but retains many of the attractive features of the most flexible video compression approaches. Preferably, the approach should allow selective uses of lossless compression and lossy compression for different portions of the same image in a video sequence, without substantially increasing the complexity that is present when only lossless compression or only lossy compression is applied to a video image. Preferably, the approach should allow use of pixel value information from an adjacent frame (preceding or following) to provide predicted pixel values for the present frame. Preferably, the approach should allow use of parallel processing to reduce the time required for, or to increase the throughput of, a conventional video coding approach.
These needs are met by the invention, which provides a block-based video image coder that permits multiple levels of parallel implementation. The pixels in each input block of an image in a frame of a video sequence are coded intra-frame using a differential pulse code modulation (DPCM) scheme that uses one of several selectable predictors. The invention uses a block-based, intra-frame image coding approach that provides lossless compression coding for an image in a single frame. This intra-frame coding approach is disclosed in a related patent application for a "Block-Based, Adaptive, Lossless Image Coder", U.S. Ser. No. 09/xxx,xxx.
The predictor for a block is chosen using local characteristics of the block to be coded. Prediction residuals (differences between actual and predicted values) are mapped to a non-negative integer scale and are coded using a new entropy-coding mechanism based on a modified Golomb Code (MGC). In addition, a novel run-length encoding scheme is used to encode specific patterns of zero runs. The invention permits parallel processing of data blocks and allows flexibility in ordering the blocks to be processed.
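As an illustration of the residual mapping and entropy coding steps, the following is a minimal sketch that interleaves signed residuals onto the non-negative integers and codes the result with a conventional Golomb-Rice code. The modified Golomb Code (MGC) itself is not specified in this section, so the standard Golomb-Rice code is substituted here; the function names and the parameter `k` are assumptions made for illustration only.

```python
def map_residual(r):
    """Interleave signed residuals: 0, -1, 1, -2, 2 -> 0, 1, 2, 3, 4."""
    return 2 * r if r >= 0 else -2 * r - 1


def unmap_residual(m):
    """Inverse of map_residual."""
    return m // 2 if m % 2 == 0 else -(m + 1) // 2


def rice_encode(m, k):
    """Standard Golomb-Rice code (not the MGC): unary quotient, then k remainder bits."""
    quotient = m >> k
    remainder = m & ((1 << k) - 1)
    bits = "1" * quotient + "0"          # quotient in unary, terminated by a 0
    if k:
        bits += format(remainder, "0{}b".format(k))
    return bits


def rice_decode(bits, k):
    """Return (decoded value, number of bits consumed)."""
    quotient = 0
    while bits[quotient] == "1":
        quotient += 1
    pos = quotient + 1                   # skip the terminating 0
    remainder = int(bits[pos:pos + k], 2) if k else 0
    return (quotient << k) | remainder, pos + k
```

Small residuals, which dominate when the predictor is good, receive short codewords; for example, a residual of -3 maps to 5 and, with `k = 2`, is coded as `1001`.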
Some inter-frame prediction is also available so that frame-to-frame changes can be accounted for. The invention uses a motion vector to relate, by a linear transformation (translation, rotation, scale factor change, etc.), a pixel value (or block thereof) in an adjacent frame to a pixel value (or block thereof) in the present frame.
A block of pixels is examined to determine if each of the pixel values for the present frame is the same as the value of the corresponding pixel for the preceding frame. If so, this block of pixel values for the present frame is predictable from the pixel values already received for the same block for the preceding frame. If not, the system determines if all pixels in this block have one value (dc-only block). If so, a dc-only block uses a selected predictor and is easily compressed for later use. A non-dc block is examined according to selected criteria, and an optimal predictor is selected for this block.
This predictor includes an intra-frame component and an inter-frame component. A residual value (actual value minus predicted value) is computed and clamped, and the block of clamped values and corresponding predictor index are processed for compression, using an efficient mapping that takes advantage of the full dynamic range of the clamped residual values.
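One way a mapping can exploit the full dynamic range is sketched below. It assumes 8-bit samples and a predicted value already clamped to the range 0 to 255, so that the residual is bounded by the predicted value on one side and by 255 minus the predicted value on the other. The specific mapping, the function names and the `max_value` parameter are assumptions for illustration; the mapping actually used by the coder is not given in this section.

```python
MAX_VALUE = 255  # assumed 8-bit samples


def map_bounded_residual(actual, predicted, max_value=MAX_VALUE):
    """Map residual (actual - predicted) onto 0..max_value, given the predictor."""
    r = actual - predicted
    # Largest magnitude for which both signs of the residual are possible.
    theta = min(predicted, max_value - predicted)
    if abs(r) <= theta:
        return 2 * r if r >= 0 else -2 * r - 1   # interleave both signs
    return theta + abs(r)                         # only one sign remains possible


def unmap_bounded_residual(mapped, predicted, max_value=MAX_VALUE):
    """Recover the actual sample value from the mapped residual and the predictor."""
    theta = min(predicted, max_value - predicted)
    if mapped <= 2 * theta:
        r = mapped // 2 if mapped % 2 == 0 else -(mapped + 1) // 2
    else:
        magnitude = mapped - theta
        # If the predictor is nearer 0, large residuals must be positive.
        r = magnitude if predicted <= max_value - predicted else -magnitude
    return predicted + r
```

With this sketch, every mapped value fits in the same 8 bits as the input sample: a residual of +252 at a predicted value of 3 maps to 255, whereas a plain sign interleave would map it to 504.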
Context modeling can be included here without substantially increasing the computational complexity, by making the context switch granularity depend upon a "block" of pixels (e.g., P×Q), rather than on a single pixel, to allow inclusion of a transition region where a switch occurs. In some imaging applications, lossless and lossy techniques are combined to compress an image. For example, a portion of the image corresponding to a majority of text information might have to be losslessly coded, while the portion of the image with continuous-tone gray-scale information can be coded with some visual distortion to obtain higher compression. In such applications, the input image is segmented to identify the regions to be losslessly coded. Accordingly, lossy coders and lossless coders are switched on and off region-by-region within a frame. However, many of the lossy and lossless coders may work only on an entire frame. The "chunking" by the segmentation algorithm makes it inefficient to code small blocks using the existing methods.
For video image encoding, use of a block-based coding scheme offers an additional advantage where inter-frame image processing is important. It is common practice in video image sequence encoding to segment a frame into rectangular regions and to determine if a close match exists between a given block in the present frame and a prediction block that was estimated using previously reconstructed data from an adjacent frame. When a close match occurs between actual and predicted pixel values for such a rectangular region, a vector is transmitted to the coder as a portion of the bitstream, specifying the addresses of this region in the reference frame that were used to compute the predicted pixel values. This approach, commonly referred to as "motion estimation" in the video literature, achieves good video compression by utilizing pixel value correlation between adjacent frames. Although it may be possible to modify a pixel-based scheme to utilize such a correlation, this modified scheme will not be computationally efficient, and additional bits will be needed to transmit the motion data for each pixel.
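The block matching described above can be sketched as an exhaustive search over a small window in the reference frame, using the sum of absolute differences (SAD) as the matching cost. This is a generic illustration only; the cost measure, the search strategy, the function names and the `search` window parameter are assumptions and are not taken from the text.

```python
def sad(cur, ref, cy, cx, ry, rx, size):
    """Sum of absolute differences between a block in cur and a block in ref."""
    return sum(
        abs(cur[cy + i][cx + j] - ref[ry + i][rx + j])
        for i in range(size)
        for j in range(size)
    )


def best_motion_vector(cur, ref, by, bx, size, search):
    """Exhaustively search ref around (by, bx); return (dy, dx, cost)."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            ry, rx = by + dy, bx + dx
            # Skip candidate blocks that fall outside the reference frame.
            if ry < 0 or rx < 0 or ry + size > height or rx + size > width:
                continue
            cost = sad(cur, ref, by, bx, ry, rx, size)
            if best is None or cost < best[2]:
                best = (dy, dx, cost)
    return best
```

A single vector `(dy, dx)` then describes the whole block, which is why the side information is much smaller than it would be for per-pixel motion data.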
Using an inter-frame processing technique, a block of pixel values in a present frame is compared with a corresponding block of predicted pixel values in an adjacent or reference frame (preceding or following). If a match occurs, the corresponding block of predicted pixel values is used to predict the pixel values of the block from the present frame. This may be implemented by use of a single bit or flag, indicating that such a match occurs, and the intra-frame coding component can be skipped for this frame. If no match occurs, a combination of an inter-frame coding component and an intra-frame coding component is used for compression coding.
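The block-level decisions described in this summary (an exact match with the corresponding block of the adjacent frame, a dc-only block, or a general block that needs a selected predictor) can be sketched as a simple classifier. The label names and the function name are hypothetical, and the comparison is made against the co-located block only, which is an assumption of this sketch.

```python
SKIP, DC_ONLY, NON_DC = "skip", "dc_only", "non_dc"


def classify_block(current, previous):
    """Classify a block given as a list of rows of pixel values.

    previous is the corresponding block of the adjacent frame,
    or None when no adjacent frame is available.
    """
    # Exact match: a single flag can signal that the block is unchanged.
    if previous is not None and current == previous:
        return SKIP
    # All pixels share one value: dc-only block.
    first = current[0][0]
    if all(value == first for row in current for value in row):
        return DC_ONLY
    # Otherwise a predictor must be chosen for the block.
    return NON_DC
```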
This approach also allows use of parallel processing of two or more blocks of pixel values, within a single frame or within adjacent frames, in order to reduce the time required for processing a frame, or to increase the throughput of processing pixel values for one or more frames.
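Because each block can be coded independently of the others, the blocks of a frame can be handed to separate workers. The sketch below illustrates this with a thread pool; `encode_block` is a hypothetical placeholder (a previous-pixel DPCM along each row) that stands in for the actual block coder, and the block size and worker count are arbitrary assumptions. In CPython, threads show the independence of the blocks rather than a real speedup for pure-Python code.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_blocks(frame, size):
    """Split a frame (list of rows) into size x size blocks in raster order."""
    height, width = len(frame), len(frame[0])
    return [
        [row[x:x + size] for row in frame[y:y + size]]
        for y in range(0, height, size)
        for x in range(0, width, size)
    ]


def encode_block(block):
    """Placeholder coder: previous-pixel DPCM residuals along each row."""
    residuals = []
    for row in block:
        previous = 0  # first pixel of each row is predicted as 0
        for value in row:
            residuals.append(value - previous)
            previous = value
    return residuals


def encode_frame_parallel(frame, size, workers=4):
    """Encode all blocks of a frame concurrently; results keep raster order."""
    blocks = split_into_blocks(frame, size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(encode_block, blocks))
```

Since `pool.map` returns results in the order of its inputs, the coded blocks can be written to the bitstream in raster order regardless of which worker finishes first.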
The approach disclosed here is applicable to natural, synthetic, graphic and computer-generated image sequences that may change from one frame to the next. A context switch at the block level can be adapted for lossy coding. Thus, one obtains a single coder format that fits both lossy and lossless cases and encompasses a video image segmenter as well.