1. Technical Field
The present invention relates generally to methods and apparatuses for encoding and decoding video streams in a video compression system. More specifically, the present invention relates to a method and system for a video encoder and decoder, wherein the video encoder has low computational complexity and good compression efficiency, based on the principle of encoding each video frame using Slepian-Wolf information.
2. Background Art
Conventional video compression is based on the principle of differential pulse code modulation (DPCM). Specifically, a typical video signal comprises a sequence of images, termed frames, with temporally neighboring frames being highly correlated. Thus, efficient compression can be realized by differentially predicting a given frame with respect to previously encoded, temporally neighboring frames, and by encoding the prediction error. Compression is achieved since the prediction error typically has a much-reduced dynamic range compared to the original frame. Typical video compression is lossy, i.e., the decoded video frame is not identical to the original video frame. To ensure that the decoder can form the same predictions as the encoder despite this loss, a decoder is embedded within the encoder and differential prediction is constrained to be performed with respect to decoded images rather than original images.
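The closed-loop prediction described above can be illustrated on a one-dimensional sequence of samples. The following is a minimal sketch, not a description of any particular codec: the function names, the uniform quantizer, and the initial prediction of zero are assumptions chosen for illustration.

```python
def quantize(value, step):
    """Uniform quantizer (assumed for illustration): index of the nearest multiple of step."""
    return round(value / step)


def dpcm_encode(samples, step):
    """Closed-loop DPCM: each sample is predicted from the previous decoded sample."""
    indices = []
    prediction = 0  # assumed initial state, shared with the decoder
    for sample in samples:
        error = sample - prediction
        index = quantize(error, step)
        indices.append(index)
        # Embedded decoder: update the prediction exactly as the decoder will,
        # using the quantized error rather than the original sample.
        prediction = prediction + index * step
    return indices


def dpcm_decode(indices, step):
    """Rebuild the samples by accumulating the dequantized prediction errors."""
    decoded = []
    prediction = 0
    for index in indices:
        prediction = prediction + index * step
        decoded.append(prediction)
    return decoded


if __name__ == "__main__":
    samples = [100, 102, 105, 104, 110, 111]
    step = 4
    indices = dpcm_encode(samples, step)
    print(indices)
    print(dpcm_decode(indices, step))
```

Because the encoder predicts from decoded values, the quantization error does not accumulate: in this sketch each decoded sample differs from the original by at most half the quantizer step. After the first sample, the transmitted indices are small, which reflects the reduced dynamic range of the prediction error.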
FIG. 1 shows an exemplary conventional video compression system, based on the DPCM principle. The input video sequence 100 is input to a mode selector 101. The mode selector 101 partitions the current video frame into blocks of pixels and selects an encoding mode for each block, which defines the compression method to be used for that block. Examples of encoding modes include independent coding, differentially predicted coding, and skip coding. Given the mode selection for the current block, the mode selector 101 selects the compression means to be used for encoding the block. As an example, if the independent coding mode is selected, the current block is input to the forward transform and quantizer 104, which applies a space-frequency transform to the block pixel values, and then discretizes the resulting transform coefficients. If, instead, the differentially predicted coding mode is selected, the current block is input to the motion estimator 103, which estimates the best differential predictor block with respect to previous frames stored in the frame buffer 112. The output of the motion estimator 103 is a set of motion vectors 113, which describes the location of the best differential predictor, as well as a block prediction error 114, which describes the difference between the current block and the differential predictor.
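The operation of a motion estimator of this kind can be sketched as an exhaustive block-matching search. The sketch below is illustrative only and makes several assumptions not stated above: frames are lists of rows of integer pixel values, the matching cost is the sum of absolute differences (SAD), a single reference frame is searched, and the motion vector is expressed as the (row, column) displacement from the current block position to the predictor block. All function names are hypothetical.

```python
def get_block(frame, top, left, size):
    """Extract a size x size block whose top-left corner is at (top, left)."""
    return [row[left:left + size] for row in frame[top:top + size]]


def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


def full_search(current, reference, top, left, size, search_range):
    """Return (motion_vector, prediction_error) for one block of the current frame."""
    height, width = len(reference), len(reference[0])
    target = get_block(current, top, left, size)
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            r, c = top + dy, left + dx
            # Skip candidates that fall partly outside the reference frame.
            if r < 0 or c < 0 or r + size > height or c + size > width:
                continue
            candidate = get_block(reference, r, c, size)
            cost = sad(target, candidate)
            if best is None or cost < best[0]:
                best = (cost, (dy, dx), candidate)
    _, vector, predictor = best
    # Prediction error: current block minus the selected predictor block.
    error = [[t - p for t, p in zip(t_row, p_row)]
             for t_row, p_row in zip(target, predictor)]
    return vector, error
```

With a search range of R pixels, this search evaluates up to (2R+1)^2 candidate predictors per block, each requiring one cost computation over the whole block.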
The prediction error 114 is input to the forward transform and quantizer 104, which applies the space-frequency transform to the signal and discretizes the resulting transform coefficients. The discretized transform coefficients and the motion vectors 113 (if present) are input to the entropy coder 105, which applies lossless compression to them. The entropy coder 105 outputs the compressed motion vectors 106 and the compressed transform coefficients 107 for each block. These constitute the compressed representation of the input video sequence 100. The output of the forward transform and quantizer 104 and the motion vectors 113 are also input to the frame reconstructor 110, which reconstructs the decoded frame from them. The decoded frame 111 is identical to the output of a video decoder applied to the compressed representation consisting of signals 106 and 107. The decoded frame 111 is stored in the frame buffer 112 to be used for differential prediction of future video frames.
During DPCM video decoding, the compressed representation comprised of signals 120 and 121 is first input to the entropy decoder 122. The outputs of the entropy decoder 122 are the uncompressed motion vectors 123 and the uncompressed discretized transform coefficients 124. The discretized transform coefficients 124 are inverse quantized and converted to the pixel domain by the inverse transform and quantizer means 125. The resulting pixel values 126 represent the original block if the independent coding mode was used to encode the block, and represent the pixel prediction error if the differential prediction mode was used to encode the block. The pixel values 126 are input to the motion compensator 127, which also receives as inputs the motion vectors 123, and previously decoded frames used to generate the differential predictor (if any) from the frame buffer 129. The motion compensator 127 inverts the motion estimation process to generate the reconstructed block. The reconstructed video sequence 128 comprised of reconstructed video frames is the output of the DPCM video decoder. In addition, reconstructed video frames are stored in the frame buffer 129, to be used for motion compensation in future frames.
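The final reconstruction step of the decoder can be sketched as follows. This is a simplified illustration under stated assumptions: entropy decoding, inverse quantization, and the inverse transform are assumed to have already produced the pixel values, a single reference frame is used, and the motion vector is taken to be the (row, column) displacement from the block position to its predictor. The function names and mode strings are hypothetical.

```python
def motion_compensate(reference, top, left, size, vector, prediction_error):
    """Add the decoded prediction error to the predictor block located by the motion vector."""
    dy, dx = vector
    return [[reference[top + dy + i][left + dx + j] + prediction_error[i][j]
             for j in range(size)]
            for i in range(size)]


def reconstruct_block(mode, pixel_values, reference=None,
                      top=0, left=0, vector=(0, 0)):
    """Reconstruct one block according to its encoding mode."""
    if mode == "independent":
        # The pixel values already represent the block itself.
        return [row[:] for row in pixel_values]
    if mode == "differential":
        # The pixel values represent the prediction error.
        size = len(pixel_values)
        return motion_compensate(reference, top, left, size, vector, pixel_values)
    raise ValueError("unsupported mode: " + mode)
```

Unlike the encoder, the decoder in this sketch performs no search: the motion vector is read from the compressed representation and used directly, which is consistent with the lower complexity of the decoder in the conventional system.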
In the exemplary conventional DPCM video compression system shown in FIG. 1, the most computationally intensive operation is typically motion estimation 103, followed by mode selection 101 and entropy coding 105. Further, motion estimation 103 is typically required to be performed for a majority of blocks, since differential prediction generally allows more compression than independent coding. Thus, in conventional video compression the computational complexity of the encoder is much larger than that of the decoder. This traditional paradigm is aimed at applications, such as video broadcasting, where encoding is required to be performed only once while decoding is performed a large number of times. Increasingly, however, video compression systems with computationally simple encoders are in demand in important emerging applications like video surveillance.
Previous methods to facilitate low-complexity video encoding fall into two classes. The first class of methods employs a low-complexity mode selection process for each block. FIG. 2 shows an exemplary embodiment of this class of methods. A fast mode selector 201, which has low computational complexity, is used to select the encoding mode for each frame block. The remaining modules in the encoder (and decoder) are identical to the modules in FIG. 1. Examples of this class of solutions include the methods described in U.S. Patent Application Publication No. US 2006/0193385 A1 for “Fast mode-decision encoding for interframes”, and U.S. Patent Application Publication No. US 2004/0028127 A1 for “Method and apparatus for reducing computational complexity in video encoders”. The main shortcoming of these approaches is that reducing the complexity of mode selection alone does not typically reduce the complexity of video encoding significantly. This is because, as mentioned above, motion estimation has significant computational complexity.
The second class of methods to facilitate low-complexity video encoding seeks to reduce the complexity of motion estimation. This is done by either eliminating motion estimation altogether and only using independent coding (for example, Motion-JPEG), or by simplifying motion estimation by restricting the differential predictor search to a small subset of possible predictors. FIG. 3 shows an exemplary embodiment of this class of methods. A fast motion estimator 303, which has low computational complexity, is used to generate a differential predictor for each block. The remaining modules in the encoder (and decoder) are identical to the modules in FIG. 1. Examples of this class of solutions include the methods described in U.S. Pat. No. 7,177,359 for “Method and apparatus to encode a moving image with fixed computational complexity” and U.S. Patent Application Publication No. US 2005/0232360 A1 for “Motion estimation apparatus and method with optimal computational complexity”. The main limitation of these approaches is that using independent coding reduces compression efficiency, typically by a factor of two or more, while partial elimination of motion estimation typically does not reduce the complexity of video encoding to the extent required by applications such as surveillance.
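The trade-off involved in restricting the predictor search can be illustrated with a small sketch. It is not taken from the cited references: the five-candidate cross pattern, the SAD cost, the tiny test frames, and all names are assumptions chosen for illustration. The sketch compares an exhaustive search with a search limited to a small subset of candidate motion vectors.

```python
def block_sad(current, reference, top, left, size, vector):
    """SAD between a current block and the reference block displaced by vector."""
    dy, dx = vector
    return sum(abs(current[top + i][left + j] - reference[top + dy + i][left + dx + j])
               for i in range(size) for j in range(size))


def in_bounds(reference, top, left, size, vector):
    """True if the displaced block lies entirely inside the reference frame."""
    dy, dx = vector
    return (0 <= top + dy and top + dy + size <= len(reference)
            and 0 <= left + dx and left + dx + size <= len(reference[0]))


def search(current, reference, top, left, size, candidates):
    """Return (best_vector, best_cost, number_of_candidates_evaluated)."""
    best_vector, best_cost, evaluated = None, None, 0
    for vector in candidates:
        if not in_bounds(reference, top, left, size, vector):
            continue
        cost = block_sad(current, reference, top, left, size, vector)
        evaluated += 1
        if best_cost is None or cost < best_cost:
            best_vector, best_cost = vector, cost
    return best_vector, best_cost, evaluated


def full_candidates(search_range):
    """All displacements within +/- search_range in both directions."""
    span = range(-search_range, search_range + 1)
    return [(dy, dx) for dy in span for dx in span]


# Hypothetical restricted subset: zero motion plus its four nearest neighbors.
CROSS_CANDIDATES = [(0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)]

# Tiny illustrative frames: a 2x2 object moves diagonally between frames.
reference = [[0] * 8 for _ in range(8)]
reference[2][3], reference[2][4], reference[3][3], reference[3][4] = 10, 20, 30, 40
current = [[0] * 8 for _ in range(8)]
current[3][2], current[3][3], current[4][2], current[4][3] = 10, 20, 30, 40

if __name__ == "__main__":
    print(search(current, reference, 2, 2, 4, full_candidates(2)))
    print(search(current, reference, 2, 2, 4, CROSS_CANDIDATES))
```

In this example the exhaustive search evaluates 25 candidates and finds an exact match, while the restricted search evaluates only 5 candidates but misses the diagonal displacement and is left with a nonzero prediction error. This illustrates why restricting the search lowers encoder complexity at some cost in prediction quality.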
Therefore, a need exists for an improved method for video compression wherein the encoder has low computational complexity and high compression efficiency.