The present invention relates to digital video encoding and, more specifically, to a method and an apparatus for determining in real time an appropriate prediction mode for successive blocks of video information.
Digital video encoding entails converting a video stream representing a sequence of frames into a compressed format for efficient storage or transmission while incurring an insignificant loss in video quality. In conventional compression methods, frames are segmented into macroblocks of n×n pixels (n is typically 16) so that the compression algorithm may compress the macroblocks individually or with reference to one or more previously encoded macroblocks. A typical encoder uses previously encoded blocks to derive a prediction value for a current block or macroblock. As such, a difference signal generated by subtracting the prediction value from a current macroblock is encoded using known compression techniques, such as variable length or arithmetic coding.
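By way of illustration only, the difference signal described above can be sketched as follows. The function names and the 2×2 sample values are hypothetical and chosen for brevity; an actual encoder operates on 16×16 macroblocks and applies entropy coding to the result.

```python
def residual(current, prediction):
    """Difference signal: current block minus its prediction, pixel by pixel."""
    return [[c - p for c, p in zip(crow, prow)]
            for crow, prow in zip(current, prediction)]


def reconstruct(prediction, diff):
    """Decoder side: add the difference signal back onto the prediction."""
    return [[p + d for p, d in zip(prow, drow)]
            for prow, drow in zip(prediction, diff)]


# Hypothetical 2x2 block and a flat predictor derived from earlier blocks.
current = [[52, 55], [54, 56]]
prediction = [[50, 50], [50, 50]]

diff = residual(current, prediction)        # small values, cheaper to encode
restored = reconstruct(prediction, diff)    # equals the current block
```

The closer the prediction is to the current block, the smaller the values in the difference signal and the fewer bits the subsequent variable length or arithmetic coder is likely to need.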
Current techniques seek to determine an optimum encoding mode among many to generate a predictor that yields a desired performance. The best or optimum encoding mode is then chosen to generate the predictor. For a given macroblock, however, not every mode choice yields acceptable compression performance. Thus, an effective implementation of a video coder also requires searching through “mode space” that includes many prediction modes and then coding the video block using each encoding mode in order to find the best compression algorithm.
A naïve but time-consuming scheme to search mode space involves generating a plurality of possible predictors, using each predictor to generate corresponding difference signals, encoding the video information with the difference signals, and then choosing the mode or algorithm that yields the best trade-off between image quality and compression ratio. Given the hardware available to implement present day video coding standards, such a scheme would be prohibitively complex and difficult to achieve in real time.
Practical encoders reduce mode search complexity by (a) computing an approximation of the prediction error rather than the actual prediction error, (b) selecting a coding mode based on a function of the prediction error rather than on compression performance, and/or (c) computing full compression performance only for a subset of modes and then using a function of the prediction error to differentiate between the rest. Implementations specified by, for example, the advanced video coder (AVC) of the Joint Video Team (JVT) and MPEG-1/2/4 use prediction error approximations to determine an appropriate prediction mode.
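As a minimal sketch of approach (b), a mode may be chosen by a simple function of the prediction error, here the sum of absolute differences (SAD), without encoding the block at all. The three predictors below are simplified stand-ins for illustration, and all function names are hypothetical; the sketch is not the mode set or cost function of any particular standard.

```python
def sad(block, predictor):
    """Sum of absolute differences between a block and a candidate predictor."""
    return sum(abs(b - p)
               for brow, prow in zip(block, predictor)
               for b, p in zip(brow, prow))


def vertical_predictor(top, n):
    """Copy the row of pixels above the block into every row."""
    return [list(top) for _ in range(n)]


def horizontal_predictor(left, n):
    """Copy the column of pixels left of the block into every column."""
    return [[left[r]] * n for r in range(n)]


def dc_predictor(top, left, n):
    """Fill the block with the rounded mean of the neighboring pixels."""
    dc = (sum(top) + sum(left) + n) // (2 * n)
    return [[dc] * n for _ in range(n)]


def select_mode_by_sad(block, top, left):
    """Return the candidate mode with the lowest SAD, plus all costs."""
    n = len(block)
    candidates = {
        "vertical": vertical_predictor(top, n),
        "horizontal": horizontal_predictor(left, n),
        "dc": dc_predictor(top, left, n),
    }
    costs = {name: sad(block, pred) for name, pred in candidates.items()}
    return min(costs, key=costs.get), costs


# Hypothetical 4x4 sub-block whose columns match the pixels above it.
top = [10, 20, 30, 40]
left = [10, 10, 10, 10]
block = [list(top) for _ in range(4)]

best, costs = select_mode_by_sad(block, top, left)
```

Even with this shortcut, every candidate predictor must still be generated and compared against the block, which is the cost the remainder of this description seeks to avoid.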
FIG. 1 shows a system block diagram of a JVT-AVC encoder 10 that selects a particular encoding block size from a plurality of choices to encode a digital video. The JVT-AVC standard, i.e., H.264/AVC, uses a mode space selection scheme that includes choosing to generate the prediction error for a macroblock as a single unit (i.e., a 16×16 prediction mode class) or to generate the prediction errors of smaller 4×4 sub-blocks (i.e., a 4×4 prediction mode class). Each mode class under the standard comprises several prediction modes. There are four choices of prediction modes in the 16×16 prediction mode class and nine choices of prediction modes in the 4×4 prediction mode class. Herein, the term “I16 mode” refers to the four 16×16 modes and the term “I4 mode” refers to the nine 4×4 modes.
In the JVT-AVC mode selector of FIG. 1, macroblock extractor 12 extracts blocks of pixel information from a digital video input stream to generate either an I4 or I16 prediction error via error generators 14 or 16. Mode selector 18 determines a best prediction mode to implement based on prediction errors generated for prior blocks, as reflected in buffer 20 that receives feedback from an output stage of coder 22. Buffer 20 thus provides a source of encoded blocks from which extractor 24 may extract previously encoded macroblocks prior to forwarding the blocks to generators 14 and 16. Based on the forwarded information, generators 14 and 16 determine a prediction error for the current block by comparing or subtracting a current macroblock from a reconstructed prior macroblock.
Known implementations of selector 18 to choose a mode class include (a) generating an approximate or full evaluation of each mode class (i.e., for both the 16×16 and 4×4 mode classes); (b) selecting the best mode in each mode class; and (c) then selecting the best choice between the two mode classes.
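The three steps (a) through (c) can be sketched as follows, assuming the per-mode costs have already been computed by some approximate or full evaluation. The cost values, the mode names used as dictionary keys, and the function name are hypothetical and serve only to illustrate the control flow.

```python
def select_mode_class(i16_costs, i4_costs_per_subblock):
    """Pick the best mode in each class, then the better of the two classes.

    i16_costs: dict mapping each 16x16 mode to its cost for the macroblock.
    i4_costs_per_subblock: list of dicts, one per 4x4 sub-block, mapping
        each 4x4 mode to its cost for that sub-block.
    """
    # (b) best mode within the 16x16 class
    best_i16 = min(i16_costs, key=i16_costs.get)
    i16_total = i16_costs[best_i16]

    # (b) best mode within the 4x4 class, chosen independently per sub-block
    best_i4 = [min(c, key=c.get) for c in i4_costs_per_subblock]
    i4_total = sum(c[m] for c, m in zip(i4_costs_per_subblock, best_i4))

    # (c) best choice between the two mode classes
    if i16_total <= i4_total:
        return "I16", best_i16
    return "I4", best_i4


# (a) hypothetical costs from an evaluation of both mode classes
i16_costs = {"vertical": 900, "horizontal": 700, "dc": 800, "plane": 750}
i4_costs = [{"vertical": 30, "dc": 50} for _ in range(16)]

mode_class, modes = select_mode_class(i16_costs, i4_costs)
```

Note that step (a) requires a cost for every mode in both classes before any decision can be made, so the work grows with the total number of modes evaluated.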
FIGS. 2 and 3 respectively illustrate the operation of the I4 Mode Prediction Error Generator 14 and the operation of the I16 Mode Prediction Error Generator 16 shown in FIG. 1. In each case, prediction modes in each mode class are evaluated to some degree, which adds computational complexity to the mode selection process. As shown in FIG. 2, extractor 30 of the I4 mode selector extracts a 4×4 sub-block to generate a prediction via predictor generators 32, 36 and prediction error generators 34, 38. Generators 34 and 38 produce a prediction error for a current sub-block based on sub-blocks that have been reconstructed from previously encoded sub-blocks. A submode selector 40 determines which I4 prediction error in memory block 42 to output from the coder. The output is also fed back to predictor generators 32 and 36 via the 4×4 block coder 44 and decoder 46.
The I16 mode selector 16 of FIG. 3 operates similarly where predictors from previously encoded macroblocks produced by a series of generators (generators 42 and 44 shown) are supplied to prediction error generators (generators 46 and 48 shown) to produce a prediction error for a current macroblock. Based on the prediction error generated by the generators 46, 48, submode selector 50 selects which I16 prediction error to output from memory block 52.
In all such cases, though, some or all of the prediction modes in each I4 and I16 mode class must be evaluated to some degree. This evaluation requires substantial processing time, particularly since it must be carried out separately for each mode class.
Thus, it would be advantageous to reduce the computational complexity of the prior art mode selection process by reducing or eliminating the need to generate, approximate or otherwise evaluate one or more prediction errors during mode class selection.
Using heuristics directed to certain properties of the video stream, the present invention advantageously eliminates or reduces the search of mode space for at least one of the mode classes. Because macroblocks having unique statistical behavior fall into different mode classes, one can examine statistical information alone or with other attributes of the video to “predict” which mode class will be better suited for compression of a current macroblock. For example, 16×16 prediction modes are better suited to compress macroblocks with little or smooth variation in pixel intensities, whereas the 4×4 prediction modes are better suited to compress macroblocks with larger variations in pixel intensities. Thus, a mode class may be chosen simply by assessing variations in pixel intensity. Other statistical information may be used to determine an optimal mode class. Moreover, statistical learning may be applied to differentiate between mode classes without any recourse to computation of prediction error. Under the JVT-AVC standard, for example, the method and apparatus embodiments of the present invention were found to yield negligible loss in performance while reducing the computational complexity, and hence the number of processor cycles, by a factor of two.
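One possible reading of choosing a mode class by assessing variations in pixel intensity is sketched below, using the variance of the macroblock's pixels as the statistic. The use of variance, the threshold value of 100.0, and the function names are assumptions made for illustration only; the text above does not specify a particular statistic or threshold.

```python
def pixel_variance(macroblock):
    """Population variance of all pixel intensities in the macroblock."""
    pixels = [p for row in macroblock for p in row]
    mean = sum(pixels) / len(pixels)
    return sum((p - mean) ** 2 for p in pixels) / len(pixels)


def predict_mode_class(macroblock, threshold=100.0):
    """Guess the mode class from pixel statistics alone.

    No predictor or prediction error is computed. The threshold is a
    hypothetical tuning parameter, not a value taken from the text.
    """
    return "I16" if pixel_variance(macroblock) < threshold else "I4"


# Hypothetical 16x16 macroblocks: one flat, one checkerboard of extremes.
smooth = [[128] * 16 for _ in range(16)]
busy = [[255 if (r + c) % 2 else 0 for c in range(16)] for r in range(16)]

smooth_class = predict_mode_class(smooth)
busy_class = predict_mode_class(busy)
```

Once a class has been predicted in this way, only the modes of that class would need to be searched, which is the source of the reduced computation described above. In practice the threshold, or a learned classifier replacing it, would have to be tuned on representative video.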