Video compression is used in applications that require transmission and/or storage of video information. Video information is organized in layers of groups of frames. Each group of frames includes a series of single frames, where each single frame is a still image. When processed in sequence, the single frames run together to form continuous motion, thereby producing a motion picture in digital format. Each image frame in the series is broken up into several macroblocks, where each macroblock includes several blocks. Each block within a macroblock includes several pixels, where the number of pixels in each block depends on the video compression format as well as the resolution being used.
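The frame-to-macroblock-to-block hierarchy described above can be illustrated with a short sketch. The 16×16 macroblock and 4×4 block sizes below are assumptions in the style of H.264; other formats use other partitionings.

```python
# Illustrative sketch of the frame -> macroblock -> block hierarchy.
# The 16x16 macroblock and 4x4 block sizes are H.264-style assumptions.
MB_SIZE = 16
BLOCK_SIZE = 4

def macroblock_count(width, height):
    """Number of whole macroblocks in a frame of the given resolution."""
    return (width // MB_SIZE) * (height // MB_SIZE)

def blocks_per_macroblock():
    """Number of blocks tiling one macroblock."""
    return (MB_SIZE // BLOCK_SIZE) ** 2

print(macroblock_count(1280, 720))  # → 3600
print(blocks_per_macroblock())      # → 16
```

As the sketch shows, the block count grows with resolution, which is one reason encoder-side complexity matters.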
H.264 is a state-of-the-art video compression format that is rapidly gaining acceptance in various multimedia markets. H.264 is similar to the MPEG (Moving Picture Experts Group) standards, except that H.264 uses a 4×4 pixel coding unit.
The H.264 format is believed to offer much higher compression performance than its predecessors, as well as being the codec of choice in terms of flexibility. However, the exceptional compression performance attributed to H.264 comes at the cost of increased computational complexity, especially at the encoder module of a video compression system. The flexibility provided by H.264, in terms of the number of reference frames that can be used, makes the search for the best motion vector very complex. For instance, the JM reference code versions perform an exhaustive search over the different reference frames in the H.264 format. This search involves very high computational complexity, which can make the H.264 standard difficult to utilize in real-time applications. To reduce this complexity, a variety of fast algorithms have been proposed for a more efficient motion vector search.
These proposed algorithms reduce the complexity of choosing between different reference frames by using a block-level decision-making process. For instance, the reference frames for each block are predicted by analyzing the reference frames of the neighboring blocks or the “sub-pixel locations” of the collocated blocks in all the reference frames.
One known method of choosing the best reference frame for each block reduces the number of reference frames that are evaluated during compression. The number of reference frames to be evaluated is determined from the farthest reference frame chosen by the neighboring blocks, plus a pre-chosen addition factor, subject to the condition that it does not exceed the maximum number of reference frames specified in the parameter sets.
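A minimal sketch of this neighbor-based heuristic follows. The function and parameter names, the zero-based indexing, and the exact capping behavior are illustrative assumptions, not taken from any particular reference implementation.

```python
# Hedged sketch of the neighbor-based reference-frame-count heuristic:
# evaluate enough frames to cover the farthest neighbor reference,
# plus a pre-chosen addition factor, capped by the parameter-set limit.

def num_reference_frames_to_search(neighbor_ref_indices, addition_factor,
                                   max_ref_frames):
    """Return how many reference frames to evaluate for the current block.

    neighbor_ref_indices: best reference-frame indices (0 = most recent)
        chosen for the already-coded neighboring blocks.
    addition_factor: pre-chosen margin added to the farthest neighbor index.
    max_ref_frames: cap specified in the parameter sets.
    """
    if not neighbor_ref_indices:
        return max_ref_frames  # no neighbor information: search everything
    farthest = max(neighbor_ref_indices)
    # Indices are zero-based, so farthest + 1 frames are needed just to
    # include the farthest neighbor reference itself.
    return min(farthest + 1 + addition_factor, max_ref_frames)

# Example: neighbors used frames 0 and 1, addition factor 1, cap of 5
print(num_reference_frames_to_search([0, 1], 1, 5))  # → 3
```

Note how the addition factor sets a floor on the search: even when every neighbor used only frame 0, at least `1 + addition_factor` frames are evaluated, which is the sub-optimality discussed below.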
Another known method involves selecting the reference frame for each block. For every block, a “sub-pixel location” is determined based on the sub-pixel location of the reference block in the immediate reference frame and the sub-pixel motion vector with respect to that immediate reference frame. Early termination checks can be applied to the motion estimation process. For instance, if a Sum of Absolute Differences (SAD) calculation between the block and the immediate reference frame meets a desired value, or if the region is “flat”, no further search is performed with respect to the other reference frames. If these early-exit conditions are not satisfied, motion estimation is performed on all of the other reference frames. The method analyzes the sub-pixel locations of the blocks collocated with the current block in the different reference frames, whereby only those reference frames in which these sub-pixel locations are distinct are chosen for the motion vector search. If the collocated blocks in two reference frames have the same sub-pixel location, the one closer to the current frame is chosen. This reduces the number of reference frames considered in the search.
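The early-termination and sub-pixel-distinctness steps of this method might be sketched as follows. The SAD threshold value, the function names, and the flat-list block layout are illustrative assumptions; a real encoder would operate on 2-D pixel arrays and a tuned threshold.

```python
def sad(block_a, block_b):
    """Sum of Absolute Differences between two equally sized pixel blocks."""
    return sum(abs(a - b) for a, b in zip(block_a, block_b))

def distinct_subpel_refs(subpel_locations):
    """Keep only reference frames whose collocated block sits at a distinct
    sub-pixel location; on a tie, the frame closer to the current frame
    (lower index) wins."""
    first_seen = {}
    for idx, loc in enumerate(subpel_locations):  # idx 0 = immediate reference
        if loc not in first_seen:
            first_seen[loc] = idx
    return sorted(first_seen.values())

def refs_to_search(current_block, immediate_ref_block, subpel_locations,
                   sad_threshold=64):
    """Early exit: if the immediate reference frame already matches well
    enough, search no further; otherwise prune the remaining reference
    frames by sub-pixel distinctness."""
    if sad(current_block, immediate_ref_block) <= sad_threshold:
        return [0]
    return distinct_subpel_refs(subpel_locations)
```

For example, with collocated sub-pixel locations `[(0, 0), (2, 1), (0, 0), (1, 3)]` across four reference frames, only frames 0, 1, and 3 remain in the search, since frame 2 duplicates the sub-pixel location of the closer frame 0.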
Though the prior art methods achieve a reduction in computational complexity, they leave considerable scope for further reduction. For example, in the method where the number of reference frames for the current block is determined based on the best reference frames chosen for the neighboring blocks and a pre-chosen addition factor, the addition factor must be at least 1 for the mechanism to adapt well. This means that even for frames which do not require more than one reference frame, at least one additional frame would be evaluated. Similarly, within a frame, some blocks might require only one reference picture to be considered, while others may need more. In that case, the minimum number of reference pictures is bounded by the pre-chosen addition factor, which leads to sub-optimality in the complexity reduction.
The method that employs sub-pixel positions to reduce complexity depends heavily on the motion vector search mechanism. Its performance can degrade drastically when the motion vector search itself is sub-optimal, as is the case in real-time applications. The method also requires considerable additional storage for the sub-pixel locations of all the pixels.
What is needed is a system and method that dynamically determines the number of reference frames needed to efficiently encode an image frame.