1. Field of the Invention
The present invention relates in general to video encoding, and more specifically to a system and method of performing fast motion search for motion estimation in a video encoder that is particularly useful for low-power hand-held multimedia devices and/or low-bitrate video communications.
2. Description of the Related Art
The Advanced Video Coding (AVC) standard, Part 10 of MPEG-4 (Moving Picture Experts Group), otherwise known as H.264, includes advanced compression techniques that were developed to enable transmission of video signals at a lower bitrate, or improved video quality at a given transmission rate. The newer standard outperforms the video compression techniques of prior standards, supporting higher-quality streaming video at lower bitrates and enabling Internet-based video, wireless applications and the like. The standard does not define the CODEC (encoder/decoder pair); instead, it defines the syntax of the encoded video bitstream along with a method of decoding the bitstream. Each video frame is subdivided and encoded at the macroblock (MB) level, where each MB is a 16×16 block of pixels. Without loss of generality, an MB can also refer to a sub-block of data of a different size, such as 16×8, 8×16, 8×8, 8×4, 4×8 or 4×4. As used herein, the term “sub-block” refers to an MB smaller than 16×16. Each MB is encoded either in ‘intraframe’ mode, in which a prediction MB is formed from reconstructed MBs in the current frame, or in ‘interframe’ mode, in which a prediction MB is formed from MBs of the reference frames. The intraframe encoding mode exploits spatial information within the current frame, forming the prediction MB from samples in the current frame. The interframe encoding mode exploits temporal information from previous and/or future reference frames, estimating motion to form the prediction MB. In either case, a reference frame that has previously been encoded, decoded and reconstructed is used.
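To make the macroblock subdivision concrete, the Python sketch below lists the seven block sizes named above and the positions at which each one tiles a 16×16 MB. It is an illustration under a simplifying assumption: every MB is split uniformly into a single block size, whereas H.264 also allows different sub-block sizes in different 8×8 quadrants of the same MB, which this sketch does not model. `BLOCK_SIZES` and `partition` are hypothetical names chosen for the example.

```python
# Hypothetical illustration: the seven H.264/AVC inter-prediction block sizes
# (width x height) listed above, each tiled uniformly over one 16x16 macroblock.
BLOCK_SIZES = [(16, 16), (16, 8), (8, 16), (8, 8), (8, 4), (4, 8), (4, 4)]
MB_SIZE = 16


def partition(width, height, mb_size=MB_SIZE):
    """Top-left (x, y) offsets of every width x height block that tiles
    one mb_size x mb_size macroblock, in raster order."""
    return [(x, y)
            for y in range(0, mb_size, height)
            for x in range(0, mb_size, width)]


if __name__ == "__main__":
    total = 0
    for w, h in BLOCK_SIZES:
        blocks = partition(w, h)
        total += len(blocks)
        print(f"{w}x{h}: {len(blocks)} block(s) at {blocks}")
    print("blocks over all seven modes:", total)  # 41
```

Summed over the seven sizes, one MB contains 1 + 2 + 2 + 4 + 8 + 8 + 16 = 41 blocks, each of which is a candidate for its own motion vector in an exhaustive encoder.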
Motion estimation (ME) is critical for achieving good video compression in the rate-distortion (R-D) sense. The purpose of ME is to reduce temporal redundancy between frames of a video sequence. An ME algorithm predicts image data for an image frame using at least one reference frame in order to encode the image frame. A difference image is determined by taking the arithmetic difference between the original pixel data and the corresponding predicted pixel data. Full search motion estimation (FSME) performs a motion search at every pixel location within a predefined search range in the reference frame or frames. Normally, a SAD (sum of absolute differences) value is used to evaluate the match between the MB or sub-block to be encoded and each candidate MB or sub-block in the reference frames. For a 16×16 MB, a single SAD calculation takes 16×16 = 256 absolute differences plus the additions needed to accumulate them. If the predefined search range contains N pixel positions, e.g. N = 48×48, N SAD calculations are required. In addition, the ME process is repeated for sub-block motion modes such as 16×8, 8×16, 8×8, etc. ME is therefore a computationally intensive operation that is heavily bound by CPU and memory resources, and it has been difficult to find a simple and effective way to perform ME while maintaining acceptable rate-distortion performance. Furthermore, value-adding multimedia applications (e.g., video communications) on wireless and mobile devices are limited by power consumption and channel bandwidth constraints. The FSME method is implemented in the Reference CODEC (e.g. JVM10.0), where the SAD is calculated at the 4×4 block level and summed for all seven motion modes (16×16, 16×8, 8×16, 8×8, 8×4, 4×8, 4×4). For the FSME to find the best motion vector among all motion modes, 16 SAD values (one per 4×4 block of the MB) are computed at each pixel or sub-pixel position in the search range within the reference frame; this amount of computation is impractical in wireless or mobile applications.
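The following Python sketch illustrates an exhaustive integer-pel search that computes the 16 SADs of the 4×4 blocks at each candidate position and sums them into the SAD of every partition of the seven motion modes, keeping the best displacement for each partition. It is a simplified sketch under stated assumptions, not the Reference CODEC: frames are plain lists of luma rows, candidates that fall outside the reference frame are skipped, the cost is SAD only (no motion-vector rate term), there is no sub-pixel refinement, and all function names, including the demo helper `make_test_frames`, are hypothetical.

```python
import random

# Hypothetical sketch (not the Reference CODEC): integer-pel full search motion
# estimation in which SADs are computed on 4x4 blocks and summed to obtain
# the SAD of every partition of all seven motion modes.
MB = 16
MODES = [(16, 16), (16, 8), (8, 16), (8, 8), (8, 4), (4, 8), (4, 4)]


def sad(cur, ref, cx, cy, rx, ry, w, h):
    """Sum of absolute differences between the w x h block of `cur` at (cx, cy)
    and the w x h block of `ref` at (rx, ry). Frames are lists of pixel rows."""
    total = 0
    for j in range(h):
        crow, rrow = cur[cy + j], ref[ry + j]
        for i in range(w):
            total += abs(crow[cx + i] - rrow[rx + i])
    return total


def sad4x4_grid(cur, ref, mbx, mby, rx, ry):
    """The 16 SADs of the 4x4 blocks of one MB at one candidate position,
    as a 4x4 grid indexed [row][column]."""
    return [[sad(cur, ref, mbx + 4 * i, mby + 4 * j, rx + 4 * i, ry + 4 * j, 4, 4)
             for i in range(4)]
            for j in range(4)]


def mode_sads(grid, w, h):
    """SAD of every w x h partition (raster order), obtained by summing 4x4 SADs."""
    bw, bh = w // 4, h // 4
    return [sum(grid[by + j][bx + i] for j in range(bh) for i in range(bw))
            for by in range(0, 4, bh)
            for bx in range(0, 4, bw)]


def full_search(cur, ref, mbx, mby, search_range, modes=MODES):
    """Exhaustive search over displacements in [-search_range, +search_range].
    Returns {(w, h, partition_index): (best_sad, (dx, dy))}."""
    height, width = len(ref), len(ref[0])
    best = {}
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            rx, ry = mbx + dx, mby + dy
            if rx < 0 or ry < 0 or rx + MB > width or ry + MB > height:
                continue  # candidate block would fall outside the reference frame
            grid = sad4x4_grid(cur, ref, mbx, mby, rx, ry)  # 16 SADs per position
            for w, h in modes:
                for k, cost in enumerate(mode_sads(grid, w, h)):
                    key = (w, h, k)
                    if key not in best or cost < best[key][0]:
                        best[key] = (cost, (dx, dy))
    return best


def make_test_frames(size, dx, dy, seed=0):
    """Hypothetical demo helper: a random reference frame and a current frame
    whose pixel (x, y) is taken from reference pixel (x + dx, y + dy), clamped."""
    rng = random.Random(seed)
    ref = [[rng.randrange(256) for _ in range(size)] for _ in range(size)]

    def clamp(v):
        return min(max(v, 0), size - 1)

    cur = [[ref[clamp(y + dy)][clamp(x + dx)] for x in range(size)]
           for y in range(size)]
    return cur, ref


if __name__ == "__main__":
    cur, ref = make_test_frames(48, 3, -2)
    best = full_search(cur, ref, 16, 16, 4)
    print(len(best))          # 41 partitions across the seven modes
    print(best[(16, 16, 0)])  # best SAD and displacement for the 16x16 mode
```

With a ±4 window the demo visits only 81 positions. For the 48×48 search range mentioned above, the same loop would visit 2,304 positions, and because each 16×16 SAD covers 256 absolute differences, the 16×16 mode alone costs 2,304 × 256 = 589,824 absolute differences per MB. Reusing the 4×4 SADs means the smaller modes add only summations in this sketch, while searching sub-pixel positions would multiply the count further.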
Moreover, over low-bitrate communication channels the smaller sub-block modes, such as 8×8, 8×4, 4×8 and 4×4, are seldom selected, so the computation spent evaluating them is largely wasted.
The complexity of the encoder is a bottleneck for multimedia applications using video, especially in wireless and mobile devices in which the computing and power resources and the transmission rate are limited. It is desired to streamline the interframe encoding process by increasing speed and reducing computations to conserve resources while providing acceptable encoding efficiency and visual quality.