1. Field of the Invention
The present invention relates to video encoders and, more particularly, to a method and apparatus for selecting a coding mode.
2. Description of the Background Art
The International Telecommunication Union (ITU) H.264 video coding standard compresses video much more efficiently than earlier video coding standards such as ITU H.263, MPEG-2, and MPEG-4 (MPEG: Moving Picture Experts Group). H.264 is also known as MPEG-4 Part 10 and as Advanced Video Coding (AVC). H.264 combines new techniques with increased degrees of freedom in the use of existing techniques. Among the new techniques defined in H.264 are a 4×4 integer approximation of the discrete cosine transform (DCT), multi-frame prediction, context-adaptive variable length coding (CAVLC), SI/SP frames, and context-adaptive binary arithmetic coding (CABAC). The increased degrees of freedom come from allowing multiple reference frames for prediction and many more tessellations of a 16×16 pixel macroblock (MB). These new tools and methods improve coding efficiency at the cost of increased encoding and decoding complexity in terms of logic, memory, and number of computational cycles.
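To illustrate the 4×4 transform mentioned above, the following minimal sketch applies the H.264 forward core transform, Y = Cf·X·Cf^T, to a 4×4 residual block using integer arithmetic only. The scaling that H.264 folds into quantization is omitted. The helper names (`matmul`, `transpose`, `forward_core_transform`) are hypothetical and do not come from the standard or any reference encoder.

```python
# Forward core transform matrix Cf of the H.264 4x4 integer transform
# (an integer approximation of the 4x4 DCT).
CF = [
    [1, 1, 1, 1],
    [2, 1, -1, -2],
    [1, -1, -1, 1],
    [1, -2, 2, -1],
]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(m):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(row) for row in zip(*m)]


def forward_core_transform(block):
    """Compute Cf * X * Cf^T for a 4x4 integer residual block X.

    Only the integer core transform is computed; the post-scaling that the
    standard merges into quantization is deliberately left out.
    """
    return matmul(matmul(CF, block), transpose(CF))
```

For a flat residual block of all ones, the energy collects in the DC coefficient: `forward_core_transform([[1] * 4] * 4)` has 16 in the top-left position and zeros elsewhere. Practical encoders obtain the same coefficients with additions and shifts rather than general matrix multiplication.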
The H.264 standard belongs to the hybrid motion-compensated DCT (MC-DCT) family of codecs. H.264 generates an efficient representation of the source video by reducing temporal and spatial redundancies. Temporal redundancies are removed by a combination of motion estimation (ME) and motion compensation (MC). ME is the process of estimating the motion of a current frame from previously coded frame(s). This motion information is used to motion compensate the previously coded frame(s) to form a prediction. The prediction is then subtracted from the original frame to form a displaced frame difference (DFD) or, more broadly, an error signal. The motion information can be determined for each block of pixel data. In H.264, there are seven possible block sizes within a macroblock: 16×16, 16×8, 8×16, 8×8, 8×4, 4×8, and 4×4 (also referred to as tessellations, partitions, or blocks). Thus, a macroblock can be tessellated into the following macroblock partitions: (A) one 16×16 block; (B) two 16×8 blocks; (C) two 8×16 blocks; or (D) four 8×8 blocks. Furthermore, each of the 8×8 blocks can be decomposed into the following sub-macroblock partitions: (a) one 8×8 block; (b) two 8×4 blocks; (c) two 4×8 blocks; or (d) four 4×4 blocks. Consequently, there are many possible tessellations for a single macroblock.
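The ME and DFD steps described above can be sketched as exhaustive block matching. This is a minimal sketch under stated assumptions: frames are lists of rows of luma samples, only integer-pixel motion vectors are searched, and the sum of absolute differences (SAD) is used as the matching criterion. H.264 does not mandate any particular ME algorithm, and the function names below are hypothetical.

```python
def sad(cur, ref, bx, by, dx, dy, size):
    """Sum of absolute differences between the size x size block of `cur`
    at (bx, by) and the block of `ref` displaced by the vector (dx, dy)."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def full_search(cur, ref, bx, by, size, search_range):
    """Exhaustive (full-search) integer-pel motion estimation.

    Tests every displacement within +/- search_range that keeps the
    reference block inside the frame; returns (best_mv, best_sad).
    """
    h, w = len(ref), len(ref[0])
    best_mv, best_cost = (0, 0), None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            inside = 0 <= bx + dx <= w - size and 0 <= by + dy <= h - size
            if not inside:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, size)
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost


def displaced_frame_difference(cur, ref, bx, by, mv, size):
    """Residual (DFD) for one block: current pixels minus the
    motion-compensated prediction taken from `ref` at offset mv."""
    dx, dy = mv
    return [
        [cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x] for x in range(size)]
        for y in range(size)
    ]
```

If the current frame is the reference shifted one pixel to the left, `full_search` for an interior 4×4 block returns the vector (1, 0) with a SAD of 0, and the resulting DFD is all zeros; with the zero vector the DFD would instead carry the full frame-to-frame difference.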
Furthermore, for each partition block type, there are many possible prediction directions, yielding up to hundreds of possible partition patterns for each macroblock. One function of the encoder is to determine an optimal way of encoding a macroblock, which requires selecting one of these numerous partition patterns. This selection is commonly referred to as “mode selection” or “mode decision.”
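The size of the search space can be made concrete by enumerating the block-size tessellations alone. The sketch below counts only the macroblock and sub-macroblock partitionings listed earlier and deliberately ignores prediction directions, reference frames, and motion vectors, each of which multiplies the count further. The labels and the generator name are hypothetical.

```python
import itertools

# Macroblock partitions that need no further split, and the four ways
# each 8x8 quadrant can be split into sub-macroblock partitions.
WHOLE_MB_PARTITIONS = ["16x16", "16x8", "8x16"]
SUB_MB_PARTITIONS = ["8x8", "8x4", "4x8", "4x4"]


def enumerate_tessellations():
    """Yield every block-size tessellation of a 16x16 macroblock.

    Unsplit cases are one-element tuples; the 8x8 case is ("8x8",)
    followed by one sub-partition choice for each of the four quadrants.
    """
    for part in WHOLE_MB_PARTITIONS:
        yield (part,)
    for quadrants in itertools.product(SUB_MB_PARTITIONS, repeat=4):
        yield ("8x8",) + quadrants


count = sum(1 for _ in enumerate_tessellations())
print(count)  # 3 + 4**4 = 259
```

Even before prediction directions are considered, a single macroblock thus admits 3 + 4^4 = 259 block-size tessellations.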
A mode selection method may simply attempt to find the coding mode with the best performance by trying each and every possible partition pattern for a macroblock. However, this exhaustive approach is computationally very expensive and time consuming, and it may therefore be impractical for real-time applications.
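The exhaustive approach can be sketched as a loop that fully codes the macroblock once per candidate mode and keeps the cheapest result. As an assumption not stated in the text above, "best performance" is measured here with a Lagrangian rate-distortion cost J = D + λ·R, a widely used criterion; `encode` is a hypothetical callback returning (distortion, rate) for one mode, and the toy numbers in the usage note are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModeResult:
    mode: str
    distortion: float  # e.g. squared error between original and reconstruction
    rate: float        # bits spent coding the macroblock in this mode


def exhaustive_mode_decision(modes, encode, lam):
    """Try every candidate mode and return (best ModeResult, best cost).

    The cost of each mode is J = D + lam * R, where (D, R) = encode(mode).
    One full encode is performed per candidate, which is what makes this
    approach expensive when there are hundreds of candidates.
    """
    best, best_cost = None, float("inf")
    for mode in modes:
        d, r = encode(mode)
        cost = d + lam * r
        if cost < best_cost:
            best, best_cost = ModeResult(mode, d, r), cost
    return best, best_cost
```

With a toy table `{"16x16": (400, 20), "16x8": (300, 35), "8x8": (250, 60)}` and λ = 5, the costs are 500, 475, and 550, so "16x8" is selected. The work grows linearly with the number of candidate patterns, since every candidate requires a complete encode.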