1. Field of the Invention
The present invention relates to video encoders and, more particularly, to a method and apparatus for selecting a coding mode (e.g., a frame coding mode or a field coding mode).
2. Description of the Background Art
The International Telecommunication Union (ITU) H.264 video coding standard is able to compress video much more efficiently than earlier video coding standards, such as ITU H.263, MPEG-2 (Moving Picture Experts Group), and MPEG-4. H.264 is also known as MPEG-4 Part 10 and Advanced Video Coding (AVC). H.264 combines new techniques with increased degrees of freedom in using existing techniques. Among the new techniques defined in H.264 are 4×4 and 8×8 integer transforms (e.g., a DCT-like integer transform), multi-frame prediction, context-adaptive variable length coding (CAVLC), SI/SP frames, context-adaptive binary arithmetic coding (CABAC), and adaptive frame/field coding. The increased degrees of freedom come about by allowing multiple reference frames for prediction and many more tessellations of a 16×16 pixel macroblock (MB). These new tools and methods add to the coding efficiency at the cost of increased encoding and decoding complexity in terms of logic, memory, and number of operations. This complexity far surpasses that of H.263 and MPEG-4 and creates a need for efficient implementations.
The H.264 standard belongs to the hybrid motion-compensated DCT (MC-DCT) family of codecs. H.264 is able to generate an efficient representation of the source video by reducing temporal and spatial redundancies. Temporal redundancies are removed by a combination of motion estimation (ME) and motion compensation (MC). ME is the process of estimating the motion of a current frame in the source video from previously coded frame(s). This motion information is used to motion compensate the previously coded frame(s) to form a prediction for the current frame. The prediction is then subtracted from the original current frame to form a displaced frame difference (DFD). Motion information is provided for each block of pixel data. In H.264, there are seven possible block sizes within a macroblock, namely 16×16, 16×8, 8×16, 8×8, 8×4, 4×8, and 4×4 (also referred to as tessellations or partitions). Thus, a 16×16 pixel macroblock (MB) can be tessellated in one of the following ways: (A) one 16×16 macroblock region; (B) two 16×8 tessellations; (C) two 8×16 tessellations; and (D) four 8×8 tessellations. Furthermore, each of the 8×8 tessellations can be decomposed into: (a) one 8×8 region; (b) two 8×4 regions; (c) two 4×8 regions; and (d) four 4×4 regions.
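The tessellation options listed above can be enumerated with a short sketch. The names `MB_MODES`, `SUB_MODES`, and `enumerate_tessellations` are hypothetical and chosen only for illustration, and the sketch assumes that each of the four 8×8 tessellations selects its decomposition independently of the others.

```python
from itertools import product

# Top-level options (A)-(C): block sizes as (width, height) in pixels.
MB_MODES = {
    "16x16": [(16, 16)],
    "16x8": [(16, 8)] * 2,
    "8x16": [(8, 16)] * 2,
}

# Options (a)-(d) for decomposing a single 8x8 tessellation.
SUB_MODES = {
    "8x8": [(8, 8)],
    "8x4": [(8, 4)] * 2,
    "4x8": [(4, 8)] * 2,
    "4x4": [(4, 4)] * 4,
}


def enumerate_tessellations():
    """Return (label, blocks) for every way to partition one macroblock."""
    result = [(name, list(blocks)) for name, blocks in MB_MODES.items()]
    # Option (D): four 8x8 tessellations, each decomposed independently
    # (an assumption of this sketch).
    for combo in product(SUB_MODES, repeat=4):
        blocks = [block for sub in combo for block in SUB_MODES[sub]]
        result.append(("8x8:" + "/".join(combo), blocks))
    return result


if __name__ == "__main__":
    tessellations = enumerate_tessellations()
    print(len(tessellations), "candidate tessellations")
```

Under these assumptions the enumeration yields 3 + 4^4 = 259 candidate tessellations, each covering all 256 pixels of the macroblock, which gives a sense of the search space an encoder faces for every macroblock.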
Furthermore, each block has its own motion vector, and the motion vectors of different blocks can point to different reference frames. The job of the encoder is to find the optimal way of breaking down a 16×16 macroblock into smaller blocks (along with the corresponding motion vectors) in order to maximize compression efficiency. This breaking down of the macroblock into a specific pattern is commonly referred to as “mode selection” or “mode decision.”
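As a rough illustration of mode selection, the following sketch evaluates the four top-level tessellations of one macroblock against a single reference frame and keeps the cheapest. It is a minimal sketch under stated assumptions, not an encoder implementation: the cost is a sum of absolute differences (SAD) plus a hypothetical fixed penalty `mv_penalty` per motion vector standing in for the bits needed to code it, the search is a small full-pel full search, and sub-8×8 partitions and multiple reference frames are omitted. All function and parameter names are illustrative.

```python
def sad(cur, ref, x, y, w, h, dx, dy):
    """Sum of absolute differences between a block of cur and the block of
    ref displaced by (dx, dy). Frames are lists of rows: frame[row][col]."""
    return sum(
        abs(cur[y + j][x + i] - ref[y + j + dy][x + i + dx])
        for j in range(h)
        for i in range(w)
    )


def best_vector(cur, ref, x, y, w, h, search=2):
    """Full search over a +/-search window; returns (sad, (dx, dy))."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip displacements that would read outside the reference frame.
            if x + dx < 0 or x + dx + w > width:
                continue
            if y + dy < 0 or y + dy + h > height:
                continue
            cost = sad(cur, ref, x, y, w, h, dx, dy)
            if best is None or cost < best[0]:
                best = (cost, (dx, dy))
    return best


# Block layouts as (x offset, y offset, width, height) inside the macroblock.
MODES = {
    "16x16": [(0, 0, 16, 16)],
    "16x8": [(0, 0, 16, 8), (0, 8, 16, 8)],
    "8x16": [(0, 0, 8, 16), (8, 0, 8, 16)],
    "8x8": [(0, 0, 8, 8), (8, 0, 8, 8), (0, 8, 8, 8), (8, 8, 8, 8)],
}


def select_mode(cur, ref, mb_x, mb_y, mv_penalty=16):
    """Return (cost, mode name, motion vectors) of the cheapest tessellation.

    mv_penalty is a hypothetical per-vector cost, not a value from any standard.
    """
    best = None
    for name, blocks in MODES.items():
        total, vectors = 0, []
        for bx, by, w, h in blocks:
            cost, mv = best_vector(cur, ref, mb_x + bx, mb_y + by, w, h)
            total += cost + mv_penalty
            vectors.append(mv)
        if best is None or total < best[0]:
            best = (total, name, vectors)
    return best
```

When the whole macroblock moves uniformly, every tessellation predicts it equally well and the per-vector penalty favors the single 16×16 block; when the two halves move differently, a split tessellation wins despite carrying more motion vectors.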
In addition, the H.264 standard allows for adaptive switching between frame coding and field coding modes. Notably, this type of switching can occur at both the picture level and the macroblock (MB) pair level. However, present-day processes are typically exhaustive in the sense that H.264 encoders encode a picture by completely executing both frame coding and field coding techniques and subsequently comparing the two end products to see which one performed better. Namely, each picture is encoded in its entirety twice. This approach is computationally expensive.
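The exhaustive picture-level decision described above can be sketched as follows. This is only an illustration under stated assumptions: `toy_cost` is a hypothetical stand-in for a complete encode pass (it merely sums absolute differences between vertically adjacent samples), and the names `split_fields` and `choose_picture_mode` are not taken from any encoder. The point of the sketch is that the cost function runs once over the whole frame and once over both fields before a mode is chosen.

```python
def split_fields(picture):
    """Split a picture (list of rows) into top (even rows) and bottom (odd rows) fields."""
    return picture[0::2], picture[1::2]


def toy_cost(rows):
    """Hypothetical stand-in for a full encode pass: the sum of absolute
    differences between vertically adjacent samples."""
    return sum(
        abs(a - b)
        for upper, lower in zip(rows, rows[1:])
        for a, b in zip(upper, lower)
    )


def choose_picture_mode(picture, encode_cost=toy_cost):
    """Exhaustive selection: evaluate the picture both ways, then compare."""
    frame_cost = encode_cost(picture)                   # first full pass
    top, bottom = split_fields(picture)
    field_cost = encode_cost(top) + encode_cost(bottom)  # second full pass
    mode = "frame" if frame_cost <= field_cost else "field"
    return mode, frame_cost, field_cost


if __name__ == "__main__":
    # Static content: a smooth vertical gradient, 8 rows by 4 samples.
    static = [[10 * row] * 4 for row in range(8)]
    # Interlaced content with motion: the two fields differ strongly.
    combed = [[0] * 4 if row % 2 == 0 else [100] * 4 for row in range(8)]
    print(choose_picture_mode(static))
    print(choose_picture_mode(combed))
```

With this toy cost, the smooth gradient is cheaper as a frame and the combed picture is cheaper as two fields. In either case both evaluations are carried out in full, which is the doubled workload noted above.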
Accordingly, there exists a need in the art for an improved method and apparatus for adaptive frame/field mode selection.