To accommodate the diverse development of digital video technology, various standards have been established to provide standard coding/decoding strategies with sufficient flexibility to accommodate a plurality of different applications and services, such as desktop video publishing, video conferencing, digital storage media and television broadcast. These standards include, but are not limited to, the Moving Picture Experts Group standards (e.g., MPEG-1 (11172-*) and MPEG-2 (13818-*)), H.261 and H.263. As such, the present invention is described below using an MPEG encoder, but it should be understood that the present invention can be employed in encoders that are compliant with other coding standards.
Although the MPEG standards specify a general coding methodology and syntax for generating an MPEG-compliant bitstream, they permit many variations in the values assigned to the parameters, thereby supporting a broad range of applications and interoperability. In effect, MPEG does not prescribe a specific method for producing a valid bitstream. Furthermore, MPEG encoder designers are accorded great flexibility in developing and implementing their own MPEG-specific methods in areas such as image pre-processing, motion estimation, coding mode decisions, scalability, and rate control. This flexibility fosters the development of different MPEG-specific methods and, in turn, product differentiation in the marketplace. Nevertheless, a common goal of MPEG encoder designers is to minimize subjective distortion for a prescribed bit rate and operating delay constraint.
More specifically, in the area of coding mode decision, MPEG provides a plurality of different macroblock coding modes. Mode decision is the process of choosing among the coding modes made available within the confines of the syntax of the respective video encoders. Generally, these coding modes fall into two broad classes: inter mode coding and intra mode coding. Intra mode coding codes a macroblock or picture using information only from that macroblock or picture. Conversely, inter mode coding codes a macroblock or picture using information both from itself and from macroblocks and pictures occurring at different times. For example, MPEG-2 provides macroblock coding modes that include intra mode, no motion compensation mode (No MC), skipping, frame/field/dual-prime motion compensation inter modes, forward/backward/average inter modes and field/frame DCT modes. For a detailed description of each coding mode, see the ISO/IEC International Standards for MPEG-1 and MPEG-2.
These coding modes provide different coding strategies (predictions), which differ in the number of bits needed to code a macroblock. Each mode can be more efficient than another depending upon factors such as the coarseness of the quantization scale, the picture type, and the nature of the signal within the macroblock. To achieve optimal coding performance, it is necessary to select the most efficient coding mode by calculating and comparing, for each mode, the number of bits needed to code a particular macroblock at a given distortion. The most efficient coding mode codes the macroblock with the fewest bits at that distortion.
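The selection criterion described above (fewest bits at a fixed distortion) can be sketched as follows. This is an illustrative sketch only: it assumes the per-mode bit counts have already been measured at the same target distortion, and the function name `select_min_bits_mode` and the bit counts are hypothetical.

```python
from typing import Dict


def select_min_bits_mode(bits_per_mode: Dict[str, int]) -> str:
    """Return the name of the mode with the smallest bit count.

    Assumes every entry was measured at the same target distortion,
    so the bit counts are directly comparable. Ties go to the first
    mode in insertion order.
    """
    if not bits_per_mode:
        raise ValueError("no candidate modes supplied")
    return min(bits_per_mode, key=lambda mode: bits_per_mode[mode])


# Hypothetical bit counts for one macroblock at a fixed distortion.
example = {"intra": 412, "forward": 188, "backward": 203, "average": 176}
print(select_min_bits_mode(example))  # average
```

The hard part in practice is producing the bit counts, since each mode would have to be trial-coded to reach the same distortion; the comparison itself is trivial.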
Various test models provide baseline implementations of the standards and usually make the mode decision based only on the distortion after motion compensation. For example, in current MPEG coding strategies (e.g., Test Models 4 and 5 (TM4 and TM5)), the coding mode for each macroblock is selected by comparing the energy of the predictive residuals (error signal). Namely, the intra mode/inter mode decision is made by comparing the variance (σ²) of the macroblock pixels against the variance of the predictive residuals for each coding mode. However, the coding mode selected by this criterion may not achieve optimal coding performance, since a higher variance does not necessarily translate into more bits to code a macroblock.
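A simplified sketch of a variance-based intra/inter decision of the kind described above follows. It is an assumption-laden illustration, not the exact TM4/TM5 rule: it simply picks the inter mode with the lowest residual variance and chooses intra only when the macroblock's own pixel variance is lower still. The function name, mode labels, and the four-sample "macroblock" are hypothetical (a real luminance macroblock has 256 samples).

```python
from statistics import pvariance
from typing import Dict, Sequence


def variance_mode_decision(mb_pixels: Sequence[int],
                           residuals: Dict[str, Sequence[int]]) -> str:
    """Choose a mode by comparing variances only (bits are ignored)."""
    # Variance (sigma^2) of the original macroblock pixels: the intra "energy".
    intra_var = pvariance(mb_pixels)
    # Inter mode whose prediction residual has the lowest variance.
    best_inter = min(residuals, key=lambda mode: pvariance(residuals[mode]))
    best_inter_var = pvariance(residuals[best_inter])
    # Intra wins only if its energy is below the best inter residual energy.
    return "intra" if intra_var < best_inter_var else best_inter


# Hypothetical 4-sample example: pixel variance is 250,
# forward residual variance is 3.5, no-MC residual variance is 86.
pixels = [100, 120, 90, 130]
res = {"forward": [2, -3, 1, 0], "no_mc": [10, -12, 8, -6]}
print(variance_mode_decision(pixels, res))  # forward
```

Note that nothing in this decision accounts for motion vector or overhead bits, which is exactly the weakness the text identifies.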
Furthermore, such mode decision methods ignore the number of bits needed to code each mode. For example, bidirectional interpolation requires two (2) motion vectors, so any reduction in distortion should be weighed against the additional bits needed to code the second motion vector. The bits spent on a macroblock can be split into motion vector coding bits, DCT coding bits, and overhead bits. At very low bit rates (e.g., real-time applications), the motion vector coding bits become quite significant.
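The bit accounting described above can be sketched by splitting each mode's cost into its three components. The class and function names and all bit counts below are hypothetical; they are chosen only to show how a second motion vector can outweigh the DCT bits that a better bidirectional prediction saves.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class ModeBits:
    mv_bits: int        # bits spent coding motion vectors
    dct_bits: int       # bits spent coding quantized DCT coefficients
    overhead_bits: int  # macroblock header, mode flags, etc.

    @property
    def total(self) -> int:
        return self.mv_bits + self.dct_bits + self.overhead_bits


def cheapest_mode(candidates: Dict[str, ModeBits]) -> str:
    """Return the mode with the lowest total bit count."""
    return min(candidates, key=lambda mode: candidates[mode].total)


# Hypothetical numbers: bidirectional saves 10 DCT bits but
# spends 12 more motion vector bits on its second vector.
candidates = {
    "forward": ModeBits(mv_bits=12, dct_bits=150, overhead_bits=6),
    "bidirectional": ModeBits(mv_bits=24, dct_bits=140, overhead_bits=6),
}
for name, bits in candidates.items():
    print(name, bits.total)
print(cheapest_mode(candidates))
```

With these numbers the totals are 168 bits for forward and 170 bits for bidirectional, so forward prediction is cheaper even though its residual costs more to code.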
Other methods address mode decision with a computationally expensive joint rate-distortion optimal or near-optimal approach that computes a joint cost function of distortion and rate (combined through a Lagrangian multiplier). In short, some current methods are computationally very expensive because they involve iterative estimation of the Lagrangian multiplier, while others oversimplify the mode decision process and thereby achieve little or no improvement in performance.
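The joint cost function mentioned above is commonly written as J = D + λR, and the mode with the lowest J is selected. The sketch below assumes a fixed, caller-supplied λ. It deliberately omits the iterative λ estimation that makes the methods described above expensive. The function name and the (distortion, rate) pairs are hypothetical.

```python
from typing import Dict, Tuple


def lagrangian_mode_decision(candidates: Dict[str, Tuple[float, float]],
                             lam: float) -> str:
    """Pick the mode minimizing J = D + lam * R.

    candidates maps mode name -> (distortion D, rate R in bits).
    lam is assumed fixed here rather than estimated iteratively.
    """
    if lam < 0:
        raise ValueError("lambda must be non-negative")
    return min(candidates,
               key=lambda mode: candidates[mode][0] + lam * candidates[mode][1])


# Hypothetical (distortion, bits) pairs for one macroblock.
modes = {"intra": (50.0, 400), "forward": (120.0, 180), "average": (90.0, 230)}
for lam in (0.1, 0.5, 2.0):
    print(lam, lagrangian_mode_decision(modes, lam))
```

Here a small λ favors the low-distortion intra mode, a large λ favors the low-rate forward mode, and λ = 0.5 selects average. The choice therefore depends heavily on λ, which is why its estimation dominates the cost of these methods.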
Therefore, a need exists in the art for an apparatus and method for selecting a coding mode that approaches the optimal solution yet is simple enough for practical implementation, e.g., in real-time applications.