1. Field of the Invention
The present invention relates to the field of image processing and, in particular, to a method and apparatus for simplifying frame-based motion estimation.
2. Background Information
Over the years, the Moving Picture Experts Group (MPEG) has developed a number of standards for digitally encoding (also commonly referred to as compressing) audio and video data (e.g., the well-known MPEG-1, MPEG-2 and MPEG-4 standards). Recently, particular attention has been drawn to the MPEG-2 standard [ISO/IEC 13818-2:1996(E), “Information technology—Generic coding of moving pictures and associated audio information: Video”, 1996], which generally describes a bit-stream syntax and decoding process for broadcast quality digitized video. The MPEG-2 standard is widely used in emerging state-of-the-art video delivery systems including digital versatile disk (DVD, sometimes referred to as digital video disk), direct broadcast satellite (DBS) (e.g., digital satellite television broadcasts) and high-definition television (HDTV).
The rising popularity of the MPEG-2 standard may well be attributed to its complex video compression technology, which facilitates broadcast quality video. Compression is basically a process by which the information content of an image or group of images (also referred to as a Group of Pictures, or GOP) is reduced by exploiting the spatial and temporal redundancy present in and among the image frames comprising the video signal. This exploitation is accomplished by analyzing the statistical predictability of the signal to identify and reduce the spatial and temporal redundancies, thereby reducing the amount of storage and bandwidth required for the compressed data. The MPEG-2 standard provides for efficient compression of both interlaced and progressive video content at bit rates ranging from 4 Mbps (for DVD applications) to 19 Mbps (for HDTV applications). FIG. 1 illustrates a block diagram of the complex elements of an example prior art MPEG-2 encoder for compressing video data.
As shown in the block diagram of FIG. 1, encoder 100 is generally comprised of an intra-frame encoder 102, an inter-frame encoder 104, a multiplexer 106 and a buffer 108, which controls the rate of broadcast of the compressed video data. Each of the intra-frame encoder 102 and inter-frame encoder 104 will be described in turn, below.
Simplistically speaking, compression by intra-frame encoder 102 may be thought of as a three-step process wherein spatial redundancy within a received video frame is identified, the frame is quantized and subsequently entropy encoded to reduce or eliminate the spatial redundancy in the encoded representation of the received frame. The identification of spatial redundancy within a frame is performed by transforming spatial amplitude data of the frame into a spatial frequency representation of the frame using the discrete cosine transform (DCT) function 110. The DCT function is performed on 8×8 pixel “blocks” of luminance (brightness) samples and the corresponding blocks of chrominance (color differential) samples of the two-dimensional image, generating for each block a table of 64 DCT coefficients. The block of DCT coefficients is then compressed through Quantizer (Q) 112. Quantization is merely the process of reducing the number of bits required to represent each of the DCT coefficients. The quantizing “scale” used can be varied on a macroblock (16×16 pixel) basis. The quantized DCT coefficients are then translated into a one-dimensional array for encoding 114 via variable length encoding and run length encoding. The order in which the quantized DCT coefficients are scanned into encoder 114 affects the efficiency of the encoding process. In general, two patterns for scanning the block of quantized DCT coefficients are recognized, the zigzag pattern and the alternate scan pattern, which are depicted in FIG. 2 as patterns 200 and 250, respectively. Those skilled in the art will appreciate that with prior art intra-frame compression such as that employed by intra-frame encoder 102, the zigzag scan pattern 200 is typically used, as it produces long runs of zeroes when the block of DCT coefficients is transformed into run-length/value pairs for the variable length encoding process.
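By way of illustration only, the three intra-frame steps described above (DCT, quantization, and scanning into run-length/value pairs) may be sketched as follows. This is a minimal sketch under stated assumptions and not the encoder of FIG. 1: the function names are hypothetical, the quantizer uses a single uniform step size rather than the weighting matrix and scale of the standard, all 64 coefficients are scanned alike, and only the zigzag pattern is shown.

```python
import math

N = 8  # block size used by the DCT stage


def dct_2d(block):
    """Orthonormal 8x8 DCT-II computed directly from its definition."""
    def c(k):
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = c(u) * c(v) * s
    return out


def quantize(coeffs, step):
    """Assumption: one uniform step size for every coefficient."""
    return [[int(round(v / step)) for v in row] for row in coeffs]


def zigzag_order(n=N):
    """(row, col) positions visited along alternating anti-diagonals."""
    def key(p):
        d = p[0] + p[1]
        # Odd diagonals run top-to-bottom, even diagonals bottom-to-top.
        return (d, p[0] if d % 2 else -p[0])
    return sorted(((r, c) for r in range(n) for c in range(n)), key=key)


def run_length_pairs(scanned):
    """(zero run, value) pairs; trailing zeros are left to an end-of-block code."""
    pairs, run = [], 0
    for v in scanned:
        if v == 0:
            run += 1
        else:
            pairs.append((run, v))
            run = 0
    return pairs


def encode_block(block, step=16):
    q = quantize(dct_2d(block), step)
    return run_length_pairs([q[r][c] for r, c in zigzag_order()])
```

For a perfectly flat block, only the first (DC) coefficient survives quantization, so the entire block collapses to a single run-length/value pair, which illustrates why scanning low frequencies first tends to leave long runs of zeroes at the end of the array.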
The quantized, entropy encoded DCT coefficients along with the quantization tables are then sent to MUX 106 for broadcast and/or storage through rate control buffer 108.
Inter-frame encoder 104 reduces the temporal redundancies existing between frames in a group of pictures and is typically a complex process of motion estimation between frames and fields of the frames using reconstructed past and predicted future frames as a reference. Accordingly, inter-frame encoder 104 is depicted comprising motion estimator 116, which statistically computes motion vectors to anticipate scene changes between frames, anchor frame storage 118 to store reconstructed prior frame data (from the quantized DCT coefficients) and predicted frame storage 120 to store a predicted future frame based on motion vectors predicted by motion estimator 116 and current frame information. In addition, inter-frame encoder 104 is depicted comprising inverse quantizer 122, inverse DCT 124 and a summing node 126 to reconstruct the present or past frames for storage in anchor frame storage 118.
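By way of illustration only, one common way to compute a motion vector is exhaustive block matching, in which a block of the current frame is compared against displaced blocks of a reference frame and the displacement with the lowest sum of absolute differences (SAD) is retained. The sketch below is an assumption made for illustration; the text above does not state which search method motion estimator 116 uses, and the function names, block size and search radius are hypothetical.

```python
def sad(cur, ref, cx, cy, rx, ry, size):
    """Sum of absolute differences between two size x size blocks."""
    return sum(abs(cur[cy + j][cx + i] - ref[ry + j][rx + i])
               for j in range(size) for i in range(size))


def full_search(cur, ref, cx, cy, size=16, radius=4):
    """Return ((dx, dy), cost) of the best match within +/- radius pixels."""
    height, width = len(ref), len(ref[0])
    best_mv, best_cost = (0, 0), None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            rx, ry = cx + dx, cy + dy
            # Skip candidate blocks that fall outside the reference frame.
            if rx < 0 or ry < 0 or rx + size > width or ry + size > height:
                continue
            cost = sad(cur, ref, cx, cy, rx, ry, size)
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost
```

The nested loops evaluate (2 × radius + 1)² candidate positions per block, each costing size² pixel comparisons, which gives a sense of why motion estimation dominates the computational load of the encoder.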
Those skilled in the art will appreciate that the MPEG-2 standard provides for three types of video frames and that the type of frame determines how the motion estimation for that frame is to be accomplished. The three frame types are intra-coded frames (I-frames), predictively coded frames (P-frames) and bidirectionally interpolated frames (B-frames). I-frames are encoded based only on the content within the frame itself and are typically used as reference and synchronization frames. That is, the separation between I-frames is used to denote Groups of Pictures (GOPs). P-frames are encoded based on the immediate past I- or P-frames (also referred to as anchors), and B-frames are encoded based on past or future I- and P-frames (thus the need for anchor and predicted frame storage 118 and 120, respectively). Predicting content based on frame data is graphically illustrated with reference to FIG. 3.
Turning to FIG. 3, a graphical representation 300 of a typical GOP sequence of frames is presented, denoting an IBBPBBI sequence (commonly referred to as a GOP (6, 3) sequence by those skilled in the art). As shown in FIG. 3, encoding of I-frame 302 does not rely on any prior or future frame. Encoding of B-frame 304 utilizes information from past frames (e.g., I-frame 302) as well as future I and/or P-frames (e.g., P-frame 306).
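By way of illustration only, the dependencies among the frames of such a sequence may be sketched as follows. The function name and the string representation of the sequence are hypothetical; the sketch simply applies the rules stated above, namely that an I-frame uses no anchor, a P-frame uses the nearest past anchor, and a B-frame uses the nearest past and future anchors.

```python
def anchors_for(gop):
    """For each frame in display order, return the indices of its anchor frames."""
    refs = []
    for i, kind in enumerate(gop):
        # Nearest I- or P-frame before and after position i, if any.
        past = next((j for j in range(i - 1, -1, -1) if gop[j] in "IP"), None)
        future = next((j for j in range(i + 1, len(gop)) if gop[j] in "IP"), None)
        if kind == "I":
            refs.append(())
        elif kind == "P":
            refs.append((past,))
        else:  # B-frame
            refs.append((past, future))
    return refs


for index, (kind, refs) in enumerate(zip("IBBPBBI", anchors_for("IBBPBBI"))):
    print(index, kind, refs)
```

For the IBBPBBI sequence, the first two B-frames depend on frames 0 and 3, and the last two on frames 3 and 6, so each B-frame requires both a past and a future anchor to be available to the encoder.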
If the frame sequence contains interlaced content, field prediction is also performed in calculating the motion vector. Simplistically speaking, frames are broken into even and odd fields, and the content of each field is predicted based on the information contained in both the odd and the even fields of the past and/or future frames (depending on the frame type, P or B-frames, respectively). More specifically, the content of P- and B-frames is predicted by analyzing the even and odd fields of past and/or future anchor frames. A typical field prediction process is depicted in FIG. 4.
With reference to FIG. 4, two frames 402 and 410 are depicted broken into their constituent even (404 and 412) and odd (406 and 414) fields, respectively. In this example, frame 402 is an I-frame, while frame 410 is a B-frame. In accordance with the prior art, the even field 412 of B-frame 410 is predicted from the even 404 and odd 406 fields of the prior I-frame 402.
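By way of illustration only, splitting a frame into its fields and comparing a field against both fields of an anchor frame may be sketched as follows. The sketch rests on several assumptions: lines 0, 2, 4, and so on are taken to form the even field, the comparison uses a zero-displacement sum of absolute differences rather than a motion search, and the function names are hypothetical.

```python
def split_fields(frame):
    """Assumption: even field = lines 0, 2, 4, ...; odd field = lines 1, 3, 5, ..."""
    return frame[0::2], frame[1::2]


def field_sad(a, b):
    """Sum of absolute differences between two equally sized fields."""
    return sum(abs(p - q) for row_a, row_b in zip(a, b)
               for p, q in zip(row_a, row_b))


def best_reference_field(target_field, ref_frame):
    """Compare a field against both fields of the anchor and keep the closer one."""
    even, odd = split_fields(ref_frame)
    costs = {"even": field_sad(target_field, even),
             "odd": field_sad(target_field, odd)}
    parity = min(costs, key=costs.get)
    return parity, costs[parity]
```

Because each field must be evaluated against two reference fields (and, for B-frames, against past and future anchors alike), field prediction multiplies the number of comparisons performed for interlaced content.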
Those skilled in the art will appreciate that, although the computationally intensive video encoding associated with the MPEG-2 standard provides high resolution video imagery, its implementation typically requires one or more powerful, dedicated processor(s) (e.g., a microcontroller, an application specific integrated circuit (ASIC), a digital signal processor (DSP) and the like) to encode (or, conversely, decode) MPEG-2 standard video data (e.g., to/from a DVD). Attempts to utilize the general purpose central processing unit (CPU) of a typical home computer for MPEG-2 processing have proven computationally prohibitive, as the MPEG-2 standard processing consumed nearly all of the computational resources of the general purpose CPU, thereby rendering the computer virtually useless for any other purpose. As a consequence, providing MPEG-2 standard video technology in a personal computer has heretofore required the addition of the costly dedicated video processors described above.
As a result of the cost and performance limitations commonly associated with real-time video encoding described above, the roll-out of MPEG-2 video multimedia capability in the home computing market has been slowed. Consequently, a need exists for a method and apparatus for encoding enhancements to facilitate real-time video encoding that is unencumbered by the deficiencies and limitations commonly associated with the prior art. An innovative solution to the problems commonly associated with the prior art is provided herein.