The present invention relates to processing of video frames, and more specifically to encoding of blocks disposed in such frames.
Digital transmission of video signals often provides higher quality pictures than its analog counterpart. Digital video is increasingly being broadcast directly to home-installed satellite television receivers. Moreover, with the development of digital video storage media such as Digital Video Disks (DVDs), consumers now have the capability to receive and store compressed digital video in their homes.
Many different video compression techniques have been developed to enable effective transmission and storage of digital video signals. Such techniques often use compression algorithms that take advantage of the correlation among adjacent pixels so as to transmit and store video signals more efficiently. In some systems, differential encoding is used to transmit only the difference between a frame and a prediction of that frame. The predicted frame is often derived from a previous frame of the same video sequence.
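As an illustration of differential encoding, the following minimal sketch assumes the simplest possible predictor: the prediction for a frame is the previous frame itself, with no motion compensation. Frames are modeled as lists of rows of integer pixel values, and the helper names `encode_difference` and `decode_difference` are hypothetical. Only the residual would be transmitted; the decoder adds it back to its own copy of the previous frame.

```python
# Minimal sketch of differential (predictive) frame coding.
# Assumption: the prediction for frame N is simply frame N-1 (no motion).
# Frames are lists of rows of integer pixel values.

Frame = list[list[int]]


def encode_difference(current: Frame, previous: Frame) -> Frame:
    """Return the per-pixel residual current - previous."""
    return [[c - p for c, p in zip(cur_row, prev_row)]
            for cur_row, prev_row in zip(current, previous)]


def decode_difference(residual: Frame, previous: Frame) -> Frame:
    """Reconstruct the current frame by adding the residual to the prediction."""
    return [[r + p for r, p in zip(res_row, prev_row)]
            for res_row, prev_row in zip(residual, previous)]


prev = [[10, 10, 10], [10, 10, 10]]
curr = [[10, 12, 10], [10, 10, 11]]
res = encode_difference(curr, prev)
print(res)  # [[0, 2, 0], [0, 0, 1]]
assert decode_difference(res, prev) == curr
```

Because successive frames are usually similar, most residual values are zero or small, which is what makes them cheaper to compress than the raw frame.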
Successive frames in a typical video sequence are often very similar to each other. For example, a sequence of frames may have scenes in which an object moves across a stationary background, or a background moves behind a stationary object. Consequently, much of the content of one frame may also appear at a different position in a subsequent frame. Video systems take advantage of such similarities to encode blocks in the frames.
In accordance with well-known international standards, such as H.261, H.263, MPEG-1, MPEG-2, and MPEG-4, motion estimation and compensation are used to encode scene changes between frames. In accordance with the motion estimation technique, data related to the differences between the positions of similar objects as they appear in various macroblocks of successive frames are captured by one or more motion vectors to estimate the motion of objects between frames. The motion vectors are then used to identify the spatial coordinates of the shifted objects in a subsequent frame. The motion vectors therefore reduce the bit rate that would otherwise be required to encode the data associated with the shifted objects.
In accordance with the well-known motion compensation technique, the motion vectors are subsequently used to predict an unencoded frame. The difference between the predicted frame and the frame being encoded, commonly referred to as the error signal, is then compressed.
Partly because motion estimation is computationally intensive, a single motion vector is typically shared by all color components in the (Y, U, V) or (Y, Cr, Cb) color coordinate systems. In the (Y, U, V) color coordinate system, Y is the luma component, and U and V are the chroma components of a color. Similarly, in the (Y, Cr, Cb) color coordinate system, Y is the luma component, and Cb and Cr are the chroma components. Each motion vector is typically generated for a macroblock, which includes, e.g., 16×16 or 8×8 pixels. The MPEG-2 standard provides an interlaced mode that separates each 16×16 macroblock into two 16×8 sub-macroblocks, each having an associated motion vector.
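The sharing of one motion vector across color components can be sketched as follows. This is a minimal illustration assuming 4:2:0 chroma subsampling, in which each chroma plane has half the luma resolution horizontally and vertically, so the vector estimated on the luma plane is simply halved before being applied to the chroma planes. The `MotionVector` type and `chroma_vector` helper are hypothetical, and the exact rounding rules each standard applies to the scaled vector are deliberately omitted.

```python
# Minimal sketch: one motion vector, estimated on the luma (Y) plane, reused
# for the chroma (U/V or Cb/Cr) planes.
# Assumption: 4:2:0 sampling by default, so the vector is halved in each
# direction. Standard-specific rounding of the scaled vector is omitted.

from dataclasses import dataclass


@dataclass(frozen=True)
class MotionVector:
    dx: float  # horizontal displacement, in pixels of the plane it applies to
    dy: float  # vertical displacement, in pixels of the plane it applies to


def chroma_vector(luma_mv: MotionVector,
                  subsample_x: int = 2,
                  subsample_y: int = 2) -> MotionVector:
    """Scale a luma motion vector to a subsampled chroma plane (hypothetical helper)."""
    return MotionVector(luma_mv.dx / subsample_x, luma_mv.dy / subsample_y)


mv_y = MotionVector(6, -3)
mv_c = chroma_vector(mv_y)
print(mv_c)  # MotionVector(dx=3.0, dy=-1.5)
```

Only the luma vector needs to be searched for and transmitted; the chroma displacement is derived from it, which is where the computational saving comes from.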
In MPEG-2, interframe coding is performed on macroblocks. An MPEG-2 encoder performs motion estimation and compensation to compute motion vectors and error signals. For each macroblock M of a frame N, a search is performed across the macroblocks of the next frame, N+1, or the immediately preceding frame, N−1, to identify the most similar macroblock in frame N+1 or N−1. The location of the most similar block relative to macroblock M is used to compute a motion vector which, in turn, is used to compute a predicted block for macroblock M. The difference between the predicted macroblock and macroblock M is used to compute the error signal. The error signal is subsequently compressed using a texture coding method such as discrete cosine transform (DCT) encoding.
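The search and error-signal computation described above can be sketched as a full-search block matcher. This is a minimal illustration under several assumptions not stated in the text: frames are 2-D lists of luma values, similarity is measured by the sum of absolute differences (SAD), every integer offset within a hypothetical `search_range` is tried against a single reference frame (standing in for frame N+1 or N−1), and candidates that would extend outside the reference frame are skipped. Real encoders typically add sub-pixel refinement and faster search strategies, and the DCT coding of the error signal is not shown.

```python
# Minimal sketch of full-search block matching with SAD, plus the error
# signal (current block minus motion-compensated prediction).
# Assumptions: 2-D lists of luma values, one reference frame, integer-pixel
# offsets only, search_range is a hypothetical parameter.


def get_block(frame, top, left, size):
    """Extract a size x size block whose top-left corner is (top, left)."""
    return [row[left:left + size] for row in frame[top:top + size]]


def sad(a, b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def motion_search(current, reference, top, left, size, search_range):
    """Return the (dy, dx) offset of the best-matching reference block."""
    block = get_block(current, top, left, size)
    height, width = len(reference), len(reference[0])
    best_mv, best_cost = (0, 0), None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            r, c = top + dy, left + dx
            if r < 0 or c < 0 or r + size > height or c + size > width:
                continue  # candidate falls outside the reference frame
            cost = sad(block, get_block(reference, r, c, size))
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (dy, dx), cost
    return best_mv


def residual(current, reference, top, left, size, mv):
    """Error signal: current block minus the motion-compensated prediction."""
    dy, dx = mv
    block = get_block(current, top, left, size)
    pred = get_block(reference, top + dy, left + dx, size)
    return [[x - p for x, p in zip(rb, rp)] for rb, rp in zip(block, pred)]


# A bright 2x2 square at (1, 1) in the reference moves to (2, 3) in the current frame.
ref = [[0] * 6 for _ in range(6)]
cur = [[0] * 6 for _ in range(6)]
for r in (1, 2):
    for c in (1, 2):
        ref[r][c] = 9
for r in (2, 3):
    for c in (3, 4):
        cur[r][c] = 9

mv = motion_search(cur, ref, top=2, left=3, size=2, search_range=2)
print(mv)                               # (-1, -2)
print(residual(cur, ref, 2, 3, 2, mv))  # [[0, 0], [0, 0]]
```

When the match is exact, the error signal is all zeros and only the motion vector carries information; in practice the residual is small but nonzero and is then passed to the texture coder.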
Intra frames (I frames) are used to avoid error propagation, achieve random access, and support various play modes, such as seeking, fast forward, and fast backward. However, I frames often require more bits to encode than predicted frames (P frames) or bi-directional frames (B frames). It is therefore desirable to have a compression technique that uses fewer bits to encode I frames.