The demand for digital video products continues to increase. Some examples of applications for digital video include video communication, security and surveillance, industrial automation, and entertainment (e.g., DV, HDTV, satellite TV, set-top boxes, Internet video streaming, digital cameras, cellular telephones, video jukeboxes, high-end displays and personal video recorders). Further, video applications are becoming increasingly mobile as a result of higher computation power in handsets, advances in battery technology, and high-speed wireless connectivity.
Video compression is an essential enabler for digital video products. Compression-decompression (CODEC) algorithms enable storage and transmission of digital video. In general, the encoding process of video compression generates coded representations of frames or subsets of frames. The encoded video bitstream (i.e., the encoded video sequence) may include three types of frames: intracoded frames (I-frames), predictive coded frames (P-frames), and bi-directionally coded frames (B-frames). I-frames are coded without reference to other frames. P-frames are coded using motion-compensated prediction from previous I-frames or P-frames. B-frames are coded using motion-compensated prediction from both past and future reference frames. For encoding, all frames are divided into macroblocks, e.g., blocks of 16×16 pixels in the luminance space and 8×8 pixels in each chrominance component for the common 4:2:0 sub-sampling format.
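As an illustration of the macroblock partitioning described above, the following minimal Python sketch counts the 16×16 luma macroblocks and the 8×8 chroma blocks covering a frame in 4:2:0 format. The function names are hypothetical, and the sketch assumes that frame dimensions which are not multiples of 16 are padded up to the next multiple.

```python
MB_SIZE = 16      # luma macroblock width/height in pixels
CHROMA_BLOCK = 8  # per-component chroma block size in 4:2:0


def macroblock_grid(width: int, height: int) -> tuple[int, int]:
    """Columns and rows of 16x16 luma macroblocks covering a frame.
    Dimensions that are not multiples of 16 are assumed to be padded up,
    so a partial macroblock counts as a whole one."""
    return (width + MB_SIZE - 1) // MB_SIZE, (height + MB_SIZE - 1) // MB_SIZE


def chroma_block_grid(width: int, height: int) -> tuple[int, int]:
    """Columns and rows of 8x8 blocks in one 4:2:0 chroma plane, which is
    subsampled by 2 both horizontally and vertically."""
    cw, ch = (width + 1) // 2, (height + 1) // 2
    return (cw + CHROMA_BLOCK - 1) // CHROMA_BLOCK, (ch + CHROMA_BLOCK - 1) // CHROMA_BLOCK


cols, rows = macroblock_grid(1920, 1080)
print(cols, rows, cols * rows)        # 120 68 8160
print(chroma_block_grid(1920, 1080))  # (120, 68)
```

Because both grids have the same shape, each macroblock carries exactly one 8×8 block per chrominance component, which is what the 16×16 / 8×8 pairing in 4:2:0 implies.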
Video coding standards (e.g., MPEG and H.264) are based on the hybrid video coding technique of block motion compensation and transform coding. Block motion compensation is used to remove temporal redundancy between frames, and transform coding is used to remove spatial redundancy within frames of the video sequence. Traditional block motion compensation schemes assume that objects in a scene undergo a displacement in the x- and y-directions from one frame to the next. Motion vectors are signaled from the encoder to the decoder to describe this motion. The decoder then uses the motion vectors to predict current frame data from previously decoded reference frames.
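The translational model above can be sketched in Python as follows: the encoder searches a reference frame for the displacement (motion vector) that best matches a current block, and the decoder forms its prediction by copying the reference block at that displacement. This is a minimal sketch under stated assumptions: it uses integer-pixel exhaustive ("full") search with a sum-of-absolute-differences (SAD) cost, whereas practical encoders typically add sub-pixel refinement and faster search patterns. All function names are hypothetical.

```python
from typing import List, Optional, Tuple

Frame = List[List[int]]  # frame[y][x]: luma samples


def sad(cur: Frame, ref: Frame, bx: int, by: int, dx: int, dy: int, n: int) -> int:
    """Sum of absolute differences between the n x n block at (bx, by) in cur
    and the block displaced by (dx, dy) in ref."""
    return sum(
        abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
        for y in range(n)
        for x in range(n)
    )


def full_search(cur: Frame, ref: Frame, bx: int, by: int, n: int,
                search_range: int) -> Tuple[int, int]:
    """Encoder side: try every integer displacement within +/-search_range
    whose reference block lies inside ref; return the lowest-SAD vector."""
    h, w = len(ref), len(ref[0])
    best_mv: Tuple[int, int] = (0, 0)
    best_cost: Optional[int] = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            x0, y0 = bx + dx, by + dy
            if x0 < 0 or y0 < 0 or x0 + n > w or y0 + n > h:
                continue  # candidate block falls outside the reference frame
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv


def predict_block(ref: Frame, bx: int, by: int, mv: Tuple[int, int], n: int) -> Frame:
    """Decoder side: form the prediction by copying the n x n block of ref
    displaced by the signaled motion vector."""
    dx, dy = mv
    return [row[bx + dx: bx + dx + n] for row in ref[by + dy: by + dy + n]]


# Synthetic test: each current-frame sample is taken from the reference
# offset by (+2, -1), so the search should recover mv = (2, -1).
ref = [[(7 * x + 13 * y) % 256 for x in range(32)] for y in range(32)]
cur = [[ref[min(max(y - 1, 0), 31)][min(max(x + 2, 0), 31)] for x in range(32)]
       for y in range(32)]
mv = full_search(cur, ref, bx=8, by=8, n=8, search_range=4)
print(mv)  # (2, -1)
```

Only the motion vector needs to be transmitted for the block; the decoder repeats `predict_block` on its own copy of the reference frame, and the encoder codes just the (transform-coded) difference between the actual block and that prediction.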
Video encoders may use a fixed coding structure such as IPPP, IBBP, Hierarchical-P or Hierarchical-B. In the IPPP coding structure, the frames of a video sequence are encoded as an I-frame followed by some number of sequential P-frames. In the IBBP coding structure, the frames of a video sequence are encoded as an I-frame followed by some number of sequential frames encoded in a pattern of two B-frames followed by a single P-frame (e.g., IBBPBBPBBP . . . ). For a given search range, when maximizing quality is the main goal, it is difficult to know in advance which coding structure is best, because the best coding structure depends on the sequence and can vary over time. Thus, during encoding of a single video sequence, one coding structure may provide better quality at some times and another coding structure at other times.
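The IPPP and IBBP patterns above differ not only in frame types but also in the order in which frames must be coded: a B-frame predicts from a future anchor (I- or P-frame), so that anchor has to be coded first. The Python sketch below generates the display-order pattern and the corresponding coding order. It is a simplified illustration with hypothetical function names; it does not model the Hierarchical-P or Hierarchical-B structures, and it assumes trailing B-frames without a later anchor are simply appended at the end.

```python
def frame_types(num_frames: int, b_frames: int) -> str:
    """Display-order frame types: an I-frame followed by repeating groups of
    `b_frames` B-frames and one P-frame. b_frames=0 gives IPPP...,
    b_frames=2 gives IBBPBBP..."""
    types = []
    for i in range(num_frames):
        if i == 0:
            types.append("I")
        elif i % (b_frames + 1) == 0:
            types.append("P")
        else:
            types.append("B")
    return "".join(types)


def coding_order(types: str) -> list[int]:
    """Reorder display indices so each run of B-frames is coded after the
    next anchor (I or P) it predicts from. Trailing B-frames with no later
    anchor are appended at the end (a simplification)."""
    order: list[int] = []
    pending_b: list[int] = []
    for idx, t in enumerate(types):
        if t == "B":
            pending_b.append(idx)
        else:
            order.append(idx)
            order.extend(pending_b)
            pending_b = []
    return order + pending_b


print(frame_types(4, 0))              # IPPP
print(frame_types(7, 2))              # IBBPBBP
print(coding_order(frame_types(7, 2)))  # [0, 3, 1, 2, 6, 4, 5]
```

In IPPP the coding order equals the display order, while IBBP introduces reordering delay; this is one reason the choice of structure involves trade-offs beyond prediction quality alone.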
In some video encoders (e.g., MPEG-4 Simple Profile video encoders), motion vectors are coded as a combination of a fixed length code and a variable length code. The fixed length code is governed by a parameter called fcode, which determines the motion vector range and thus the search range to be used during motion estimation. The fcode is an important parameter that must be selected correctly, especially at HD resolution. If the fcode is too low, the motion vector search range is limited, which can lead to significant quality degradation (1 dB or more) when coding high motion sequences. If the fcode is too high, a large fixed overhead is incurred for coding motion vectors, which degrades rate-distortion performance (by around 0.5 dB or more). To handle these issues, video encoders typically select the fcode based on the amount of motion in a video sequence. For example, current MPEG-4 encoders such as xVid calculate the fcode based on the variance of the motion vectors. Calculating a variance involves multiplications, which are expensive to implement in hardware.
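To make the fcode trade-off concrete, the Python sketch below uses the MPEG-4 Part 2 relationship in which f = 2^(fcode - 1) gives a legal motion vector component range of [-32f, 32f - 1] in half-sample units, and each nonzero variable-length motion code is followed by fcode - 1 fixed-length residual bits. The `select_fcode` function is a hypothetical illustration that picks the smallest fcode whose range covers a set of observed vectors using only comparisons and shifts; it is not xVid's variance-based method and is not presented as the method of this document.

```python
from typing import Iterable, Tuple

FCODE_MIN, FCODE_MAX = 1, 7  # fcode is a 3-bit field; 0 is not allowed


def mv_range_half_pel(fcode: int) -> Tuple[int, int]:
    """Legal motion vector component range in half-sample units:
    [-32*f, 32*f - 1] with f = 2**(fcode - 1), computed with shifts."""
    if not FCODE_MIN <= fcode <= FCODE_MAX:
        raise ValueError("fcode must be in 1..7")
    f = 1 << (fcode - 1)
    return -(f << 5), (f << 5) - 1


def residual_bits(fcode: int) -> int:
    """Fixed-length residual bits sent after each nonzero motion code."""
    return fcode - 1


def select_fcode(mvs: Iterable[Tuple[int, int]]) -> int:
    """Hypothetical selector: smallest fcode whose range covers every
    component of the given half-pel motion vectors. No multiplications."""
    lo = hi = 0
    for mvx, mvy in mvs:
        lo = min(lo, mvx, mvy)
        hi = max(hi, mvx, mvy)
    for fcode in range(FCODE_MIN, FCODE_MAX + 1):
        low, high = mv_range_half_pel(fcode)
        if low <= lo and hi <= high:
            return fcode
    return FCODE_MAX  # motion exceeds the largest range; clamp


print(mv_range_half_pel(1))               # (-32, 31), i.e. -16 to +15.5 pixels
print(select_fcode([(10, -5), (40, 3)]))  # 2
```

The sketch shows both failure modes described above: an fcode that is too small cannot represent the vectors of fast motion, while each increment of fcode doubles the range but adds one fixed-length bit to every nonzero motion vector component. A max-based rule like this one is sensitive to a single outlier vector, which is one reason statistics such as variance are used instead.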