1. Field of the Invention
The present invention relates to video compression processing, and, in particular, to the coding of video dissolves using predictive encoders.
2. Description of the Related Art
Predictive video encoders, such as those conforming to an MPEG video compression standard, gain much of their compression capability by making predictions from other, previously coded frames. MPEG coders have three main types of frames: I, P, and B. An I frame is coded independently using intra-frame encoding techniques without reference to any other frames. A P frame is coded using inter-frame encoding techniques as the motion-compensated difference between itself and the previously coded P or I frame. P and I frames are referred to as "anchor" frames, because they can be used as references for coding other frames. Depending on which particular MPEG B-frame encoding mode is enabled and depending on which prediction technique provides the best coding results, each macroblock in a B frame may be coded (1) using forward prediction as the difference between itself and the previous anchor frame, (2) using backward prediction as the difference between itself and the next anchor frame, (3) using interpolated or bidirectional prediction as the difference between itself and the average of the previous and next anchor frames, or (4) as an intra-coded block without any prediction from an anchor frame. Many MPEG coders simply apply a repeating pattern of I, P, and B frames. For example, a typical 15-frame GOP (group of pictures) pattern may consist of the frame sequence (IBBPBBPBBPBBPBB) repeated over the coded video stream.
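By way of illustration only, the repeating frame pattern described above may be sketched as follows. The function names `frame_types` and `is_anchor` are hypothetical and are not part of any MPEG standard or encoder interface; the sketch merely assigns a frame type to each frame in display order by cycling through the 15-frame example pattern.

```python
from itertools import cycle, islice

# The 15-frame example GOP pattern given in the text.
GOP_PATTERN = "IBBPBBPBBPBBPBB"


def frame_types(num_frames, pattern=GOP_PATTERN):
    """Return display-order frame types obtained by repeating the pattern."""
    return "".join(islice(cycle(pattern), num_frames))


def is_anchor(frame_type):
    """I and P frames are anchor frames; B frames are not."""
    return frame_type in ("I", "P")


if __name__ == "__main__":
    types = frame_types(20)
    print(types)  # IBBPBBPBBPBBPBBIBBPB
    print([i for i, t in enumerate(types) if is_anchor(t)])
```

In this sketch the anchor positions depend only on the frame index, not on the content of the video, which is the behavior of the simple coders described above.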
A "dissolve" is a common technique used in video production to transition between two scenes. A dissolve is a gradual transition from a preceding scene to a subsequent scene that occurs over a number of consecutive frames. Each frame in a dissolve is the weighted average, on a pixel-by-pixel basis, of two images: one image from the preceding scene and the other from the subsequent scene. Pixel Dij in the ith row and jth column of a particular dissolve frame is given by Equation (1) as follows:
Dij = (Aij)*(1 - k) + (Bij)*(k)     (1)
where Aij is the corresponding pixel in the corresponding image from the previous scene, Bij is the corresponding pixel in the corresponding image from the subsequent scene, and k is a weighting factor that starts at 0 at the first frame of the dissolve and increases to 1 at the last frame of the dissolve. The rest of the frames in a dissolve (where 0 < k < 1) are referred to as "mixed-video" frames, because they are formed as a mixture (i.e., the weighted average) of frames from two different scenes. Note that, in a dissolve, either the previous scene or the subsequent scene or both may correspond to still images. A fade to or from black (or white or any other uniform color) is just a special case of such still-image-based dissolves.
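Equation (1) may be illustrated by the following minimal sketch. The names `dissolve_frame` and `dissolve_weights` are hypothetical, images are represented as lists of rows of pixel values, and the linear ramp for k is an assumption: the text states only that k starts at 0 and increases to 1.

```python
def dissolve_frame(a, b, k):
    """Blend two equally sized images per Equation (1).

    a, b: images as lists of rows of pixel values (previous and
    subsequent scene, respectively). k: weighting factor in [0, 1].
    """
    return [
        [a_ij * (1 - k) + b_ij * k for a_ij, b_ij in zip(row_a, row_b)]
        for row_a, row_b in zip(a, b)
    ]


def dissolve_weights(n):
    """Weights for an n-frame dissolve, assuming a linear ramp from 0 to 1."""
    if n < 2:
        raise ValueError("a dissolve needs at least a first and a last frame")
    return [t / (n - 1) for t in range(n)]


if __name__ == "__main__":
    a = [[100, 200]]  # one-row image from the previous scene
    b = [[0, 100]]    # one-row image from the subsequent scene
    for k in dissolve_weights(5):
        print(k, dissolve_frame(a, b, k))
```

With these inputs, the first frame (k = 0) reproduces the previous-scene image, the last frame (k = 1) reproduces the subsequent-scene image, and the three frames in between are mixed-video frames.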
Dissolves are notoriously difficult to encode because the various prediction tools in MPEG algorithms do not work very well to predict the "mixed-video" frames that make up a dissolve. For typical scene changes, no amount of motion compensation will yield a good prediction. For MPEG coders that apply a repeating pattern of I, P, and B frames over the coded video stream, depending on how long the frame pattern is relative to the length of the dissolve and the relative phasing of the frame pattern with respect to the dissolve, prediction errors over successive P frames during a dissolve can build up to a level where the corresponding decoded frames are very distorted.
The present invention is directed to a technique for improving the efficiency of coding dissolves in video streams. According to certain embodiments of the present invention, the coding of dissolves is constrained to ensure that, other than the first frame and/or the last frame, no other frame in a dissolve is coded as an anchor frame (e.g., an MPEG I or P frame). In these embodiments, the present invention constrains video coding such that all intermediate (e.g., mixed-video) frames in dissolves are coded as non-anchor frames (e.g., MPEG B frames). In other embodiments, one or more of the intermediate frames may be coded as anchor frames (e.g., MPEG I or P frames), with the rest of the mixed-video frames coded as B frames, where the one or more I or P frames are restricted to particular frame locations within the dissolve. For typical video dissolves, the present invention provides efficient coding in terms of both the bit rate of the corresponding compressed video bitstream and the quality of the corresponding decoded video images.
According to one embodiment, the present invention is a method for coding a video stream using a video compression algorithm that supports both intra-frame coding and inter-frame coding, comprising the steps of (a) selecting first and last frames corresponding to an n-frame dissolve between a previous scene and a subsequent scene in the video stream; and (b) constraining the coding of the n frames of the dissolve such that either (1) no intermediate frame in the dissolve falling between the first and last frames is coded as an anchor frame or (2) only one or more intermediate frames at one or more specific locations within the dissolve are coded as anchor frames, where the one or more specific locations are functions of the number n of frames in the dissolve and all other intermediate frames are coded as non-anchor frames.
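The constraint of step (b) may be sketched, purely for illustration, as a post-processing pass over a display-order sequence of frame types. The function name `constrain_dissolve` and its `anchor_offsets` parameter are hypothetical. The text does not specify how the permitted anchor locations depend on n, so the sketch takes them as caller-supplied offsets from the first dissolve frame. The sketch also assumes the first and last dissolve frames are left as already assigned.

```python
def constrain_dissolve(types, first, last, anchor_offsets=()):
    """Re-type the intermediate frames of a dissolve.

    types: display-order frame types, e.g. "IBBPBB...".
    first, last: indices of the first and last frames of the dissolve.
    anchor_offsets: hypothetical offsets (from `first`) of intermediate
        frames that may be coded as anchors (option 2); empty means
        no intermediate anchors at all (option 1).
    """
    result = list(types)
    allowed = {first + off for off in anchor_offsets}
    for idx in range(first + 1, last):
        if idx in allowed:
            # Keep an existing I or P; otherwise promote the frame to P.
            if result[idx] == "B":
                result[idx] = "P"
        else:
            # Every other intermediate frame becomes a non-anchor frame.
            result[idx] = "B"
    return "".join(result)


if __name__ == "__main__":
    types = "IBBPBBPBBPBBPBBIBBPB"  # 20 frames of the example GOP pattern
    # A 10-frame dissolve occupying frames 3 through 12.
    print(constrain_dissolve(types, 3, 12))        # IBBPBBBBBBBBPBBIBBPB
    print(constrain_dissolve(types, 3, 12, (4,)))  # IBBPBBBPBBBBPBBIBBPB
```

In the first call, the P frames that the repeating pattern would have placed at frames 6 and 9 are re-typed as B frames, so that no intermediate frame of the dissolve is an anchor. In the second call, a single intermediate anchor is permitted at a caller-chosen location.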