1. Field of the Invention
This invention pertains generally to motion estimation within video sequences, and more particularly to a method of estimating backward motion vectors from forward motion vectors.
2. Description of the Background Art
The efficiency with which video may be encoded and decoded is critical to the cost-effective implementation of a number of video systems, in particular those applications that require real-time encoding or decoding of video. FIG. 1 exemplifies a video frame divided into a multiplicity of macroblocks, of which only nine are shown for the sake of simplicity. Each macroblock comprises a number of pixels, such as a 16×16 macroblock of pixels as utilized within MPEG-2 for frame motion vectors. The use of MPEG-2 encoding for video signals has been increasing steadily.
It will be appreciated that the estimation of motion within a video sequence, such as one encoded in an MPEG-2 format, is a computationally intensive process. By way of example, with CCIR 601 video, about 90% of the required computation within an encoder for ISO MPEG-2 is for estimating motion. The time required to encode the video can be a large multiple of the time required to decode the given video sequence. In particular, an encoder implemented as an ISO MPEG-2 technical report encoder based on Test Model 5 (TM5) requires approximately 100 times as much CPU execution time on an Ultra-SPARC 80 to encode a video sequence, with a horizontal search range of +/−63 pixels and a vertical search range of +/−32 pixels, as is required by the corresponding decoding process. As a result, although real-time decoding of MPEG-2 video has become feasible, real-time encoding still poses a challenge.
To appreciate the computational intensity, it should be recognized that MPEG-2 is a hybrid lossy coding scheme utilizing intra-coding and inter-coding, in which redundant information contained in both the spatial domain and the temporal domain is removed to facilitate compression. Intra-coding is compression performed on the image data in the spatial domain to generate an I-picture, also referred to as an intra-picture. Inter-coding is compression performed in the time domain to create predicted pictures (P-pictures) and bi-directional pictures (B-pictures). Motion vectors for B-pictures may only be generated after computing forward and backward motion vectors for the preceding P-pictures and I-pictures. The I-pictures are encoded independently of other nearby pictures, such as framing signals or field signals, while the P-pictures encode predicted and interpolated movement in response to the movement of elements from previous I-pictures and P-pictures. The B-pictures are encoded as difference signals for predicted or interpolated movements; their encoding considers the motion within previous and upcoming pictures and may be performed only after considering the correlation of the movement of the previous I-pictures and P-pictures. The encoding mode which provides the lowest prediction error rate among the forward, reverse, and combined forward and reverse modes is selected for use in the B-pictures. Typically, the picture structure utilized according to MPEG-2 follows a pattern I, B, B, P, B, B, P, an example of which is shown in FIG. 2 and referred to herein as a macroblock vector diagram.
The frames, or pictures, within a video sequence, such as within the MPEG-2 standard, are grouped together to form a group-of-pictures (GOP). A popular GOP structure described in MPEG-2 Test Model 5 (TM5) is specified by the integers M and N. In TM5, the value N expresses the number of pictures in a GOP and M−1 is the number of B frames between two successive I or P frames. One typical value that may be utilized for M is three, which corresponds to having two B pictures between successive I or P pictures as was shown in FIG. 2 illustrating macroblock 1 and macroblock 3.
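By way of illustration only, the relationship between M, N, and the picture pattern may be sketched as follows. The function name gop_pattern and the choice of display order are assumptions made for this example and are not taken from TM5.

```python
def gop_pattern(n, m):
    """Return picture types for one GOP of n pictures, in display order.

    Assumption for this sketch: the first picture is an I-picture, every
    m-th picture after it is a P-picture, and the m - 1 pictures in
    between are B-pictures.
    """
    if n < 1 or m < 1:
        raise ValueError("n and m must be positive integers")
    pattern = []
    for index in range(n):
        if index == 0:
            pattern.append("I")
        elif index % m == 0:
            pattern.append("P")
        else:
            pattern.append("B")
    return pattern


if __name__ == "__main__":
    # M = 3 places two B-pictures between successive I or P pictures.
    print(", ".join(gop_pattern(7, 3)))
```

With N=7 and M=3 the sketch reproduces the I, B, B, P, B, B, P pattern described above.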
The macroblocks may be displaced in moving forward or backward in the frame sequence. FIG. 3 depicts a current frame 10 and a subsequent frame 12, wherein macroblock 14 in current frame 10 has moved from a location 18 in subsequent frame 12 by the backward motion vector 16. The corresponding motion vectors are represented in FIG. 4 as B1, B2, B3, B4. Depicted in FIG. 5 is a frame diagram of current frame 10 and a previous frame 20, wherein macroblock 22 in current frame 10 has moved from a location 26 in previous frame 20 by a forward motion vector 24. FIG. 6 is a macroblock vector diagram of the conventional single-frame distance forward motion vectors associated with the frame shown in FIG. 5.
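By way of example and not of limitation, the manner in which a motion vector relates a macroblock in the current frame to a location in a reference frame may be sketched as an exhaustive block-matching search. The sketch assumes frames stored as lists of rows of integer pixel values, a sum-of-absolute-differences matching cost, and a vector measured from the block position in the current frame to the matching position in the reference frame. The names sad and full_search are hypothetical, and the background above does not specify a matching criterion.

```python
def sad(cur, ref, cx, cy, rx, ry, size):
    """Sum of absolute differences between two size x size blocks."""
    return sum(
        abs(cur[cy + j][cx + i] - ref[ry + j][rx + i])
        for j in range(size)
        for i in range(size)
    )


def full_search(cur, ref, cx, cy, size, search_range):
    """Return ((dx, dy), cost) for the best match of the current block.

    Every displacement within +/- search_range pixels is tested, and
    candidates that fall outside the reference frame are skipped.
    """
    height, width = len(ref), len(ref[0])
    best_vec, best_cost = (0, 0), None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            rx, ry = cx + dx, cy + dy
            if rx < 0 or ry < 0 or rx + size > width or ry + size > height:
                continue
            cost = sad(cur, ref, cx, cy, rx, ry, size)
            if best_cost is None or cost < best_cost:
                best_vec, best_cost = (dx, dy), cost
    return best_vec, best_cost
```

The same routine yields a forward or a backward vector depending on whether the reference frame passed to it precedes or follows the current frame.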
In the case of TM5 with M=3, an encoder is required to compute the motion vectors B1 and B2. If the encoder uses a search window of H×V pixels to find the motion vectors B2, and if it is assumed that the search window size scales with the frame distance between two pictures, then the computational complexity for the motion vectors B1 and B2 using full search is about H×V+4(H×V)=5(H×V), while the memory complexity is approximately 4(H×V).
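The arithmetic behind these figures may be sketched as follows. The sketch assumes that both window dimensions scale linearly with frame distance, that one of the two vectors spans a frame distance of one and the other a distance of two, and that the memory figure corresponds to the largest single window; these assumptions are inferred from the totals stated above. The function names are hypothetical.

```python
def window_area(h, v, frame_distance):
    """Search-window area when both dimensions scale with frame distance."""
    return (frame_distance * h) * (frame_distance * v)


def full_search_cost(h, v, frame_distances):
    """Return (total candidate positions, largest single window).

    The total approximates the computational complexity of full search,
    and the largest window approximates the memory complexity.
    """
    areas = [window_area(h, v, d) for d in frame_distances]
    return sum(areas), max(areas)


if __name__ == "__main__":
    h, v = 16, 8
    total, memory = full_search_cost(h, v, [1, 2])
    print(total // (h * v), memory // (h * v))
```

Under these assumptions the distance-one search contributes H×V and the distance-two search contributes 2H×2V=4(H×V), giving the 5(H×V) total and the 4(H×V) memory figure.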
A number of motion estimation methods have been devised to speed the computations within the encoding process. These methods generally must be utilized in combination to achieve the performance gains necessary to provide for real-time encoding with sufficient picture quality. For example, if method “A” can speed up the computations by a factor of five, and method “B” can speed up computations by a factor of two, and the computation methods are orthogonal, then the combined methods can result in a factor of ten speed increase. It will be appreciated that utilizing a combination of non-orthogonal methods will reduce the attainable increase in speed.
Therefore, a need exists for a method of increasing motion estimation speed that may be beneficially applied in combination with other motion estimation enhancements. The present invention satisfies that need, as well as others, and overcomes the deficiencies of previously developed motion estimation methods.