The present invention relates to reconstructing motion compensated images, and more particularly, to constructing a motion compensated block with fractional pel accuracy in a transform domain.
Video data is commonly compressed utilizing well-known compression standards such as JPEG, MPEG-1, MPEG-2, and H.261. In order to obtain a compressed representation of the video data, these compression standards utilize intraframe coding techniques in order to exploit spatial redundancies often found within a single frame of video. A common intraframe coding technique employs a block-based two-dimensional transform that transforms each frame of video data from a spatial domain to a transform domain. One common intraframe coding technique first divides a video frame into 8×8 blocks of pels, and independently applies a two-dimensional discrete cosine transform (DCT) to each pel block. This operation results in an 8×8 block of DCT coefficients in which most of the energy in the original pel block is typically concentrated in a few low-frequency coefficients. The 8×8 block of DCT coefficients is then quantized and variable length encoded in order to reduce the number of bits necessary to represent the original 8×8 pel block.
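The energy compaction described above can be illustrated with a minimal sketch. It assumes an orthonormal DCT-II and a hypothetical smooth 8×8 pel block (a gentle brightness ramp); the helper names dct_matrix, matmul, transpose, and dct2 are illustrative and are not taken from any standard. Quantization and variable length encoding are omitted.

```python
import math

N = 8  # block size used in the text

def dct_matrix(n=N):
    """Orthonormal DCT-II matrix S; the 2-D DCT of a block x is S * x * S^T."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos((2 * j + 1) * k * math.pi / (2 * n))
                     for j in range(n)])
    return rows

def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

def dct2(block):
    """Two-dimensional DCT of a square pel block."""
    s = dct_matrix(len(block))
    return matmul(matmul(s, block), transpose(s))

# Hypothetical smooth pel block: a gentle brightness ramp.
pel_block = [[100 + 2 * r + 3 * c for c in range(N)] for r in range(N)]
coeffs = dct2(pel_block)

# Compare the energy in the 2x2 low-frequency corner with the total energy.
total_energy = sum(v * v for row in coeffs for v in row)
low_energy = sum(coeffs[u][v] ** 2 for u in range(2) for v in range(2))
low_fraction = low_energy / total_energy
print("fraction of energy in the 2x2 low-frequency corner:", low_fraction)
```

For a smooth block such as this one, nearly all of the energy lands in the few lowest-frequency coefficients, which is what makes the subsequent quantization and variable length encoding effective.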
Moreover, compression standards, such as MPEG-1, MPEG-2, and H.261, utilize interframe coding techniques in order to exploit temporal redundancies often found between temporally adjacent video frames. These compression standards exploit temporal redundancy by computing an interframe difference signal called “prediction error.” In computing the prediction error, the technique of motion compensation is employed to correct the prediction for motion. Reference is made to FIG. 1 in order to illustrate unidirectional motion estimation, which is also known as “forward prediction.” In forward prediction, a target macroblock 100 of a video frame 102 to be encoded is matched with pel blocks of the same size in a past video frame 104 called the “reference video frame.” The pel block in the reference video frame 104 that best matches the target macroblock 100 is selected for use as a prediction macroblock 106. After selecting the prediction macroblock 106, a prediction error macroblock is computed as the difference between the target macroblock 100 and the prediction macroblock 106. The prediction error macroblock is then encoded utilizing the two-dimensional DCT encoding technique described above.
The position of the prediction macroblock 106 is indicated by a motion vector 108 that indicates a horizontal and vertical pel displacement between the target macroblock 100 and the prediction macroblock 106. The motion vector 108 is then encoded for transmission along with the encoded prediction error macroblock.
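Forward prediction as described above can be sketched as an exhaustive block search. The text does not say how the "best match" is measured, so the sum of absolute differences used here is an assumption, as are the function names, the small frame size, and the search range. The motion vector is expressed as (vertical, horizontal) integer pel displacements.

```python
def sad(frame_a, ay, ax, frame_b, by, bx, size):
    """Sum of absolute differences between two size x size pel blocks (assumed criterion)."""
    return sum(abs(frame_a[ay + r][ax + c] - frame_b[by + r][bx + c])
               for r in range(size) for c in range(size))

def find_motion_vector(target, ty, tx, reference, size, search):
    """Search the reference frame around (ty, tx) for the best-matching block."""
    height, width = len(reference), len(reference[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            ry, rx = ty + dy, tx + dx
            # Only consider candidate blocks that lie fully inside the reference frame.
            if 0 <= ry <= height - size and 0 <= rx <= width - size:
                cost = sad(target, ty, tx, reference, ry, rx, size)
                if best is None or cost < best[0]:
                    best = (cost, dy, dx)
    return best[1], best[2]

def prediction_error(target, ty, tx, reference, mv, size):
    """Target block minus the prediction block selected by the motion vector."""
    dy, dx = mv
    return [[target[ty + r][tx + c] - reference[ty + dy + r][tx + dx + c]
             for c in range(size)] for r in range(size)]

# Hypothetical 12x12 frames: the target is the reference content moved by (1, -2) pels.
reference = [[16 * y + x for x in range(12)] for y in range(12)]
target = [[16 * (y + 1) + (x - 2) for x in range(12)] for y in range(12)]

mv = find_motion_vector(target, 4, 4, reference, size=4, search=3)
error = prediction_error(target, 4, 4, reference, mv, size=4)
print(mv)     # (1, -2)
print(error)  # all zeros, since the match is exact
```

In a real encoder the prediction error block is rarely all zeros; it is the residual that is then transformed and encoded alongside the motion vector.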
FIG. 2 depicts a block diagram of a prior art video editing system 200 that utilizes a traditional approach for editing video compressed in accord with the MPEG-2 standard. The video editing system 200 essentially decompresses the video stream to obtain the video stream in the spatial domain, edits the decompressed video stream in the spatial domain, and compresses the edited video stream in order to place the edited video stream back into the compressed domain.
Performing image manipulation techniques such as resizing, transcoding, and compositing is relatively straightforward in the spatial domain since the goal of these editing techniques is to alter the spatial domain appearance of video frames. For example, resizing video frames of a video stream in the spatial domain involves downsampling the pels of each video frame in order to reduce the spatial resolution of each video frame. In other words, in order to reduce the spatial resolution of each 720×480 video frame of a video stream to a 360×240 video frame in the spatial domain, the video editing system 200 may average each two by two block of pels to obtain a single pel. While the video editing system 200 is a relatively intuitive implementation of a compressed video editing system, the video editing system 200 is also computationally intensive due to (1) the high computational complexity of the decompression and compression tasks, and (2) the large volume of spatial domain data that has to be manipulated. Due to the computational complexity of the video editing system 200, the hardware required to implement the video editing system 200 may be costly.
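The two by two averaging mentioned above can be sketched as follows. The function name downsample_2x2 is illustrative, a frame is assumed to be a list of rows of pel values with even dimensions, and the averages are left unrounded.

```python
def downsample_2x2(frame):
    """Halve the resolution by averaging each two by two block of pels."""
    height, width = len(frame), len(frame[0])
    return [[(frame[y][x] + frame[y][x + 1]
              + frame[y + 1][x] + frame[y + 1][x + 1]) / 4.0
             for x in range(0, width - 1, 2)]
            for y in range(0, height - 1, 2)]

# A 720x480 frame (480 rows of 720 pels) becomes a 360x240 frame.
frame = [[0] * 720 for _ in range(480)]
small = downsample_2x2(frame)
print(len(small[0]), len(small))         # 360 240
print(downsample_2x2([[1, 3], [5, 7]]))  # [[4.0]]
```

Every pel of every frame is touched, which illustrates item (2) above: the volume of spatial domain data that must be processed after full decompression.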
For this reason there has been a great effort in recent years to develop fast algorithms that perform these video editing techniques directly in the compressed domain and thereby avoid the need of completely decompressing the video stream. One such example is U.S. Pat. No. 5,708,732 to Merhav et al., entitled Fast DCT Domain Downsampling and Inverse Motion Compensation, which is hereinafter referred to as the “Merhav patent”. The Merhav patent discloses a method of inverse motion compensating in the DCT domain. In particular, the Merhav patent discloses inverse motion compensating in the DCT domain with integer pel accuracy. In other words, the Merhav patent discloses a method of inverse motion compensating in the DCT domain based upon a motion vector that indicates integer pel displacements in both the vertical and horizontal directions.
However, one drawback of the method described in the Merhav patent arises from the method being limited to inverse motion compensating with integer pel accuracy. Such a limitation is a drawback because both the MPEG-1 and MPEG-2 standards utilize motion vectors that are computed to the nearest half pel of displacement, using bilinear interpolation to obtain brightness values between pels. As a result, the method of the Merhav patent, when applied to MPEG-1 and MPEG-2 video, would likely result in an edited MPEG video stream that contains undesirable visible artifacts, because the method does not take into account motion vectors computed to the nearest half pel of displacement.
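Half pel prediction by bilinear interpolation can be sketched in the spatial domain as follows. The sketch assumes the block position is given in half pel units (so an odd value means a half pel offset), that all accessed pels lie inside the reference frame, and that results are left as unrounded floating point values; the integer rounding rules of the standards are omitted. The function name is illustrative.

```python
def predict_block_half_pel(reference, top2, left2, size):
    """Fetch a size x size prediction block whose top-left corner is at
    (top2 / 2, left2 / 2), i.e. coordinates given in half pel units."""
    y0, fy = divmod(top2, 2)   # integer pel row and half pel flag (0 or 1)
    x0, fx = divmod(left2, 2)  # integer pel column and half pel flag (0 or 1)
    block = []
    for r in range(size):
        row = []
        for c in range(size):
            y, x = y0 + r, x0 + c
            # Average the pel with its right, lower, and lower-right neighbors.
            # When a flag is 0 the neighbor is the pel itself, so the formula
            # reduces to a two-pel average or to the pel value unchanged.
            total = (reference[y][x] + reference[y][x + fx]
                     + reference[y + fy][x] + reference[y + fy][x + fx])
            row.append(total / 4.0)
        block.append(row)
    return block

reference = [[0, 10, 20],
             [40, 50, 60],
             [80, 90, 100]]
print(predict_block_half_pel(reference, 1, 1, 2))  # [[25.0, 35.0], [65.0, 75.0]]
print(predict_block_half_pel(reference, 2, 0, 1))  # [[40.0]]
```

An integer pel method would have to round such a displacement to a whole pel position and would fetch a different block than the one the encoder used, which is the source of the artifacts described above.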
Accordingly, since MPEG video utilizes motion vectors computed with half pel accuracy, there is still a need for transform domain inverse motion compensation having fractional pel accuracy.
The present invention fulfills the above need, as well as others, by providing a partial video decoder that inverse motion compensates interframe encoded frames in a transform domain with fractional pel accuracy. In general, the partial video decoder partially decodes a compressed video stream in order to obtain a transform domain representation of the video stream. In obtaining the transform domain representation of the video stream, the partial video decoder reconstructs frames that have been encoded based upon other frames (i.e. reference frames) of the video stream. In particular, the partial video decoder constructs transform domain target blocks of a video frame based upon transform domain prediction blocks and transform domain prediction error blocks. In order to obtain the transform domain prediction block, the partial video decoder inverse motion compensates transform domain reference blocks of a reference frame. To this end, the partial video decoder applies shifting and windowing matrices to the transform domain reference blocks which shift the transform domain reference blocks by fractional pel amounts. By utilizing the shifting and windowing matrices that account for fractional pel displacements, the partial video decoder inverse motion compensates the transform domain reference blocks without introducing undesirable artifacts that would otherwise arise if integer pel shifting and windowing matrices were utilized.
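The idea of shifting in the transform domain by a fractional pel amount can be illustrated with a simplified sketch. It is not the construction of the present invention, which the summary above does not spell out. The sketch assumes a horizontal-only displacement across two horizontally adjacent 8×8 reference blocks, an orthonormal DCT, and a half pel shifting and windowing matrix formed as the average of two integer shift matrices. All names (integer_shift, shift_matrices, predict_dct) and the sample pel values are hypothetical.

```python
import math

N = 8

def dct_matrix(n=N):
    """Orthonormal DCT-II matrix S; the 2-D DCT of x is S * x * S^T."""
    return [[(math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n))
             * math.cos((2 * j + 1) * k * math.pi / (2 * n))
             for j in range(n)] for k in range(n)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def scale(a, f):
    return [[f * x for x in row] for row in a]

def to_dct(m):
    s = dct_matrix(len(m))
    return matmul(matmul(s, m), transpose(s))

def integer_shift(d):
    """Spatial matrices (H1, H2) such that x1*H1 + x2*H2 is the 8x8 window that
    starts d columns into the left block x1 and spills into the right block x2."""
    h1 = [[1.0 if j == c + d else 0.0 for c in range(N)] for j in range(N)]
    h2 = [[1.0 if j == c + d - N else 0.0 for c in range(N)] for j in range(N)]
    return h1, h2

def shift_matrices(d, half):
    """Integer shift d (0..7), plus an extra half pel when half is True
    (assumed here to be the average of shifts d and d + 1)."""
    h1, h2 = integer_shift(d)
    if half:
        g1, g2 = integer_shift(d + 1)
        h1, h2 = scale(add(h1, g1), 0.5), scale(add(h2, g2), 0.5)
    return h1, h2

def predict_dct(ref1_dct, ref2_dct, d, half):
    """Prediction block computed entirely from DCT coefficients."""
    h1, h2 = shift_matrices(d, half)
    return add(matmul(ref1_dct, to_dct(h1)), matmul(ref2_dct, to_dct(h2)))

# Check against the spatial domain result for a displacement of 3.5 pels.
x1 = [[(7 * r + 13 * c) % 32 for c in range(N)] for r in range(N)]
x2 = [[(11 * r + 5 * c) % 32 for c in range(N)] for r in range(N)]
d = 3
wide = [r1 + r2 for r1, r2 in zip(x1, x2)]  # the two blocks side by side
spatial = [[0.5 * (wide[r][d + c] + wide[r][d + c + 1]) for c in range(N)]
           for r in range(N)]
expected = to_dct(spatial)
actual = predict_dct(to_dct(x1), to_dct(x2), d, half=True)
max_diff = max(abs(a - e) for row_a, row_e in zip(actual, expected)
               for a, e in zip(row_a, row_e))
print("largest coefficient difference:", max_diff)
```

Because the DCT matrix is orthonormal, the transform of each shifting and windowing matrix can be computed once in advance, and the prediction block is then obtained from the reference coefficients without an inverse DCT. A vertical displacement would be handled analogously by multiplying on the left.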
An exemplary method according to the present invention is a method of constructing a transform domain target block. One step of the method includes the step of obtaining a first displacement value. Another step of the method includes the step of determining whether the first displacement value indicates a first fractional pel displacement between a pel target block of a target image and a pel prediction block of a reference image. The method also includes the step of constructing a transform domain prediction block in the transform domain based upon a transform domain representation of the reference image. The constructing step of the method includes the step of shifting a transform domain reference block of the transform domain representation by a first fractional pel amount that is based upon the first fractional pel displacement, if the determining step determines that the first displacement value indicates the first fractional pel displacement. The method also includes the step of constructing the transform domain target block based upon the transform domain prediction block and a transform domain prediction error block that represents a pel difference between the pel target block and the pel prediction block.
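The determining step and the final constructing step of the method above can be sketched as follows. The sketch assumes that a displacement value is an integer expressed in half pel units, so that an odd value indicates a fractional pel displacement, and that coefficient blocks are lists of rows. Both function names are hypothetical. The addition relies on the linearity of the transform: the transform of a sum of pel blocks equals the sum of their transforms.

```python
def split_displacement(value_in_half_pels):
    """Return (integer pel part, True if a half pel remainder is present)."""
    whole, half = divmod(value_in_half_pels, 2)
    return whole, half == 1

def construct_target_block(prediction_coeffs, error_coeffs):
    """Transform domain target block = prediction block + prediction error block."""
    return [[p + e for p, e in zip(p_row, e_row)]
            for p_row, e_row in zip(prediction_coeffs, error_coeffs)]

print(split_displacement(5))   # (2, True): 2.5 pels
print(split_displacement(4))   # (2, False): 2 pels
print(split_displacement(-3))  # (-2, True): -2 + 0.5 = -1.5 pels
print(construct_target_block([[1.0, 2.0]], [[0.5, -2.0]]))  # [[1.5, 0.0]]
```

When the second element returned by split_displacement is False, an integer pel shift suffices; when it is True, the reference block is shifted by the fractional amount before the prediction error is added.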
The present invention further includes various apparatus for carrying out the above method. For example, one apparatus according to the present invention includes a decoding circuit, an inverse motion compensation unit, and an adding circuit. The decoding circuit is operable to receive a compressed video stream. The decoding circuit is also operable to obtain from the compressed video stream, a transform domain prediction error block that represents in a transform domain a pel difference between a pel target block of a target video frame and a pel prediction block of a reference video frame. Moreover, the decoding circuit is operable to obtain from the compressed video stream a motion vector that represents a first fractional pel displacement between the pel target block and the pel prediction block.
The inverse motion compensation unit is coupled to the decoding circuit and is operable to receive the motion vector. The inverse motion compensation unit is also operable to obtain a transform domain prediction block from a transform domain reference video frame that represents the reference video frame in the transform domain. In particular, the inverse motion compensation unit is operable to obtain the transform domain prediction block by shifting a transform domain reference block of the transform domain reference video frame by a fractional pel amount based upon the first fractional pel displacement.
The adding circuit is coupled to the decoding circuit and the inverse motion compensation unit. The adding circuit is operable to receive the transform domain prediction error block from the decoding circuit. The adding circuit is also operable to receive the transform domain prediction block from the inverse motion compensation unit. Moreover, the adding circuit is operable to combine the transform domain prediction error block and the transform domain prediction block in order to obtain a transform domain target block of a transform domain video stream.
The above features and advantages, as well as others, will become more readily apparent to those of ordinary skill in the art by reference to the following detailed description and accompanying drawings.