The present invention relates generally to video data processing, and more particularly, to transform domain resizing of an image represented in a transform domain with transform domain blocks representing reordered pels.
Video data is commonly compressed utilizing a compression standard such as MPEG-1, MPEG-2, or H.261. To obtain a compressed representation of the video data, these compression standards utilize intraframe and interframe coding techniques that exploit the spatial and temporal redundancies often found within video data.
Intraframe coding techniques exploit redundancies within a single frame of video data. A common intraframe coding technique employs a block-based two-dimensional transform that transforms each frame of video data from a spatial domain to a transform domain. One common intraframe coding technique first divides a video frame into 8×8 blocks of pels, and independently applies a two-dimensional discrete cosine transform (DCT) to each pel block. This operation results in an 8×8 block of DCT coefficients in which most of the energy in the original pel block is typically concentrated in a few low-frequency coefficients. The 8×8 block of DCT coefficients is then quantized and variable length encoded in order to reduce the number of bits necessary to represent the original 8×8 pel block.
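By way of illustration only, the block transform described above may be sketched as follows. The sketch assumes an orthonormal DCT-II and a single uniform quantizer step; the helper names (`dct_matrix`, `dct2`, `quantize`) and the step value are hypothetical and are not taken from any of the cited standards, which define their own quantization matrices and variable length codes.

```python
import math

N = 8  # block size used by the intraframe coding technique described above


def dct_matrix(n=N):
    """Orthonormal DCT-II basis: row k holds the k-th cosine basis vector."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos((2 * i + 1) * k * math.pi / (2 * n))
                     for i in range(n)])
    return rows


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def dct2(block):
    """Two-dimensional DCT of a square pel block: T * block * T^t."""
    t = dct_matrix(len(block))
    return matmul(matmul(t, block), transpose(t))


def quantize(coeffs, step=16):
    """Uniform quantization (an assumption; real encoders use weighted matrices)."""
    return [[int(round(c / step)) for c in row] for row in coeffs]


# A smooth 8x8 pel block: after the DCT, nearly all of its energy
# sits in the low-frequency (top-left) coefficients.
ramp = [[100 + 2 * i + 3 * j for j in range(N)] for i in range(N)]
ramp_coeffs = dct2(ramp)
ramp_levels = quantize(ramp_coeffs)
```

For a smooth block such as `ramp`, most quantized levels outside the top-left corner are zero, which is what makes the subsequent variable length encoding effective.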
In contrast to intraframe coding techniques, interframe coding techniques exploit temporal redundancies often found between temporally adjacent video frames. These compression standards exploit temporal redundancy by computing an interframe difference signal called “prediction error.” In computing the prediction error, the technique of motion compensation is employed to correct the prediction for motion. One type of unidirectional motion estimation utilized by the MPEG-2 standard is known as “forward prediction.” In forward prediction, a target macroblock of a video frame to be encoded is matched with pel blocks of the same size in a past video frame called the “reference video frame.” The pel block in the reference video frame that best matches the target macroblock is used as a prediction macroblock. A prediction error macroblock is then computed as the difference between the target macroblock and the prediction macroblock. The prediction error macroblock is then encoded utilizing the two-dimensional DCT encoding technique described above. Moreover, the position of the prediction macroblock within the reference frame is indicated by a motion vector that indicates a horizontal and vertical pel displacement between the target macroblock and the prediction macroblock. The motion vector is then encoded for transmission along with the encoded prediction error macroblock.
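As a non-limiting sketch of forward prediction, the following performs an exhaustive block search and returns the motion vector and the prediction error macroblock. The sum-of-absolute-differences matching criterion, the search range, and the function names are assumptions made for illustration; the standards leave the search strategy to the encoder.

```python
def sad(frame_a, ay, ax, frame_b, by, bx, size):
    """Sum of absolute differences between two size x size pel blocks."""
    return sum(abs(frame_a[ay + r][ax + c] - frame_b[by + r][bx + c])
               for r in range(size) for c in range(size))


def forward_predict(target, reference, ty, tx, size=16, search=4):
    """Match the target macroblock at row ty, column tx against the reference frame.

    Returns ((dx, dy), error): the horizontal and vertical pel displacement
    of the best-matching prediction macroblock, and the prediction error
    macroblock (target minus prediction).
    """
    height, width = len(reference), len(reference[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            ry, rx = ty + dy, tx + dx
            # Skip candidate blocks that fall outside the reference frame.
            if ry < 0 or rx < 0 or ry + size > height or rx + size > width:
                continue
            cost = sad(target, ty, tx, reference, ry, rx, size)
            if best is None or cost < best[0]:
                best = (cost, dy, dx)
    _, dy, dx = best
    error = [[target[ty + r][tx + c] - reference[ty + dy + r][tx + dx + c]
              for c in range(size)] for r in range(size)]
    return (dx, dy), error
```

The returned `error` block is what would then be passed to the two-dimensional DCT encoding step, while `(dx, dy)` would be encoded as the motion vector.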
Some video compression standards, such as the MPEG-2 standard, also provide specialized encoding schemes that more efficiently compress video streams containing interlaced video frames. For example, the MPEG-2 standard provides for field DCT encoding and frame DCT encoding of interlaced macroblocks. The difference between field DCT encoding and frame DCT encoding is that field DCT encoding reorders the pels of the macroblock prior to DCT encoding. The pels are reordered in an attempt to increase vertical correlation within the macroblock and thus increase the energy compaction achieved by DCT encoding.
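The pel reordering may be pictured with the following sketch, which assumes a 16-line luminance macroblock whose even lines belong to the top field and whose odd lines belong to the bottom field. The function names and the simple activity measure are hypothetical and serve only to show why reordering can raise vertical correlation within each 8-line block.

```python
def frame_to_field_order(macroblock):
    """Reorder lines so the top-field (even) lines come first,
    followed by the bottom-field (odd) lines."""
    return macroblock[0::2] + macroblock[1::2]


def field_to_frame_order(reordered):
    """Inverse reordering: interleave the two fields back into spatial order."""
    half = len(reordered) // 2
    lines = []
    for top, bottom in zip(reordered[:half], reordered[half:]):
        lines.append(top)
        lines.append(bottom)
    return lines


def vertical_activity(macroblock):
    """Sum of absolute differences between vertically adjacent pels,
    measured separately inside the upper and lower 8-line halves
    (the halves that would each be DCT encoded)."""
    total = 0
    for start in (0, 8):
        rows = macroblock[start:start + 8]
        total += sum(abs(a - b)
                     for upper, lower in zip(rows, rows[1:])
                     for a, b in zip(upper, lower))
    return total


# An interlaced macroblock in which the two fields differ strongly,
# as happens when the scene moves between field captures.
moving = [[50 if line % 2 == 0 else 200 for _ in range(16)] for line in range(16)]
reordered = frame_to_field_order(moving)
```

In this contrived example the vertical activity inside each 8-line half drops to zero after reordering, so the DCT of the reordered blocks concentrates its energy in far fewer coefficients.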
One feature of compressed video is that the image resolution may be changed to accommodate available bandwidth. For example, in order to lower the bandwidth required to transmit a video stream, the resolution of the video stream may be reduced. Resolution is reduced by resizing the images or video frames of the stream. Frames of a video stream may be resized in order to achieve a second representation of the video stream having a desired resolution.
Essentially, resizing of a video stream involves resizing each video frame of the video stream. For example, an MPEG-2 video stream may include frames having a resolution of 720×480 pels. Each frame of the MPEG-2 video stream may be downsized by a factor of two in order to obtain a second representation of the video stream that includes frames of 360×240 resolution. Similarly, an MPEG-2 video stream may include frames having a resolution of 360×240. Each frame of that MPEG-2 video stream may be upsized by a factor of two in order to obtain a second representation of the video stream that includes frames of 720×480 resolution.
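For concreteness, resizing a single frame by a factor of two in the spatial domain may be sketched as below. Averaging each 2×2 pel neighborhood for downsizing and replicating pels for upsizing are assumptions chosen for brevity; other filters may equally be used, and the function names are hypothetical.

```python
def downsize_by_two(frame):
    """Halve both dimensions by averaging each 2x2 neighborhood of pels."""
    return [[(frame[r][c] + frame[r][c + 1]
              + frame[r + 1][c] + frame[r + 1][c + 1]) / 4.0
             for c in range(0, len(frame[0]) - 1, 2)]
            for r in range(0, len(frame) - 1, 2)]


def upsize_by_two(frame):
    """Double both dimensions by repeating each pel horizontally
    and each line vertically."""
    lines = []
    for row in frame:
        wide = [pel for pel in row for _ in range(2)]
        lines.append(wide)
        lines.append(list(wide))
    return lines


# A 720x480 frame (480 lines of 720 pels) becomes 360x240, and back again.
frame = [[0] * 720 for _ in range(480)]
small = downsize_by_two(frame)
large = upsize_by_two(small)
```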
Changing the resolution of an image has practical uses in many environments. For example, image resolution may be altered in order to (i) convert from one video format to another, (ii) alter the displayed size of the image on a computer display, and (iii) display a smaller representation of the image on a television screen to obtain a picture-in-picture effect.
FIG. 1 depicts a block diagram of a prior art video editing system 100 that utilizes a traditional approach for resizing video compressed in accordance with the MPEG-2 standard. The video editing system 100 essentially decompresses the video stream to obtain the video stream in the spatial domain, upsamples or downsamples the decompressed video stream in the spatial domain in order to obtain an edited video stream with the desired frame resolution, and compresses the edited video stream in order to place the edited video stream back into the compressed domain.
While the video editing system 100 is a relatively intuitive implementation of a compressed video editing system, the video editing system 100 is also computationally intensive due to (1) the high computational complexity of the decompression and compression tasks, and (2) the large volume of spatial domain data that must be manipulated. Due to the computational complexity of the video editing system 100, the hardware required to implement the video editing system 100 may be costly.
For this reason there has been a great effort in recent years to develop fast algorithms that perform these tasks directly in the compressed domain and thereby avoid the need to completely decompress the video stream. One such example is U.S. Pat. No. 5,708,732 to Merhav et al., entitled Fast DCT Domain Downsampling and Inverse Motion Compensation, the disclosure of which is hereby incorporated by reference. The Merhav patent discloses a method of altering the spatial resolution of a compressed video stream in the DCT domain. In particular, the Merhav patent discloses downsizing a DCT domain representation of a video image by factors of 2, 3 and 4.
However, one drawback of the method described in the Merhav patent arises from the method being limited to frame DCT encoded macroblocks. In other words, the method described in the Merhav patent does not address resizing video frames which include field DCT encoded blocks. Many compressed video streams currently include both field and frame DCT encoded blocks. Since the method disclosed in the Merhav patent does not account for pel reordering inherent to field DCT encoding, the method of the Merhav patent cannot be used to resize video streams that include field DCT encoded blocks. If the method of the Merhav patent were applied to an MPEG-2 video stream that includes field DCT encoded blocks, then the method would produce a resized video stream having visibly garbled areas due to its failure to account for pel reordering inherent to field DCT encoding. As should be appreciated, a video stream having garbled areas would be completely unacceptable to viewers of resized video streams.
Accordingly, there is still a need for a method and apparatus that perform transform domain resizing of transform domain blocks representing spatially reordered pels of an image or video frame.
The present invention fulfills the above need, as well as others, by providing a resizing unit that resizes, in the DCT domain, video frames represented by both field DCT encoded blocks and frame DCT encoded blocks. To this end, the resizing unit selects appropriate precalculated resizing matrices to apply to the encoded blocks of the video frame. In particular, the resizing unit selects and applies field resizing matrices and frame resizing matrices to the encoded blocks of the video frame in order to resize the video frame. The field resizing matrices account for pel reordering resulting from field DCT encoding pel blocks of the video frame. The frame resizing matrices do not account for luminance pel reordering since frame DCT encoding does not reorder luminance pels before DCT encoding. By utilizing different resizing matrices for the field encoded blocks and the frame encoded blocks, the resizing unit of the present invention resizes video streams in the DCT domain without introducing the undesirable artifacts that would otherwise arise if a single type of resizing matrix were utilized for both field DCT encoded blocks and frame DCT encoded blocks.
An exemplary method according to the present invention is a method of resizing a spatial domain image represented in a transform domain by transform domain blocks. One step of the method includes obtaining a first transform domain block from the transform domain blocks. Another step of the method includes determining whether the first transform domain block represents in the transform domain (i) spatially reordered pels of the spatial domain image, or (ii) spatially intact pels of the spatial domain image. The method also includes the step of performing in the transform domain, field block resizing operations upon the first transform domain block if the determining step determines that the first transform domain block represents spatially reordered pels of the spatial domain image. The method also includes the step of performing in the transform domain, frame block resizing operations upon the first transform domain block if the determining step determines that the first transform domain block represents spatially intact pels of the spatial domain image.
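The determining and resizing steps may be sketched as follows for the limited case of vertical downsizing by a factor of two, in which two vertically adjacent 8×8 DCT blocks yield one 8×8 DCT block. This is an illustrative construction under stated assumptions (orthonormal DCT, two-pel averaging, top-field lines stored in the upper block) and is not the specific set of matrices of the invention; all names are hypothetical. Horizontal resizing would follow analogously by multiplying from the right.

```python
import math

N = 8


def dct_matrix(n=N):
    """Orthonormal DCT-II basis matrix."""
    return [[(math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n))
             * math.cos((2 * i + 1) * k * math.pi / (2 * n))
             for i in range(n)] for k in range(n)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


T = dct_matrix()
Tt = transpose(T)


def dct2(block):
    return matmul(matmul(T, block), Tt)


# 8x16 operator: output line i is the mean of spatial lines 2i and 2i+1.
AVERAGE = [[0.5 if j in (2 * i, 2 * i + 1) else 0.0 for j in range(2 * N)]
           for i in range(N)]

# 16x16 permutation restoring spatial order from field order
# (8 top-field lines followed by 8 bottom-field lines).
UNSHUFFLE = [[0.0] * (2 * N) for _ in range(2 * N)]
for i in range(N):
    UNSHUFFLE[2 * i][i] = 1.0
    UNSHUFFLE[2 * i + 1][N + i] = 1.0


def resizing_matrices(field_coded):
    """Precalculate (A, B) so that the resized block is A*upper + B*lower."""
    spatial_op = matmul(AVERAGE, UNSHUFFLE) if field_coded else AVERAGE
    left = [row[:N] for row in spatial_op]
    right = [row[N:] for row in spatial_op]
    return matmul(matmul(T, left), Tt), matmul(matmul(T, right), Tt)


# Matrix store holding field resizing matrices and frame resizing matrices.
MATRIX_STORE = {True: resizing_matrices(True), False: resizing_matrices(False)}


def resize_pair(upper, lower, field_coded):
    """Select matrices by block type and resize entirely in the DCT domain."""
    a, b = MATRIX_STORE[field_coded]
    first, second = matmul(a, upper), matmul(b, lower)
    return [[p + q for p, q in zip(r1, r2)] for r1, r2 in zip(first, second)]
```

Because the matrices are precalculated, each block pair is resized with two 8×8 matrix products and one addition, with no inverse DCT. Applying the frame matrices to field DCT encoded blocks averages the wrong lines, which is the source of the garbled areas discussed above.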
The present invention further includes various apparatus for carrying out the above method. For example, one apparatus according to the present invention includes a buffer, a matrix store, and a processor coupled to the buffer and the matrix store. The buffer is operable to store transform domain blocks that represent a spatial domain image in a transform domain. The matrix store is operable to store resizing matrices. The processor is operable to obtain from the buffer a first transform domain block that represents in the transform domain spatially reordered pels of the spatial domain image. The processor is also operable to apply a first field resizing matrix of the resizing matrices to the first transform domain block in order to obtain a resized transform domain block that has a different resolution than the first transform domain block.
The above features and advantages, as well as others, will become more readily apparent to those of ordinary skill in the art by reference to the following detailed description and accompanying drawings.