1. Field of the Invention
The present invention relates to image processing, and, in particular, to transcoding encoded image data between bitstreams conforming to two different image compression standards.
2. Description of the Related Art
A number of different image compression standards have been and will continue to be used to encode image data for more efficient storage and/or transmission of video content. The JPEG (Joint Photographic Experts Group) standard was originally designed for still images, but is also applied to sequences of images in which each image is encoded using only intra-frame encoding techniques (i.e., without reference to any other images in the sequence). Such encoded data is referred to as motion-JPEG or MJPEG encoded data. In MJPEG encoding, each image is transformed using a block-based discrete cosine transform (DCT). The resulting DCT coefficients are then quantized and run-length encoded to generate sets of run-length pairs. The run-length pairs are then encoded into an MJPEG bitstream using variable-length coding (VLC).
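By way of illustration only, the quantization and run-length steps described above may be sketched as follows. The function names, the single uniform step size, and the omission of zig-zag scanning and of an explicit end-of-block code are simplifying assumptions of this sketch, not features taken from the JPEG standard.

```python
def quantize(coeffs, step):
    """Uniformly quantize DCT coefficients with one hypothetical step size."""
    return [int(round(c / step)) for c in coeffs]


def run_length_encode(levels):
    """Convert quantized coefficients into (run, level) pairs.

    `run` counts the zeros preceding each nonzero `level`. Trailing zeros
    produce no pair, standing in for an end-of-block code.
    """
    pairs = []
    run = 0
    for level in levels:
        if level == 0:
            run += 1
        else:
            pairs.append((run, level))
            run = 0
    return pairs


# Hypothetical coefficients, assumed to be already in scan order.
levels = quantize([100.0, -31.0, 4.0, 0.0, 17.0, 2.0, 0.0, 0.0], 10)
pairs = run_length_encode(levels)
```

In an actual encoder, each pair would then be mapped to a variable-length code word.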
The DV (Digital Video) standard is a coding standard for digital video camcorders and digital VCRs. Like MJPEG, the DV standard relies primarily on DCT-based intra-frame encoding techniques to encode sequences of images. One major difference between the DV and MJPEG standards is that DV encoding supports two different modes for the DCT transform: a frame mode and a field mode. In the frame mode, also referred to as the 8×8 mode, 8×8 blocks of pixel data are encoded using an 8×8 DCT transform, similar to the processing in MJPEG encoding.
In the field mode, also referred to as the 2-4×8 mode, image data are encoded using a 4×8 DCT transform. In 2-4×8 mode, two different types of 4×8 blocks of DCT coefficient data are generated: even 4×8 DCT blocks and odd 4×8 DCT blocks. An even 4×8 DCT block $X_e$ corresponds to the DCT of the sum of pixel data from the top and bottom fields of a video frame, as represented in the following Equation (1):

$$X_e = C_4 \left[ x(2i, j) + x(2i+1, j) \right] C_8^{t}, \qquad (1)$$
where $C_4$ is a length-4 DCT transform matrix, $C_8^{t}$ is the transpose of a length-8 DCT transform matrix, $x(2i, j)$ is a 4×8 block of pixel data from the top field, and $x(2i+1, j)$ is the corresponding 4×8 block of pixel data from the bottom field. Similarly, the corresponding odd 4×8 DCT block $X_o$ corresponds to the DCT of the difference of the pixel data from the same top and bottom fields, as represented in the following Equation (2):

$$X_o = C_4 \left[ x(2i, j) - x(2i+1, j) \right] C_8^{t}. \qquad (2)$$
The even and odd sets of DCT coefficient data generated using Equations (1) and (2) are then quantized, run length encoded, and variable-length encoded.
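Equations (1) and (2) may be illustrated with the following minimal sketch, which computes the even and odd 4×8 DCT blocks for one 8×8 block of pixel data. The orthonormal DCT-II normalization and all function names are assumptions of this sketch. The DV standard specifies its own scaling and weighting, which are not reproduced here.

```python
import math


def dct_matrix(n):
    """Orthonormal length-n DCT-II matrix (an assumed normalization)."""
    rows = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        rows.append([scale * math.cos((2 * m + 1) * k * math.pi / (2 * n))
                     for m in range(n)])
    return rows


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def dct_2_4x8(block):
    """Return (X_e, X_o) for an 8x8 pixel block given as 8 rows of 8 values."""
    top = block[0::2]     # x(2i, j): rows 0, 2, 4, 6 (top field)
    bottom = block[1::2]  # x(2i+1, j): rows 1, 3, 5, 7 (bottom field)
    field_sum = [[t + b for t, b in zip(tr, br)] for tr, br in zip(top, bottom)]
    field_diff = [[t - b for t, b in zip(tr, br)] for tr, br in zip(top, bottom)]
    c4 = dct_matrix(4)
    c8t = transpose(dct_matrix(8))
    x_e = matmul(matmul(c4, field_sum), c8t)   # Equation (1)
    x_o = matmul(matmul(c4, field_diff), c8t)  # Equation (2)
    return x_e, x_o


# A flat block: both fields are identical, so the odd block vanishes.
x_e, x_o = dct_2_4x8([[1.0] * 8 for _ in range(8)])
```

When the two fields are identical, as in the flat block above, every coefficient of the odd block is zero and the frame content is carried entirely by the even block.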
The MPEG (Moving Picture Experts Group) standard was designed for sequences of images, in which each image is encoded using intra-frame encoding techniques and/or inter-frame encoding techniques (in which image data are encoded based on pixel differences between the current image and a reference image that is generated from one or more other images in the sequence). As in MJPEG and DV processing, in MPEG processing, a DCT transform is applied to blocks of image data to generate blocks of DCT coefficients that are then further processed (i.e., quantized, run-length encoded, and variable-length encoded) to generate the corresponding MPEG encoded bitstream.
Much encoded video content exists, and will continue to be generated, based on the MJPEG and DV standards. It would be advantageous to be able to make such MJPEG- and DV-based video content available to, for example, PC users having only MPEG image processors. This would enable someone with a DV-based camcorder and an MPEG-based PC to generate video content with the camcorder and then play and otherwise process that video content on the PC.
Transcoding refers to the process of converting an input encoded video bitstream that conforms to one image processing standard (e.g., MJPEG or DV) into an output encoded video bitstream that conforms to another image processing standard (e.g., MPEG). One brute-force approach to transcoding is to fully decode the input bitstream using a decoder conforming to the first image processing standard and then re-encode the resulting decoded sequence of images using an encoder conforming to the second image processing standard. In order to implement such brute-force transcoding in many real-time applications (i.e., where the transcoder is required to generate the output bitstream at the same frame rate at which it receives the input bitstream), the transcoders would need to be implemented using expensive hardware-based (e.g., MJPEG or DV) decoders and (e.g., MPEG) encoders.
The present invention is directed to techniques for transcoding input encoded video bitstreams conforming to a first DCT-based image compression standard (e.g., MJPEG or DV) into output encoded video bitstreams conforming to a second DCT-based image compression standard (e.g., MPEG). As opposed to brute-force transcoding techniques in which an input bitstream is fully decoded and then fully re-encoded to generate an output bitstream, under the present invention, the input bitstream is only partially decoded according to the first standard into the DCT domain (i.e., dequantized DCT coefficients), and then the re-encoding processing for the second standard starts with those dequantized DCT coefficients to generate the output bitstream. Because transcoders of the present invention only perform part of the full decoding and encoding processes, which do not include application of the computationally expensive inverse and forward DCT transforms, these transcoders can be implemented using PC-based software-only solutions and still meet the throughput requirements of many real-time applications. As such, the expense of requiring full decoders and encoders is avoided by the present invention.
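The partial decoding and re-encoding described above may be sketched, for a single intra-coded block, as follows. The sketch assumes that both standards use the same block layout and simple per-coefficient uniform quantizer steps. The function names and step values are hypothetical, and variable-length decoding and encoding are omitted.

```python
def dequantize(levels, in_steps):
    """Partial decode: recover dequantized DCT coefficients from input levels."""
    return [level * step for level, step in zip(levels, in_steps)]


def requantize(coeffs, out_steps):
    """Re-encode: quantize DCT coefficients with the output quantizer steps."""
    return [int(round(c / step)) for c, step in zip(coeffs, out_steps)]


def transcode_block(levels, in_steps, out_steps):
    """Convert one block in the DCT domain; no inverse or forward DCT is applied."""
    return requantize(dequantize(levels, in_steps), out_steps)


# Hypothetical quantized levels and quantizer steps for four coefficients.
out_levels = transcode_block([10, -3, 0, 2], [8, 12, 16, 24], [16, 16, 16, 16])
```

The block never returns to the pixel domain, which is the source of the computational savings relative to brute-force transcoding.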
According to one embodiment, the present invention is a method for converting an input encoded video bitstream conforming to a first DCT-based compression algorithm into an output encoded video bitstream conforming to a second DCT-based compression algorithm different from the first DCT-based compression algorithm, comprising the steps of (a) applying decoding steps conforming to the first compression algorithm to the input bitstream to generate dequantized DCT coefficient data in a DCT domain; (b) performing motion-compensated inter-frame differencing on the dequantized DCT coefficient data in the DCT domain based on motion vectors corresponding to block boundaries; and (c) applying encoding steps conforming to the second compression algorithm to the motion-compensated inter-frame DCT coefficient difference data to generate the output bitstream.
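Step (b) may be illustrated with the following minimal sketch. It rests on an assumed reading of the method: because the motion vectors correspond to block boundaries, each reference block coincides with a block already available in the DCT domain, and because the DCT is linear, the DCT of a pixel difference equals the difference of the DCTs. The data layout (dictionaries keyed by block position) and all names are hypothetical.

```python
def dct_domain_difference(current, reference, motion_vectors):
    """Motion-compensated inter-frame differencing in the DCT domain.

    current, reference: dict mapping (block_row, block_col) to a flat list of
        dequantized DCT coefficients for that block.
    motion_vectors: dict mapping (block_row, block_col) to a displacement
        (d_row, d_col) in whole-block units; missing entries mean no motion.
    """
    residual = {}
    for pos, coeffs in current.items():
        d_row, d_col = motion_vectors.get(pos, (0, 0))
        ref_block = reference[(pos[0] + d_row, pos[1] + d_col)]
        # Linearity of the DCT: subtract coefficient by coefficient.
        residual[pos] = [c - r for c, r in zip(coeffs, ref_block)]
    return residual


# Hypothetical two-block frames with four coefficients per block.
reference = {(0, 0): [80, -36, 0, 48], (0, 1): [64, 12, 0, 0]}
current = {(0, 0): [66, 12, 0, 0], (0, 1): [64, 10, 0, 0]}
residual = dct_domain_difference(current, reference, {(0, 0): (0, 1)})
```

The resulting difference data would then be quantized and entropy coded by the encoding steps of the second compression algorithm, as in step (c).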