1. Field of the Invention
The present invention is generally related to digital video signal processing using video compression schemes such as MPEG (Moving Picture Experts Group) and H.26x (ITU-T Recommendation H.261 or H.263). The invention is more particularly related to methods and systems for video transcoding in a compressed domain, such as a discrete cosine transform (DCT) domain.
2. Background
Video transcoding is a process of converting a previously compressed video bit-stream into another compressed video bit-stream with a lower bit-rate, a different display format (e.g., downscaling), or a different coding method (e.g., conversion between H.26x and MPEG, conversion among MPEG-1, 2, and 4, or adding error resilience). Bit-rate adaptation (usually rate reduction) can provide fine and dynamic adjustments of the bit-rate of the video bit-stream according to actual network conditions, e.g., available bandwidth.
The bit-rate adaptation may be performed for video bridging over heterogeneous networks, for example, multipoint video conferencing, remote collaboration, remote surveillance, video on demand, video multicast over heterogeneous networks, and streaming video. The video transcoder placed at a boundary between the heterogeneous networks enables each receiver of the video bit-stream to decode video as received, without additional functional requirements in the decoder.
FIG. 1 shows an exemplary process of video coding. Video compression is based on motion compensated predictive coding with an I-P or I-B-P frame structure. Here, I, P, and B frames represent intra, predictive, and interpolated frames, respectively. P and B frames are also called “inter-frames,” whereas I frames are “intra-frames.” An MPEG video sequence 100 has a group-of-picture (GOP) structure in which the coding of a P-frame depends on its preceding I/P-frame, and the coding of a B-frame depends on its preceding I/P-frame and succeeding I/P-frame.
Each frame (picture) 110 is divided into blocks 120, each of which comprises 8×8 pixels. A 2×2 matrix of blocks is called a macroblock 115. For intra-frames, DCT converts each block 120 of pixels into a block of 8×8 DCT coefficients 130. The upper-left coefficient is the DC component, i.e., the zero spatial frequency component. In the 8×8 block, the spatial frequency that a coefficient represents becomes higher as its distance from the upper-left corner increases. Human eyes are more sensitive to lower-frequency coefficients, and thus n×n low-frequency DCT coefficients 135 may be sufficient for a required quality of decoded pictures.
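The block transform described above can be illustrated with a minimal sketch. It assumes the orthonormal DCT-II normalization and uses a direct, unoptimized summation; the names dct2 and keep_low_freq are hypothetical and chosen only for this illustration.

```python
import math

N = 8  # block size used in the text (8x8 pixels)

def dct2(block):
    """Direct (unoptimized) 2-D orthonormal DCT-II of an N x N block."""
    def scale(k):
        return math.sqrt((1 if k == 0 else 2) / N)
    return [[scale(u) * scale(v) * sum(
                block[y][x]
                * math.cos((2 * y + 1) * u * math.pi / (2 * N))
                * math.cos((2 * x + 1) * v * math.pi / (2 * N))
                for y in range(N) for x in range(N))
             for v in range(N)]
            for u in range(N)]

def keep_low_freq(coeffs, n):
    """Zero every coefficient outside the upper-left n x n corner."""
    return [[coeffs[u][v] if u < n and v < n else 0.0 for v in range(N)]
            for u in range(N)]

# A flat block has only a DC component (upper-left coefficient).
flat = [[100] * N for _ in range(N)]
coeffs = dct2(flat)

# A smooth ramp concentrates its energy in the low-frequency corner.
ramp = [[10 * x + y for x in range(N)] for y in range(N)]
ramp_coeffs = dct2(ramp)
low = keep_low_freq(ramp_coeffs, 4)
```

With this normalization, the DC coefficient of a flat block equals 8 times the pixel value, and all other coefficients vanish up to rounding error.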
For inter-frames, motion compensation is performed. A frame to code is called a “target” frame 140, and its preceding (and succeeding in the case of coding B-frames) I/P frame is called a “reference” frame 150. For each block 160 in target frame 140, reference frame 150 is searched to find a block 170 whose image best matches an image of block 160. A decoded image (not an original image) is used as the image of reference frame 150. A motion vector 155 represents an amount and direction of the movement of block 160 relative to block 170.
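The search for a best-matching reference block can be sketched as an exhaustive block-matching search. The text does not state a matching criterion or search strategy, so the sum of absolute differences (SAD), the search range, and the sign convention of the returned displacement are all assumptions made for this illustration; sad and full_search are hypothetical names.

```python
import random

def sad(target, ref, ty, tx, ry, rx, n=8):
    """Sum of absolute differences between an n x n target block and reference block."""
    return sum(abs(target[ty + i][tx + j] - ref[ry + i][rx + j])
               for i in range(n) for j in range(n))

def full_search(target, ref, ty, tx, search=4, n=8):
    """Return the (dy, dx) displacement from the target block position to the
    reference block with the smallest SAD, within +/- search pixels."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            ry, rx = ty + dy, tx + dx
            if 0 <= ry <= height - n and 0 <= rx <= width - n:
                cost = sad(target, ref, ty, tx, ry, rx, n)
                if best is None or cost < best[0]:
                    best = (cost, dy, dx)
    return best[1], best[2]

# Synthetic test data: the target frame is the reference frame shifted
# by 2 rows and -1 column (with wrap-around at the borders).
random.seed(1)
SIZE = 24
ref = [[random.randint(0, 255) for _ in range(SIZE)] for _ in range(SIZE)]
target = [[ref[(y + 2) % SIZE][(x - 1) % SIZE] for x in range(SIZE)]
          for y in range(SIZE)]

mv = full_search(target, ref, ty=8, tx=8)
```

In this synthetic case the block at (8, 8) of the target frame has an exact match in the reference frame at displacement (2, -1). A practical encoder would normally use a faster search and, as noted in the text, a decoded reference image.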
Then, a difference (prediction error) between the image of 8×8 pixel block 160 and the image of 8×8 pixel block 170 is calculated, and DCT converts the difference into DCT coefficients 180 for target block 160. Motion vectors 155 can be specified to a fraction of a pixel, i.e., half-pixels. The best-matching reference block 170 may not be aligned with the original blocks of reference frame 150, and may intersect with two or four neighboring blocks 190 of the original blocks. The location of block 170 within neighboring blocks 190 is represented by a height (h) and a width (w). An overlapping area of block 170 with the upper-right block of neighboring blocks 190 is h×w. An overlapping area of block 170 with the lower-right block of neighboring blocks 190 is (8−h)×w. An overlapping area of block 170 with the upper-left block of neighboring blocks 190 is h×(8−w). An overlapping area of block 170 with the lower-left block of neighboring blocks 190 is (8−h)×(8−w).
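As a quick check of the geometry described above, the four overlapping areas can be computed for a given h and w. This is only a sketch; the function name overlap_areas is hypothetical, and the assignment of areas to neighbors follows the labelling stated in the text.

```python
def overlap_areas(h, w, size=8):
    """Overlap (in pixels) of reference block 170 with each of the four
    neighboring blocks 190, using the labelling given in the text."""
    if not (0 < h < size and 0 < w < size):
        raise ValueError("h and w must lie strictly between 0 and the block size")
    return {
        "upper_right": h * w,
        "lower_right": (size - h) * w,
        "upper_left": h * (size - w),
        "lower_left": (size - h) * (size - w),
    }

areas = overlap_areas(h=3, w=5)
total = sum(areas.values())  # the four parts tile one 8x8 block
```

For any valid h and w the four areas add up to 64, the number of pixels in one 8×8 block.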
FIG. 2 shows a direct implementation of the video transcoder as a cascaded pixel-domain transcoder 200. Transcoder 200 decodes an incoming compressed bit-stream into the pixel domain (e.g., blocks 120 and 160 in FIG. 1), and then re-encodes the decoded video at the desired bit-rate or in the desired format. More particularly, the incoming bit-stream is processed by an inverse quantizer (IQ1) 210 and by an inverse DCT process (IDCT1) 220. The result is stored in a frame memory 230 so that a motion compensator (MC) 235 can perform motion compensation on it, using motion vectors, for decoding of inter-frames. This decoded and motion compensated video is then processed by a DCT process 245 and by a quantizer (Q2) 250 to output the re-encoded bit-stream. The output is re-decoded by an inverse quantizer (IQ2) 260 (inverse of Q2) and by an IDCT2 process 270 (inverse of DCT 245) and then stored in a frame memory 280 to perform motion compensation at a motion compensator (MC) 285 for encoding of inter-frames.
Cascaded pixel-domain transcoder 200 is flexible, since a decoder-loop 240 and an encoder-loop 290 can be totally independent from each other. Therefore, decoder 240 and encoder 290 in transcoder 200 can operate at, for example, different bit-rates, frame-rates, picture resolutions, coding modes, and even different standards. Also, transcoder 200 can be implemented to achieve a drift-free operation if the implementations of inverse discrete cosine transform (IDCT) in the front-encoder, which has encoded the incoming bit-stream, and the end-decoder, which will receive the outgoing bit-stream, are known. In this case, the decoder-loop and the encoder-loop can be implemented to produce exactly the same reconstructed pictures as those in the front-encoder and the end-decoder, respectively. If the implementations of the IDCT are not known but satisfy the IDCT standard specifications defined, for example, in IEEE 1180-1990, and the macroblocks are refreshed as specified in standards such as ISO/IEC 13818-2 and ITU-T Recommendation H.263, the drift will not be a major issue. Fewer drift errors result in a higher quality of pictures.
On the other hand, cascaded pixel-domain transcoder 200 is computationally expensive. The overall complexity is lower than the sum of a decoder and an encoder when transcoder 200 reuses coding parameters such as coding modes (Intra/Inter) and motion vectors (MVs). Even with this arrangement, however, high complexity remains a disadvantage.
In implementing transcoders, computational complexity and picture quality usually must be traded off against each other to meet various requirements in practical applications. For example, computational complexity is critical in real-time applications, where the transcoding operations must be fast.
Several fast video transcoder architectures have thus been proposed. FIG. 3 shows a simplified pixel-domain transcoder (SPDT) 300, which reduces the computational complexity of the cascaded transcoder by reusing motion vectors and merging the decoding and encoding process. In transcoder 300, IDCT 220, MC 235, and frame memory 230 of the cascaded transcoder are eliminated. That is, transcoder 300 performs an inverse quantization on an incoming bit-stream at an inverse quantizer (IQ1) 310, and the result is re-quantized by a quantizer (Q2) 320 to output a transcoded bit-stream. For transcoding of inter-frames, the output is subjected to inverse quantization by an inverse quantizer (IQ2) 330 (inverse of Q2), and then an IDCT 340 and a DCT 370 are used to perform a motion compensation in a pixel domain. In transcoder 300, IDCT 340 operates on a difference of the results of IQ2 330 and IQ1 310. The result is stored in a frame memory 350 for motion compensation by a motion compensator (MC) 360.
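The requantization path of transcoder 300 (IQ1 followed by Q2) and the difference between the IQ2 and IQ1 results can be sketched for a short list of quantized coefficient levels. This sketch assumes a plain uniform quantizer with rounding to the nearest level, which is a simplification of the quantizers defined in the actual standards; the function names and step sizes are hypothetical.

```python
import math

def inverse_quantize(level, step):
    """Reconstruct a coefficient from its quantized level (uniform quantizer assumed)."""
    return level * step

def quantize(coeff, step):
    """Quantize to the nearest level, rounding halves away from zero."""
    return int(math.copysign(math.floor(abs(coeff) / step + 0.5), coeff))

def requantize(levels, q1, q2):
    """Apply IQ1 then Q2, and return the new levels together with the
    requantization error IQ2(Q2(c)) - c for each coefficient c."""
    coeffs = [inverse_quantize(lv, q1) for lv in levels]          # IQ1
    new_levels = [quantize(c, q2) for c in coeffs]                # Q2
    errors = [inverse_quantize(nl, q2) - c                        # IQ2 - IQ1
              for nl, c in zip(new_levels, coeffs)]
    return new_levels, errors

levels_in = [12, -5, 3, 0, 1]
levels_out, errors = requantize(levels_in, q1=4, q2=10)
```

The coarser step size produces fewer nonzero levels, and the per-coefficient error is the quantity that the feedback path of transcoder 300 carries through IDCT 340, frame memory 350, MC 360, and DCT 370 for inter-frames.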
SPDT 300 has the advantage of low complexity, but considerable drift errors may occur due to the merging of the decoding and encoding processes, non-linear half-pixel interpolations, and finite word-length DCT and IDCT computations.
Further simplifications have been proposed by performing motion compensation in a DCT domain (e.g., blocks 130 and 180 in FIG. 1) so that no DCT/IDCT operation is required. FIG. 4A shows such a DCT-domain transcoder (DDT) 400, in which a DCT-domain motion compensator (DCT-MC) 450 and a frame memory 440 are substituted for a series of units 380 (from IDCT 340 to DCT 370) of SPDT 300.
As shown in FIG. 4B, the DCT-MC operation can be represented as computing the coefficients of each target DCT block B 460 from the coefficients of its two or four neighboring DCT blocks 471 to 474. The neighboring DCT blocks can be referred to as Bi, i=1 to 4, where B=DCT(b) and Bi=DCT(bi) are the blocks of 8×8 DCT coefficients of the associated pixel-domain blocks b and bi of the image data, respectively. Mathematically, the function of DDT 400 shown in FIG. 4A is equivalent to that of the cascaded architecture shown in FIG. 2 (with motion vector reuse) and of SPDT 300 shown in FIG. 3. On the other hand, DDT 400 outperforms SPDT 300 at least to the extent of experiencing far fewer drift errors.
The DCT coefficients in the DCT-MC operation can be computed as follows:
$$B = \sum_{i=1}^{4} H_{h_i} B_i H_{w_i} \qquad (1)$$

where each of $w_i$ and $h_i$ is one of {1, 2, ..., 7}. $H_{h_i}$ and $H_{w_i}$ are constant geometric transform matrices defined by the height (h) and width (w) of each sub-block generated by the intersection of $b_i$ with $b$. Direct computation of Eq. (1) requires 8 matrix multiplications and 3 matrix additions. By using the following equalities among the geometric transform matrices: $H_{h_1}=H_{h_2}$, $H_{h_3}=H_{h_4}$, $H_{w_1}=H_{w_3}$, and $H_{w_2}=H_{w_4}$, the number of operations in Eq. (1) can be reduced to 6 matrix multiplications and 3 matrix additions. Moreover, since $H_{h_i}$ and $H_{w_i}$ are deterministic, at least a part of the operations in Eq. (1) can be pre-computed and then pre-stored in a memory. Therefore, no additional DCT computation is required for the computation of Eq. (1).
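Eq. (1) can be checked numerically with a small sketch. It assumes an orthonormal DCT, so that the DCT of a product of 8×8 matrices equals the product of their DCTs, and it assumes the neighbors are ordered as upper-left (B1), upper-right (B2), lower-left (B3), and lower-right (B4), which is consistent with the equalities stated above but is not confirmed by the text. The offsets dy and dx, and all function names, are hypothetical. The grouping used here needs 6 matrix multiplications and 3 additions.

```python
import math
import random

N = 8

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(N)) for j in range(N)]
            for i in range(N)]

def transpose(a):
    return [list(col) for col in zip(*a)]

def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

# Orthonormal DCT-II basis matrix T, so that DCT(x) = T x T^T.
T = [[math.sqrt((1 if k == 0 else 2) / N)
      * math.cos((2 * j + 1) * k * math.pi / (2 * N))
      for j in range(N)] for k in range(N)]
TT = transpose(T)

def dct2(x):
    return matmul(matmul(T, x), TT)

def shift(d, lower):
    """Pixel-domain selection matrix S with (S x)[r] = x[r + d] (upper
    neighbor) or x[r + d - 8] (lower neighbor); other rows are zero."""
    off = d - N if lower else d
    return [[1.0 if k == r + off else 0.0 for k in range(N)] for r in range(N)]

def dct_mc(B1, B2, B3, B4, dy, dx):
    """Eq. (1) grouped as Hh1 (B1 Hw1 + B2 Hw2) + Hh3 (B3 Hw1 + B4 Hw2)."""
    hh_top, hh_bot = dct2(shift(dy, False)), dct2(shift(dy, True))
    hw_left = dct2(transpose(shift(dx, False)))
    hw_right = dct2(transpose(shift(dx, True)))
    top = add(matmul(B1, hw_left), matmul(B2, hw_right))
    bot = add(matmul(B3, hw_left), matmul(B4, hw_right))
    return add(matmul(hh_top, top), matmul(hh_bot, bot))

# Compare against the pixel-domain result on a random 16x16 area.
random.seed(0)
area = [[random.randint(0, 255) for _ in range(2 * N)] for _ in range(2 * N)]

def block(r0, c0):
    return [row[c0:c0 + N] for row in area[r0:r0 + N]]

dy, dx = 3, 5
B = dct_mc(dct2(block(0, 0)), dct2(block(0, N)),
           dct2(block(N, 0)), dct2(block(N, N)), dy, dx)
reference = dct2(block(dy, dx))
max_err = max(abs(B[i][j] - reference[i][j]) for i in range(N) for j in range(N))
```

The maximum deviation between the two results should be at the level of floating-point rounding. In a practical implementation the four transform matrices would be looked up from a pre-computed table rather than recomputed for every block, in line with the pre-storing noted in the text.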
DDT 400 has relatively low complexity compared to the cascaded architecture shown in FIG. 2 (with motion vector reuse), and realizes relatively low drift compared to SPDT 300 shown in FIG. 3.