The Moving Picture Experts Group (MPEG) developed a generic video compression standard that defines three types of frames: the intra-frame (I-frame), the predictive frame (P-frame), and the bi-directionally predictive frame (B-frame). A group of pictures (GOP) comprises an I-frame and a plurality of P-frames and B-frames. FIG. 1 shows a GOP structure of (N,M), which comprises N frames with M B-frames between two I-frames or P-frames.
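The (N,M) notation can be illustrated with a short sketch. It follows the definition given above (M B-frames between consecutive I- or P-frames) and assumes display order with the I-frame first; the function name `gop_frame_types` is hypothetical and used only for illustration.

```python
def gop_frame_types(n, m):
    """Return the frame-type pattern of one GOP of n frames in display order.

    Assumes the GOP starts with its I-frame and that every anchor frame
    (I or P) is followed by m B-frames, as in the (N,M) definition above.
    """
    types = []
    for i in range(n):
        if i == 0:
            types.append("I")          # the single intra-coded frame
        elif i % (m + 1) == 0:
            types.append("P")          # anchor frame every m + 1 positions
        else:
            types.append("B")          # frames between two anchors
    return "".join(types)


if __name__ == "__main__":
    print(gop_frame_types(12, 2))
```

Under these assumptions a (12,2) GOP has one I-frame, three P-frames, and eight B-frames.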
In recent years, owing to advances in network technologies and the wide adoption of video coding standards, digital video applications have become increasingly popular in daily life. Networked multimedia services, such as video on demand, video streaming, and distance learning, have been emerging in various network environments. These multimedia services usually use pre-encoded videos for transmission. The heterogeneity of present communication networks and user devices poses difficulties in delivering these bit-streams to the receivers. The sender may need to convert a pre-encoded bit-stream into a lower bit-rate or lower resolution version to fit the available channel bandwidths, the screen display resolutions, or even the processing powers of diverse clients. Many practical applications, such as video conversions from DVD to VCD (i.e., MPEG-2 to MPEG-1) and from MPEG-1/2 to MPEG-4, involve such spatial-resolution, format, and bit-rate conversions. Dynamic bit-rate or resolution conversions may be achieved using the scalable coding schemes in current coding standards to support heterogeneous video communications. These schemes, however, usually either provide very limited support for heterogeneous bit-rates and resolutions (e.g., MPEG-2 and H.263+) or introduce significantly higher complexity at the client decoder (e.g., MPEG-4 FGS).
Video transcoding is a process of converting a previously compressed video bit-stream into another bit-stream with a lower bit-rate, a different display format (e.g., downscaling), or a different coding method (e.g., the conversion between H.26x and MPEG-x, or adding error resilience), etc. It is considered an efficient means of achieving fine and dynamic adaptation of bit-rates, resolutions, and formats. In realizing transcoders, the computational complexity and picture quality are usually the two most important concerns.
A straightforward realization of video transcoders is the Cascaded Pixel-domain Downscaling Transcoder (CPDT) that cascades a decoder followed by an encoder as shown in FIG. 2. The computational complexity of the CPDT can be reduced by combining decoder and encoder, reusing the motion-vectors and coding-modes, and removing the motion estimation (ME) operation. This cascaded architecture is flexible and can be used for bit-rate adaptation, spatial and temporal resolution-conversion without drift. It is, however, computationally intensive for real-time applications, even though the motion-vectors and coding-modes of the incoming bit-stream can be reused for fast processing.
Recently, DCT-domain transcoding schemes have become very attractive because they can avoid the discrete cosine transform (DCT) and the inverse discrete cosine transform (IDCT) computations. Also, several efficient schemes were developed for implementing the DCT-domain motion compensation (DCT-MC). However, the conventional simplified DCT-domain transcoder cannot be used for spatial/temporal downscaling because it has to use the same motion vectors that are decoded from the incoming video at the encoding stage. The outgoing motion vectors usually are different from the incoming motion vectors in spatial/temporal downscaling applications.
The first proposed Cascaded DCT-domain Downscaling Transcoder (CDDT) architecture is depicted in FIG. 3, where a bilinear filtering scheme was used for downscaling the spatial resolution in the DCT domain. The decoder loop of the CDDT operates at the full picture resolution, while the encoding is performed at the quarter resolution. The CDDT avoids the DCT and IDCT computations required in the CPDT while preserving the flexibility of changing motion vectors and coding modes, as in the CPDT. The major computation required in the CDDT is the DCT-MC operation, as shown in FIG. 4. It can be interpreted as computing the coefficients of the target DCT block B from the coefficients of its four neighboring DCT blocks, Bi, i=1 to 4, where B=DCT(b) and Bi=DCT(bi) are the 8×8 DCT blocks of the associated pixel blocks b and bi. A closed-form solution for computing the DCT coefficients in the DCT-MC operation was first proposed as follows.
$$B = \sum_{i=1}^{4} H_{h_i} N_i H_{w_i} \tag{1}$$

where $N_i$ denotes the DCT block of the $i$-th neighboring block and $w_i, h_i \in \{0, 1, \ldots, 7\}$. $H_{h_i}$ and $H_{w_i}$ are constant geometric transform matrices defined by the height and width of each sub-block generated by the intersection of $b_i$ with $b$.
It takes 8 matrix multiplications and 3 matrix additions to compute Eq. (1) directly. However, the following relationships among the geometric transform matrices hold: $H_{h_1}=H_{h_2}$, $H_{h_3}=H_{h_4}$, $H_{w_1}=H_{w_3}$, and $H_{w_2}=H_{w_4}$. Using these relationships, the computation of Eq. (1) can be reduced to 6 matrix multiplications and 3 matrix additions, as shown in Eq. (2):

$$B = H_{h_1}\left(N_1 H_{w_1} + N_2 H_{w_2}\right) + H_{h_3}\left(N_3 H_{w_3} + N_4 H_{w_4}\right) \tag{2}$$

where $H_{h_i}$ and $H_{w_i}$ can be pre-computed and stored in memory. Therefore, no additional DCT computation is required to evaluate Eq. (1) or Eq. (2). FIG. 4 shows the principle of the DCT-MC operation.
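The equivalence of Eq. (1) and Eq. (2) can be checked numerically with a minimal sketch. It assumes an orthonormal 8×8 DCT and builds the geometric transform matrices as the DCT of pixel-domain shift matrices, parameterized by the row and column displacement `(r, c)` of the target block inside its top-left neighbor. This construction, the parameterization, and all function names are assumptions made for illustration and are not taken from the text.

```python
import numpy as np


def dct_matrix(n):
    """Orthonormal DCT-II matrix T, so that DCT(x) = T @ x @ T.T."""
    k = np.arange(n).reshape(-1, 1)   # frequency index (rows)
    m = np.arange(n).reshape(1, -1)   # sample index (columns)
    t = np.sqrt(2.0 / n) * np.cos((2 * m + 1) * k * np.pi / (2 * n))
    t[0, :] = np.sqrt(1.0 / n)
    return t


T = dct_matrix(8)


def dct2(x):
    """2-D DCT of an 8x8 block."""
    return T @ x @ T.T


def transform_matrices(r, c):
    """Assumed construction: DCT of the shift matrices for displacement (r, c)."""
    hh_top = dct2(np.eye(8, k=r))        # role of Hh1 = Hh2 (upper neighbors)
    hh_bot = dct2(np.eye(8, k=r - 8))    # role of Hh3 = Hh4 (lower neighbors)
    hw_left = dct2(np.eye(8, k=-c))      # role of Hw1 = Hw3 (left neighbors)
    hw_right = dct2(np.eye(8, k=8 - c))  # role of Hw2 = Hw4 (right neighbors)
    return hh_top, hh_bot, hw_left, hw_right


def dct_mc_direct(n1, n2, n3, n4, r, c):
    """Eq. (1): four terms, 8 matrix multiplications, 3 additions."""
    hh_top, hh_bot, hw_left, hw_right = transform_matrices(r, c)
    return (hh_top @ n1 @ hw_left + hh_top @ n2 @ hw_right
            + hh_bot @ n3 @ hw_left + hh_bot @ n4 @ hw_right)


def dct_mc_factored(n1, n2, n3, n4, r, c):
    """Eq. (2): shared left factors, 6 matrix multiplications, 3 additions."""
    hh_top, hh_bot, hw_left, hw_right = transform_matrices(r, c)
    return (hh_top @ (n1 @ hw_left + n2 @ hw_right)
            + hh_bot @ (n3 @ hw_left + n4 @ hw_right))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frame = rng.standard_normal((16, 16))          # four 8x8 pixel blocks
    n = [dct2(frame[:8, :8]), dct2(frame[:8, 8:]),
         dct2(frame[8:, :8]), dct2(frame[8:, 8:])]
    r, c = 3, 5
    reference = dct2(frame[r:r + 8, c:c + 8])      # pixel-domain extraction
    print(np.allclose(dct_mc_direct(*n, r, c), reference))
    print(np.allclose(dct_mc_factored(*n, r, c), reference))
```

In a real transcoder the four matrices returned by `transform_matrices` would be looked up from a pre-computed table rather than rebuilt per block, which is the point made above about pre-storing them in memory.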
To reduce the computation of the DCT-MC, the number of matrix multiplications can be reduced from 24 to 18 by the conventional shared information method, at the cost of a slight increase in the number of matrix additions/subtractions. This corresponds to a reduction of about 25% in the matrix multiplications of the DCT-MC operation.
A more efficient DCT-domain downscaling scheme, named DCT decimation, was then proposed for image downscaling and later adopted in video transcoding. This DCT decimation scheme extracts the 4×4 low-frequency DCT coefficients from the four original blocks $b_1$–$b_4$, then combines the four 4×4 sub-blocks into an 8×8 block. Let $B_1$, $B_2$, $B_3$, and $B_4$ represent the four original 8×8 DCT blocks; $\hat{B}_1$, $\hat{B}_2$, $\hat{B}_3$, and $\hat{B}_4$ the four 4×4 low-frequency sub-blocks of $B_1$, $B_2$, $B_3$, and $B_4$, respectively; and $\hat{b}_i = \mathrm{IDCT}(\hat{B}_i)$, $i = 1, \ldots, 4$. Then

$$\hat{b} = \begin{bmatrix} \hat{b}_1 & \hat{b}_2 \\ \hat{b}_3 & \hat{b}_4 \end{bmatrix}_{8 \times 8}$$

is the downscaled version of

$$b = \begin{bmatrix} b_1 & b_2 \\ b_3 & b_4 \end{bmatrix}_{16 \times 16}.$$

FIG. 5 illustrates the DCT decimation.
To compute $\hat{B} = \mathrm{DCT}(\hat{b})$ directly from $\hat{B}_1$, $\hat{B}_2$, $\hat{B}_3$, and $\hat{B}_4$, the following expression can be used:
$$
\begin{aligned}
\hat{B} &= T \hat{b} T^{t} \\
&= \begin{bmatrix} T_L & T_R \end{bmatrix}
\begin{bmatrix}
T_4^{t} \hat{B}_1 T_4 & T_4^{t} \hat{B}_2 T_4 \\
T_4^{t} \hat{B}_3 T_4 & T_4^{t} \hat{B}_4 T_4
\end{bmatrix}
\begin{bmatrix} T_L^{t} \\ T_R^{t} \end{bmatrix} \\
&= (T_L T_4^{t}) \hat{B}_1 (T_L T_4^{t})^{t}
 + (T_L T_4^{t}) \hat{B}_2 (T_R T_4^{t})^{t}
 + (T_R T_4^{t}) \hat{B}_3 (T_L T_4^{t})^{t}
 + (T_R T_4^{t}) \hat{B}_4 (T_R T_4^{t})^{t}
\end{aligned}
\tag{3}
$$
In addition, an architecture similar to the CDDT was proposed, in which a reduced-size frame memory is used in the DCT-domain decoder loop to reduce computation and memory requirements. This reduction, however, may introduce drift errors.
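The DCT decimation identity of Eq. (3) can be checked numerically with a minimal sketch. It assumes that T is the orthonormal 8×8 DCT matrix, that T_L and T_R are its left and right 8×4 halves, and that T_4 is the orthonormal 4×4 DCT matrix. The text does not define these symbols explicitly, so these readings, as well as all function names, are assumptions for illustration.

```python
import numpy as np


def dct_matrix(n):
    """Orthonormal DCT-II matrix T, so that DCT(x) = T @ x @ T.T."""
    k = np.arange(n).reshape(-1, 1)   # frequency index (rows)
    m = np.arange(n).reshape(1, -1)   # sample index (columns)
    t = np.sqrt(2.0 / n) * np.cos((2 * m + 1) * k * np.pi / (2 * n))
    t[0, :] = np.sqrt(1.0 / n)
    return t


T8 = dct_matrix(8)
T4 = dct_matrix(4)
T_L, T_R = T8[:, :4], T8[:, 4:]       # assumed split: T = [T_L  T_R]


def decimate_via_pixels(b1, b2, b3, b4):
    """Reference route: 4x4 IDCT of each low-frequency sub-block, then 8x8 DCT."""
    sub = [T4.T @ blk[:4, :4] @ T4 for blk in (b1, b2, b3, b4)]
    b_hat = np.block([[sub[0], sub[1]], [sub[2], sub[3]]])
    return T8 @ b_hat @ T8.T


def decimate_in_dct_domain(b1, b2, b3, b4):
    """Last line of Eq. (3): stays in the DCT domain, using two constant 8x4 matrices."""
    p = T_L @ T4.T                    # (T_L T_4^t), can be pre-computed
    q = T_R @ T4.T                    # (T_R T_4^t), can be pre-computed
    h1, h2, h3, h4 = (blk[:4, :4] for blk in (b1, b2, b3, b4))
    return p @ h1 @ p.T + p @ h2 @ q.T + q @ h3 @ p.T + q @ h4 @ q.T


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dct_blocks = [T8 @ rng.standard_normal((8, 8)) @ T8.T for _ in range(4)]
    direct = decimate_in_dct_domain(*dct_blocks)
    reference = decimate_via_pixels(*dct_blocks)
    print(np.allclose(direct, reference))
```

The sketch only verifies the algebraic identity. Eq. (3) as written shows no amplitude normalization; with orthonormal transforms, the extracted 4×4 coefficients would typically be halved so that the mean intensity of the downscaled block matches the original, and that step is omitted here.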