The present invention relates generally to telecommunication techniques. More particularly, the invention provides a method and system for transcoding between hybrid video CODEC bitstreams. Merely by way of example, the invention has been applied to a telecommunication network environment, but it would be recognized that the invention has a much broader range of applicability.
Telecommunication techniques have improved over time. There are now several standards for coding audio and video signals across a communications link. These standards allow terminals to interoperate with other terminals that support the same sets of standards. Terminals that do not support a common standard can interoperate only if an additional device, a transcoder, is inserted between them. The transcoder translates the coded signal from one standard to another.
Hybrid video codecs typically code each video frame as one of the following frame types:
        I frames are coded as still images and can be decoded in isolation from other frames.
        P frames are coded as differences from the preceding I or P frame or frames to exploit similarities in the frames.
        B frames are coded as differences from preceding and following frames to further exploit similarities between groups of frames.
Core profile MPEG-4 supports I, P and B frames. Simple profile MPEG-4 and baseline H.263 support only I and P frames. MPEG-4 supports arbitrary frame sizes whereas baseline H.263 supports only a limited set of frame sizes.
Some hybrid video codec standards, such as the MPEG-4 video codec, also support “Not Coded” frames, which contain no coded data after the frame header. Certain examples of standards are described in more detail below.
Certain standards, such as the H.261, H.263, H.264 and MPEG-4 video codecs, decompose source video frames into 16 by 16 picture element (pixel) macroblocks. The H.261, H.263 and MPEG-4 video codecs further subdivide each macroblock into six 8 by 8 pixel blocks. Four of the blocks correspond to the 16 by 16 pixel luminance values for the macroblock and the remaining two blocks to the sub-sampled chrominance components of the macroblock. The H.264 video codec subdivides each macroblock into twenty-four 4 by 4 pixel blocks, 16 for luminance and 8 for sub-sampled chrominance.
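The block counts above can be checked with a short calculation. The following is a minimal sketch, assuming frame dimensions that are multiples of 16 and chrominance sub-sampled by two in each direction (which is what the stated counts of two 8 by 8 or eight 4 by 4 chrominance blocks imply). The function names and the 176 by 144 example frame are illustrative and do not come from any standard.

```python
def macroblock_grid(width, height, mb_size=16):
    """Return (columns, rows) of macroblocks covering a frame.

    Assumes the frame dimensions are multiples of the macroblock size.
    """
    if width % mb_size or height % mb_size:
        raise ValueError("frame dimensions must be multiples of the macroblock size")
    return width // mb_size, height // mb_size


def blocks_per_macroblock(block_size):
    """Count (luminance, chrominance) blocks in one 16x16 macroblock."""
    luma = (16 // block_size) ** 2
    # Two chrominance components, each sub-sampled to 8x8 samples per macroblock.
    chroma = 2 * (8 // block_size) ** 2
    return luma, chroma


cols, rows = macroblock_grid(176, 144)   # example frame size (assumption)
print(cols, rows, cols * rows)           # 11 9 99
print(blocks_per_macroblock(8))          # (4, 2): six 8x8 blocks in total
print(blocks_per_macroblock(4))          # (16, 8): twenty-four 4x4 blocks in total
```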
Hybrid video codecs generally convert source macroblocks into encoded macroblocks using similar techniques. Each block is encoded by first taking a spatial transform and then quantizing the transform coefficients. This first stage will be referred to as transform encoding. The H.261, H.263 and MPEG-4 video codecs use the discrete cosine transform (DCT) at this stage. The H.264 video codec uses an integer transform.
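Transform encoding of a single 8 by 8 block can be illustrated as follows. This is a minimal sketch only: it uses a textbook orthonormal DCT and a single uniform quantizer step, whereas each standard defines its own quantization rules. The names `dct_2d`, `quantize`, `transform_encode` and the step size of 16 are assumptions made for the illustration.

```python
import math

N = 8  # block size used by the DCT-based codecs described above


def dct_2d(block):
    """Orthonormal two-dimensional DCT-II of an N x N block (list of rows)."""
    def scale(k):
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for y in range(N):
                for x in range(N):
                    total += (block[y][x]
                              * math.cos((2 * y + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * x + 1) * v * math.pi / (2 * N)))
            out[u][v] = scale(u) * scale(v) * total
    return out


def quantize(coefficients, step):
    """Uniform quantization: divide by the step and round to the nearest integer."""
    return [[int(round(value / step)) for value in row] for row in coefficients]


def transform_encode(block, step=16):
    """Spatial transform followed by quantization of the transform coefficients."""
    return quantize(dct_2d(block), step)


# A perfectly flat block has energy only in the DC coefficient.
flat_block = [[128] * N for _ in range(N)]
levels = transform_encode(flat_block)
print(levels[0][0])  # 64
```

For the flat block, every coefficient other than the first quantizes to zero, which is what makes the subsequent run length coding stage effective.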
The non-zero quantized transform coefficients are further encoded using run length and variable length coding. This second stage will be referred to as VLC (Variable Length Coding) encoding. The reverse processes will be referred to as VLC decoding and transform decoding respectively. Macroblocks can be coded in three ways:
        “Intra coded” macroblocks have the pixel values copied directly from the source frame being coded.
        “Inter coded” macroblocks have pixel values that are formed from the difference between pixel values in the current source frame and the pixel values in the reference frame. The values for the reference frame are derived by decoding the encoded data for a previously encoded frame. The area of the reference frame used when computing the difference is controlled by a motion vector or vectors that specify the displacement between the macroblock in the current frame and its best match in the reference frame. The motion vector or vectors are transmitted along with the quantized coefficients for inter frames. If the difference in pixel values is sufficiently small, only the motion vectors need to be transmitted.
        “Not coded” macroblocks are macroblocks that have not changed significantly from the previous frame; no motion or coefficient data is transmitted for these macroblocks.
Hybrid video codecs differ in the form of motion vectors they allow, such as the number of motion vectors per macroblock, the resolution of the vectors, the range of the vectors and whether the vectors are allowed to point outside the reference frame. The process of estimating motion vectors is termed “motion estimation”. It is one of the most computationally intensive parts of a hybrid video encoder.
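The cost of motion estimation can be seen from a simple exhaustive search. The following is a minimal sketch, assuming a sum of absolute differences (SAD) matching criterion, whole-pixel vectors, one vector per macroblock and vectors that stay inside the reference frame. These are choices made for the illustration; actual encoders and standards differ on each of these points. All function names are hypothetical.

```python
def sad(cur, ref, cx, cy, rx, ry, size):
    """Sum of absolute differences between a current block and a reference block."""
    total = 0
    for j in range(size):
        for i in range(size):
            total += abs(cur[cy + j][cx + i] - ref[ry + j][rx + i])
    return total


def full_search(cur, ref, cx, cy, size=16, search_range=4):
    """Return ((dx, dy), cost) for the best match of the macroblock at (cx, cy)."""
    height, width = len(ref), len(ref[0])
    best = (0, 0)
    best_cost = sad(cur, ref, cx, cy, cx, cy, size)
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            rx, ry = cx + dx, cy + dy
            if rx < 0 or ry < 0 or rx + size > width or ry + size > height:
                continue  # this sketch keeps vectors inside the reference frame
            cost = sad(cur, ref, cx, cy, rx, ry, size)
            if cost < best_cost:
                best, best_cost = (dx, dy), cost
    return best, best_cost


def frame_with_square(width, height, left, top, side=4, value=200):
    """Build a black test frame containing one bright square."""
    frame = [[0] * width for _ in range(height)]
    for y in range(top, top + side):
        for x in range(left, left + side):
            frame[y][x] = value
    return frame


# The square moves 2 pixels right and 1 pixel down between the frames, so the
# best match for the current macroblock lies 2 left and 1 up in the reference.
reference = frame_with_square(32, 32, left=10, top=10)
current = frame_with_square(32, 32, left=12, top=11)
vector, cost = full_search(current, reference, cx=8, cy=8)
print(vector, cost)  # (-2, -1) 0
```

Even with a search range of only 4 pixels, each macroblock requires 81 block comparisons of 256 pixels each, which indicates why motion estimation dominates encoder complexity.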
The types of macroblocks contained in a given frame depend on the frame type. For the frame types of interest to this algorithm, the allowed macroblock types are as follows:
        I frames can contain only Intra coded macroblocks.
        P frames can contain Intra, Inter and “Not coded” macroblocks.
        B frames can contain only Inter or “Not coded” macroblocks. The Inter macroblocks in B frames can reference the preceding I or P frame and the following I or P frame (the following I or P frame is generally transmitted before the B frame but displayed after the B frame).
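The rules above can be expressed as a small lookup table. The following is a minimal sketch; the string labels and the function name `invalid_macroblocks` are illustrative assumptions and are not taken from any codec standard.

```python
# Allowed macroblock types per frame type, as listed above.
ALLOWED_MACROBLOCKS = {
    "I": {"intra"},
    "P": {"intra", "inter", "not_coded"},
    "B": {"inter", "not_coded"},
}


def invalid_macroblocks(frame_type, macroblock_types):
    """Return (index, type) for every macroblock not allowed in this frame type."""
    allowed = ALLOWED_MACROBLOCKS[frame_type]
    return [(index, mb) for index, mb in enumerate(macroblock_types)
            if mb not in allowed]


print(invalid_macroblocks("P", ["intra", "inter", "not_coded"]))  # []
print(invalid_macroblocks("I", ["intra", "inter"]))               # [(1, 'inter')]
```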
Prior to transmitting the encoded data for the macroblocks, the data are further compressed using lossless variable length coding (VLC encoding).
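The run length part of this stage can be illustrated on a short sequence of quantized coefficients. This is a minimal sketch, assuming the coefficients have already been scanned into a one-dimensional order and representing each non-zero coefficient as a (run of preceding zeros, level) pair. The actual standards define their own symbol formats and variable length code tables, which are not reproduced here; the function names are hypothetical.

```python
def run_level_pairs(coefficients):
    """Encode a coefficient sequence as (zero run, non-zero level) pairs."""
    pairs = []
    run = 0
    for value in coefficients:
        if value == 0:
            run += 1
        else:
            pairs.append((run, value))
            run = 0
    return pairs  # trailing zeros are implied and not stored


def expand_pairs(pairs, length):
    """Rebuild the coefficient sequence from its (run, level) pairs."""
    out = []
    for run, level in pairs:
        out.extend([0] * run)
        out.append(level)
    out.extend([0] * (length - len(out)))
    return out


coefficients = [64, 0, 0, -3, 1, 0, 0, 0]
pairs = run_level_pairs(coefficients)
print(pairs)  # [(0, 64), (2, -3), (0, 1)]
assert expand_pairs(pairs, len(coefficients)) == coefficients  # lossless
```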
Another area where hybrid video codecs differ is in their support for video frame sizes. MPEG-4 and H.264 support arbitrary frame sizes, with the restriction that the width and height are multiples of 16, whereas H.261 and baseline H.263 support only a limited set of frame sizes. Depending upon the type of hybrid video codec, there can also be other limitations.
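A transcoder may need to check whether an incoming frame size can be carried by the target codec. The following is a minimal sketch of such a check. The fixed picture formats listed (sub-QCIF, QCIF, CIF, 4CIF and 16CIF for baseline H.263; QCIF and CIF for H.261) are taken from those standards rather than from the text above, and the codec labels and function name are illustrative assumptions.

```python
# Fixed picture formats as (width, height) in pixels.
FIXED_FRAME_SIZES = {
    "h261": {(176, 144), (352, 288)},
    "h263_baseline": {(128, 96), (176, 144), (352, 288), (704, 576), (1408, 1152)},
}

# Codecs treated here as accepting any size whose dimensions are multiples of 16.
ARBITRARY_SIZE_CODECS = {"mpeg4", "h264"}


def frame_size_supported(codec, width, height):
    """Return True if the codec can carry a frame of the given size."""
    if codec in FIXED_FRAME_SIZES:
        return (width, height) in FIXED_FRAME_SIZES[codec]
    if codec in ARBITRARY_SIZE_CODECS:
        return width > 0 and height > 0 and width % 16 == 0 and height % 16 == 0
    raise ValueError(f"unknown codec: {codec}")


print(frame_size_supported("mpeg4", 320, 240))          # True
print(frame_size_supported("h263_baseline", 320, 240))  # False
```

In the second case the size is valid for the source codec but not for the target, so a transcoder would have to rescale or pad the frame; how that is done is outside the scope of this sketch.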
A conventional approach to transcoding is known as tandem transcoding. A tandem transcoder will often fully decode the incoming coded signal to produce the data in a raw (uncompressed) format, then re-encode the raw data according to the desired target standard to produce the compressed signal. Although simple, a tandem video transcoder is considered a “brute-force” approach and consumes a significant amount of computing resources. An alternative to tandem transcoding uses the information in the motion vectors of the input bitstream to estimate the motion vectors for the output bitstream. Such an alternative approach also has limitations and is also considered a brute-force technique.
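The structure of a tandem transcoder can be sketched in a few lines. This is an illustration only: `decode_frame` and `encode_frame` are hypothetical callables standing in for a complete source-standard decoder and a complete target-standard encoder, and the toy stand-ins below exist solely to make the sketch runnable.

```python
def tandem_transcode(coded_frames, decode_frame, encode_frame):
    """Fully decode each incoming frame, then fully re-encode it for the target."""
    for coded in coded_frames:
        raw = decode_frame(coded)   # full decode to uncompressed samples
        yield encode_frame(raw)     # full re-encode; a real encoder repeats motion estimation here


# Toy stand-ins (assumptions, not real codecs): the "source" format is a
# comma-separated string of sample values and the "target" format is raw bytes.
def toy_decode(coded):
    return [int(value) for value in coded.split(",")]


def toy_encode(raw):
    return bytes(raw)


output = list(tandem_transcode(["1,2,3", "4,5,6"], toy_decode, toy_encode))
print(output)  # [b'\x01\x02\x03', b'\x04\x05\x06']
```

Nothing learned while decoding is passed to the encoder in this arrangement, which is why the approach is simple but computationally expensive.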
From the above, it is desirable to have improved ways of converting between different telecommunication formats in an efficient and cost effective manner.