Multimedia applications need to handle videos or sequences of images, each image comprising one or more macroblocks of pixels. The diversity of multimedia applications and of terminals receiving multimedia content inevitably causes interoperability problems. For instance, current mobile terminals support different video encoding standards, such as H.263, MPEG-4 (Moving Picture Experts Group) described in ISO/IEC 14496-2, “Information technology—Coding of audio-visual objects—Part 2: Visual,” second edition, December 2001, and H.264/AVC described in ISO/IEC 14496-10 AVC and ITU-T Rec. H.264, “Advanced video coding for generic audiovisual services,” March 2005. Transcoding video content to meet specific resolution, encoding standard, and bit rate constraints has therefore become necessary to ensure the success of evolving multimedia communications. The MPEG-4 visual simple profile (VSP) is widely used in today's multimedia services, including mobile videoconferencing, multimedia messaging service (MMS), and streaming within the scope of 3GPP/3GPP2. These services are described in 3GPP TS 26.234 v7.7.0, “Packet-switched Streaming Services (PSS); Protocols and codecs (Release 7),” March 2009, 3GPP TS 26.140 v7.1.0, “Multimedia Messaging Service (MMS); Media formats and codecs (Release 7),” June 2007, 3GPP2 C.S0045-A, “Multimedia Messaging Service (MMS) Media Format and Codecs for cdma2000 Spread Spectrum Systems,” version 1.0, March 2006, and 3GPP2 C.S0046-0, “3G Multimedia Streaming Services,” version 1.0, February 2006.
The more recent H.264/AVC encoding standard provides significant improvements in compression efficiency and is expected to replace the earlier encoding standards, thereby making transcoding from MPEG-4 to H.264 inevitable.
H.264 encoding is especially complex because of its more sophisticated coding tools. H.264 uses several encoding block modes: four inter modes (16×16, 16×8, 8×16, and 8×8), four sub-modes (8×8, 8×4, 4×8, and 4×4), a SKIP mode, two intra prediction modes (16×16 and 4×4), a lossless mode, and PCM. To determine the best encoding block mode, H.264 encoders use rate distortion optimization (RDO). Consequently, motion estimation (ME) and motion compensation (MC) are performed for several candidate encoding modes, amounting to up to 41 ME operations at quarter-pixel accuracy for a single macroblock (MB). In video compression, a macroblock represents a 16×16 block of pixels. In the commonly used 4:2:0 sampling format, where color is subsampled by a factor of 2 horizontally and vertically with respect to luminance, each macroblock contains four 8×8 Y (luminance) blocks, one 8×8 Cb (blue color difference) block, and one 8×8 Cr (red color difference) block. Each macroblock may have one or more partitions, the encoding block mode for the MB indicating the size of the partitions within the MB.
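The figure of 41 ME operations can be reproduced by counting luma partitions, assuming one motion search per partition and a single reference frame: the 16×16, 16×8, and 8×16 modes contribute 1 + 2 + 2 = 5 partitions, and each of the four 8×8 sub-macroblocks can be split into 1, 2, 2, or 4 partitions under the 8×8, 8×4, 4×8, and 4×4 sub-modes, contributing 4 × 9 = 36 more. The following Python sketch performs this count and also computes the number of samples in one 4:2:0 macroblock; the dictionary and function names are illustrative only.

```python
# Illustrative sketch. Assumption (not stated in the text): one motion
# search per partition and a single reference frame.

# Number of partitions produced by each macroblock-level inter mode.
MB_INTER_MODES = {"16x16": 1, "16x8": 2, "8x16": 2}
# Number of partitions produced by each sub-mode of one 8x8 sub-macroblock.
SUB_MODES = {"8x8": 1, "8x4": 2, "4x8": 2, "4x4": 4}
SUB_MACROBLOCKS = 4  # the 8x8 mode splits a 16x16 MB into four 8x8 blocks


def count_me_operations() -> int:
    """Total motion searches when every inter mode and sub-mode is tried."""
    mb_level = sum(MB_INTER_MODES.values())  # 1 + 2 + 2 = 5
    sub_level = SUB_MACROBLOCKS * sum(SUB_MODES.values())  # 4 * 9 = 36
    return mb_level + sub_level


def samples_per_macroblock_420() -> int:
    """Luma plus chroma samples in one 16x16 macroblock with 4:2:0 sampling."""
    luma = 4 * 8 * 8  # four 8x8 Y blocks
    chroma = 2 * 8 * 8  # one 8x8 Cb block and one 8x8 Cr block
    return luma + chroma


print(count_me_operations())  # 41
print(samples_per_macroblock_420())  # 384
```

Multiple reference frames would multiply the search count accordingly, which is one reason the exhaustive RDO approach is costly.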
Several studies have investigated the general problem of transcoding a video comprising a sequence of input images encoded in a first format into a sequence of output images encoded in a second format, and in particular the transcoding of a sequence of input images encoded in MPEG-4 into a sequence of output images encoded in H.264. The cascade transcoding approach fully decodes the MPEG-4 video bitstream to the spatial (pixel) domain and then re-encodes it according to the H.264 specification. This type of transcoding yields the best video quality. Unfortunately, it has a high computational complexity, which makes it unsuitable for many real-time applications.
Several methods have been proposed to reduce the computational complexity of transcoding. Examples include the paper by B. Shen, “From 8-tap DCT to 4-tap integer-transform for MPEG-4 to H.264/AVC transcoding,” IEEE International Conference on Image Processing, Vol. 1, pp. 115-118, October 2004, the paper by Y. K. Lee, S. S. Lee and Y. L. Lee, “MPEG-4 to H.264 transcoding using macroblock statistics,” IEEE International Conference on Multimedia and Expo, pp. 57-60, July 2006, and the paper by Y. Liang, X. Wei, I. Ahmad and V. Swaminathan, “MPEG-4 to H.264/AVC transcoding,” The International Wireless Communications and Mobile Computing Conference, pp. 689-693, August 2007. Other related studies include the paper by T. N. Dinh, J. Yoo, S. Park, G. Lee, T. Y. Chang and H. J. Cho, “Reducing spatial resolution for MPEG-4/H.264 transcoding with efficient motion reusing,” IEEE International Conference on Computer and Information Technology, pp. 577-580, October 2007, the paper by S. E. Kim, J. K. Han and J. G. Kim, “Efficient motion estimation algorithm for MPEG-4 to H.264 transcoder,” IEEE International Conference on Image Processing, Vol. 3, pp. 659-702, September 2005, the paper by T. D. Nguyen, G. S. Lee, J. Y. Chang and H. J. Cho, “Efficient MPEG-4 to H.264/AVC transcoding with spatial downscaling,” ETRI, Vol. 29, pp. 826-828, December 2007, and the paper by A. Vetro, C. Christopoulos, and H. Sun, “Video transcoding architectures and techniques: an overview,” IEEE Signal Processing Magazine, 20(2):18-29, 2003. The most efficient of these methods exploit the information available from the MPEG-4 decoder during transcoding to reduce the number of block modes to evaluate, thereby reducing ME complexity.
In the paper by Lee et al., the authors exploit the frequency distribution of the H.264 block modes for each given MPEG-4 block mode to derive a table of transcoding block modes for MPEG-4 to H.264 transcoding. An example of such a table, Table 100, is presented in FIG. 1. Note that the column header “MPEG-4 Coding modes” corresponds to the encoding block mode used for the input MBs, whereas the row header “H.264 coding modes” corresponds to the transcoding block modes used in the transcoding. Lee's method uses the table to identify the most probable H.264 coding modes for each MPEG-4 coding mode, so that only these most probable modes are checked instead of all H.264 coding modes.
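A table-driven mode decision of this kind can be sketched as follows. The mapping below is hypothetical and does not reproduce Table 100; the MPEG-4 macroblock type names (NOT_CODED, INTER, INTER4V, INTRA), the simplified list of H.264 modes (with "8x8" standing for the 8×8 mode together with its sub-modes), the made-up costs, and the `rd_cost` callable, which stands in for the encoder's rate-distortion cost, are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Tuple

# Simplified set of H.264 modes; "8x8" stands for the 8x8 mode with its sub-modes.
ALL_H264_MODES: List[str] = ["SKIP", "16x16", "16x8", "8x16", "8x8", "I16x16", "I4x4"]

# HYPOTHETICAL candidate table (not Table 100 of FIG. 1): for each MPEG-4
# macroblock type, the H.264 modes assumed to be the most probable.
CANDIDATES: Dict[str, List[str]] = {
    "NOT_CODED": ["SKIP", "16x16"],
    "INTER": ["SKIP", "16x16", "16x8", "8x16"],
    "INTER4V": ["8x8", "16x8", "8x16"],
    "INTRA": ["I16x16", "I4x4"],
}


def choose_mode(mpeg4_mode: str, rd_cost: Callable[[str], float]) -> Tuple[str, int]:
    """Return the lowest-cost candidate mode and how many modes were evaluated.

    MPEG-4 modes missing from the table fall back to an exhaustive search.
    """
    candidates = CANDIDATES.get(mpeg4_mode, ALL_H264_MODES)
    best = min(candidates, key=rd_cost)
    return best, len(candidates)


# Made-up RD costs for one macroblock, for illustration only.
costs = {"SKIP": 10.0, "16x16": 8.0, "16x8": 9.0, "8x16": 7.5,
         "8x8": 6.0, "I16x16": 20.0, "I4x4": 18.0}
print(choose_mode("INTER", costs.__getitem__))  # ('8x16', 4)
print(choose_mode("UNKNOWN", costs.__getitem__))  # ('8x8', 7)
```

With these made-up costs, the restricted search evaluates 4 modes instead of 7 but misses the 8x8 mode that an exhaustive search would select, which illustrates the speed/quality trade-off of restricting the candidate set.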
In the paper by Liang et al., an arbitrary mapping between MPEG-4 block modes and H.264 candidate transcoding block modes is presented, with little justification, for both intra and inter blocks. Motion vectors (MVs) are either reused directly (in the 16×16 transcoding block mode) or used as starting points for ME (in the 16×8 and 8×16 transcoding block modes, for instance). This approach achieves very good speed-ups, but the transcoded image quality is degraded by 1 to 2 dB, which may be unacceptable in some applications. The techniques described in the paper by Y.-K. Lee and Y.-L. Lee, “MPEG-4 to H.264 transcoding,” IEEE TENCON, November 2005, and in the paper by J. Bialkowski, M. Barkowsky and A. Kaup, “Overview of low-complexity video transcoding from H.263 to H.264,” IEEE International Conference on Multimedia and Expo (ICME), pp. 49-52, July 2006, reduce the number of candidate block modes to be tested but lack the necessary efficiency and require further improvement.
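Using a decoded MPEG-4 MV as the starting point for ME can be sketched as a small-window search centred on that vector. This is an illustration under stated assumptions rather than the method of Liang et al.: it uses an integer-pel sum of absolute differences (SAD) criterion, a hypothetical default search radius of 2, and frames stored as lists of rows, and it omits the sub-pel refinement and rate term an H.264 encoder would add. Reusing the MV directly, as in the 16×16 case, corresponds to evaluating only the starting vector (radius 0).

```python
from typing import List, Tuple

Frame = List[List[int]]  # luma samples, indexed frame[y][x]
MV = Tuple[int, int]  # (dx, dy) in integer pixels (assumed unit)


def sad(cur: Frame, ref: Frame, bx: int, by: int, mv: MV, size: int) -> int:
    """Sum of absolute differences between a block and its displaced reference."""
    dx, dy = mv
    return sum(
        abs(cur[by + y][bx + x] - ref[by + y + dy][bx + x + dx])
        for y in range(size)
        for x in range(size)
    )


def refine_mv(cur: Frame, ref: Frame, bx: int, by: int, start: MV,
              size: int = 16, radius: int = 2) -> Tuple[MV, float]:
    """Search a (2*radius+1)^2 window centred on a reused MV; radius=0 is pure reuse."""
    height, width = len(ref), len(ref[0])
    best: MV = start
    best_cost: float = float("inf")
    for dy in range(start[1] - radius, start[1] + radius + 1):
        for dx in range(start[0] - radius, start[0] + radius + 1):
            # Skip vectors whose reference block would fall outside the frame.
            if not (0 <= bx + dx and bx + dx + size <= width
                    and 0 <= by + dy and by + dy + size <= height):
                continue
            cost = sad(cur, ref, bx, by, (dx, dy), size)
            if cost < best_cost:
                best, best_cost = (dx, dy), cost
    return best, best_cost


# Synthetic example: the current frame is the reference displaced by (3, 1).
N = 32
ref = [[(x * x + 3 * y * y + x * y) % 251 for x in range(N)] for y in range(N)]
cur = [[ref[y + 1][x + 3] if y + 1 < N and x + 3 < N else 0 for x in range(N)]
       for y in range(N)]
# Start from an MV that is one pixel off, as a reused MPEG-4 vector might be.
print(refine_mv(cur, ref, bx=8, by=8, start=(2, 1), size=8))  # ((3, 1), 0)
```

A radius of 2 evaluates at most 25 vectors per partition, compared with the much larger windows of a full search, which is where the speed-up of MV reuse comes from; the quality loss reported for such schemes arises when the reused vector is a poor predictor for the H.264 partition being searched.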
Therefore, there is a need in the industry for an improved method and system for video transcoding that avoids or mitigates the above-mentioned drawbacks of the prior art.