The present invention relates to compression of multimedia data and, in particular, to a video transcoder that allows a generic MPEG-4 decoder to decode MPEG-2 bitstreams. Temporal and spatial size conversion (downscaling) are also provided.
The following acronyms and terms are used:
CBP: Coded Block Pattern
DCT: Discrete Cosine Transform
DTV: Digital Television
DVD: Digital Video Disc
HDTV: High Definition Television
FLC: Fixed Length Coding
IP: Internet Protocol
MB: Macroblock
ME: Motion Estimation
ML: Main Level
MP: Main Profile
MPS: MPEG-2 Program Stream
MTS: MPEG-2 Transport Stream
MV: Motion Vector
QP: Quantization Parameter
PMV: Prediction Motion Vector
RTP: Real-Time Transport Protocol (RFC 1889)
SDTV: Standard Definition Television
SIF: Standard Intermediate Format
SVCD: Super Video Compact Disc
VLC: Variable Length Coding
VLD: Variable Length Decoding
VOP: Video Object Plane
MPEG-4, the multimedia coding standard, provides rich functionality to support various applications, including Internet applications such as streaming media, advertising, interactive gaming, virtual traveling, etc. Streaming video over the Internet (multicast), which is expected to be among the most popular applications for the Internet, is also well-suited for use with the MPEG-4 visual standard (ISO/IEC 14496-2 Final Draft of International Standard (MPEG-4), "Information Technology - Generic coding of audio-visual objects, Part 2: visual," December 1998).
MPEG-4 visual handles both synthetic and natural video, and accommodates several visual object types, such as video, face, and mesh objects. MPEG-4 visual also allows coding of an arbitrarily shaped object so that multiple objects can be shown or manipulated in a scene as desired by a user. Moreover, MPEG-4 visual is very flexible in terms of coding and display configurations by including enhanced features such as multiple auxiliary (alpha) planes, variable frame rate, and geometrical transformations (sprites).
However, the majority of the video material (e.g., movies, sporting events, concerts, and the like) which is expected to be the target of streaming video is already compressed by the MPEG-2 system and stored on storage media such as DVDs, computer memories (e.g., server hard disks), and the like. The MPEG-2 System specification (ISO/IEC 13818-2 International Standard (MPEG-2), "Information Technology - Generic coding of Moving Pictures and Associated Audio: Part 2 - Video," 1995) defines two system stream formats: the MPEG-2 Transport Stream (MTS) and the MPEG-2 Program Stream (MPS). The MTS is tailored for communicating or storing one or more programs of MPEG-2 compressed data and also other data in relatively error-prone environments. One typical application of MTS is DTV. The MPS is tailored for relatively error-free environments. Popular applications include DVD and SVCD.
Attempts to address this issue have been unsatisfactory to date. For example, the MPEG-4 studio profile (O. Sunohara and Y. Yagasaki, "The draft of MPEG-4 Studio Profile Amendment Working Draft 2.0," ISO/IEC JTC1/SC29/WG11 MPEG99/5135, October 1999) has proposed an MPEG-2 to MPEG-4 transcoder, but that process is not applicable to the other MPEG-4 version 1 profiles, which include the Natural Visual profiles (Simple, Simple Scalable, Core, Main, N-Bit), Synthetic Visual profiles (Scalable Texture, Simple Face Animation), and Synthetic/Natural Hybrid Visual profiles (Hybrid, Basic Animated Texture). The studio profile is not applicable to the Main Profile of MPEG-4 version 1 since it modifies the syntax, and the decoder process is incompatible with the rest of the MPEG-4 version 1 profiles.
The MPEG standards designate several sets of constrained parameters using a two-dimensional ranking order. One of the dimensions, called the "profile" series, specifies the coding features supported. The other dimension, called "level", specifies the picture resolutions, bit rates, and so forth, that can be accommodated.
For MPEG-2, the Main Profile at Main Level, or MP@ML, supports a 4:2:0 color subsampling ratio, and I, P and B pictures. The Simple Profile is similar to the Main Profile but has no B-pictures. The Main Level is defined for ITU-R 601 video, while the Simple Level is defined for SIF video.
Similarly, for MPEG-4, the Simple Profile contains SIF progressive video (and has no B-VOPs or interlaced video). The Main Profile allows B-VOPs and interlaced video.
Accordingly, it would be desirable to achieve interoperability among different types of end-systems by the use of MPEG-2 video to MPEG-4 video transcoding and/or MPEG-4-video to MPEG-2-video transcoding. The different types of end-systems that should be accommodated include:
Transmitting Interworking Unit (TIU): Receives MPEG-2 video from a native MTS (or MPS) system, transcodes it to MPEG-4 video, and distributes it over packet networks using a native RTP-based system layer (such as an IP-based internetwork). Examples include a real-time encoder, an MTS satellite link to the Internet, and a video server with MPS-encoded source material.
Receiving Interworking Unit (RIU): Receives MPEG-4 video in real time from an RTP-based network, transcodes it to MPEG-2 video (if possible), and forwards it to a native MTS (or MPS) environment. Examples include an Internet-based video server feeding an MTS-based cable distribution plant.
Transmitting Internet End-System (TIES): Transmits MPEG-2 or MPEG-4 video generated or stored within the Internet end-system itself, or received from Internet-based computer networks. Examples include a video server.
Receiving Internet End-System (RIES): Receives MPEG-2 or MPEG-4 video over an RTP-based internet for consumption at the Internet end-system or forwarding to a traditional computer network. Examples include a desktop PC or workstation viewing a training video.
It would be desirable to determine similarities and differences between MPEG-2 and MPEG-4 systems, and provide transcoder architectures which yield a low complexity and small error.
The transcoder architectures should be provided for systems where B-frames are enabled (e.g., main profile), as well as a simplified architecture for when B-frames are not used (simple profile).
Format (MPEG-2 to MPEG-4) and/or size transcoding should be provided.
It would also be desirable to provide an efficient mapping from the MPEG-2 to MPEG-4 syntax, including a mapping of headers.
The system should include size transcoding, including spatial and temporal transcoding.
The system should allow size conversion at the input bitstream or output bitstream of a transcoder.
The size transcoder should convert a bitstream of ITU-R 601 interlaced video coded with MPEG-2 MP@ML into a simple profile MPEG-4 bitstream which contains SIF progressive video suitable, e.g., for a streaming video application.
The system should provide an output bitstream that can fit in the practical bandwidth for a streaming video application (e.g., less than 1 Mbps).
The present invention provides a system having the above and other advantages.
The invention relates to format transcoding (MPEG-2 to MPEG-4) and size (spatial and temporal) transcoding.
A proposed transcoder includes size conversion, although size conversion can alternatively be performed at either the input bitstream or the output bitstream of a transcoder. However, it is more efficient to combine all kinds of transcoding in the product version of a transcoder, since the transcoders share processing elements with each other (such as a bitstream reader), which reduces complexity.
The invention addresses the most important requirements for a transcoder, e.g., the complexity of the system and the loss generated by the process.
In one embodiment, a proposed front-to-back transcoder architecture reduces complexity because there is no need to perform motion compensation.
In a particular embodiment, the transcoder can use variable 5-bit QP representation, and eliminates AC/DC prediction and the nonlinear DC scaler.
The invention is alternatively useful for rate control and resizing.
A particular method for transcoding a pre-compressed input bitstream that is provided in a first video coding format includes the steps of: recovering header information of the input bitstream; providing corresponding header information in a second, different video coding format; partially decompressing the input bitstream to provide partially decompressed data; and re-compressing the partially decompressed data in accordance with the header information in the second format to provide the output bitstream.
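The four steps of this method can be sketched as a simple loop. The following Python outline is illustrative only, under stated assumptions: the `Picture` container and the `map_header`, `partial_decode`, and `recompress` names are hypothetical, and only one header field is mapped (the MPEG-2 `picture_coding_type` codes 1/2/3 for I/P/B pictures to the MPEG-4 `vop_coding_type` codes 0/1/2 for I/P/B VOPs). It does not reproduce the full header mapping of the invention.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

# Coding-type codes defined by the two standards (I, P, B only).
MPEG2_PICTURE_CODING_TYPE = {1: "I", 2: "P", 3: "B"}
MPEG4_VOP_CODING_TYPE = {"I": 0, "P": 1, "B": 2}


@dataclass
class Picture:
    """Hypothetical container: parsed header fields plus compressed payload."""
    header: Dict[str, int]
    payload: bytes


def map_header(mpeg2_header: Dict[str, int]) -> Dict[str, int]:
    """Provide corresponding header information in the second format."""
    kind = MPEG2_PICTURE_CODING_TYPE[mpeg2_header["picture_coding_type"]]
    return {"vop_coding_type": MPEG4_VOP_CODING_TYPE[kind]}


def transcode(
    pictures: List[Picture],
    partial_decode: Callable[[bytes], Any],
    recompress: Callable[[Any, Dict[str, int]], bytes],
) -> List[Picture]:
    """Apply the four method steps to each picture in turn."""
    output = []
    for picture in pictures:
        in_header = picture.header                  # step 1: recover header info
        out_header = map_header(in_header)          # step 2: header in second format
        partial = partial_decode(picture.payload)   # step 3: partial decompression
        payload = recompress(partial, out_header)   # step 4: re-compression
        output.append(Picture(out_header, payload))
    return output
```

The partial decoder and re-encoder are passed in as callables because their internals (e.g., VLD, requantization, VLC) depend on the chosen transcoder architecture.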
A method for performing 2:1 downscaling on video data includes the steps of: forming at least one input matrix of N×N (e.g., N=16) Discrete Cosine Transform (DCT) coefficients from the video data by combining four N/2×N/2 field-mode DCT blocks; performing vertical downsampling and de-interlacing to the input matrix to obtain two N/2×N/2 frame-mode DCT blocks; forming an N×N/2 input matrix from the two frame-mode DCT blocks; and performing horizontal downsampling to the N×N/2 matrix to obtain one N/2×N/2 frame-mode DCT block.
Preferably, the vertical and horizontal downsampling use respective sparse downsampling matrices. In particular, a vertical downsampling matrix of 0.5[I8 I8] may be used, where I8 is an 8×8 identity matrix. This is essentially vertical pixel averaging. A horizontal downsampling matrix composed of odd "O" and even "E" matrices may be used.
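The statement that 0.5[I8 I8] amounts to vertical pixel averaging can be checked numerically. The sketch below is a minimal pure-Python illustration, assuming that the 16×16 coefficient matrix stacks the two top-field DCT blocks above the two bottom-field DCT blocks and that an orthonormal DCT-II is used; all function names are hypothetical. Because the DCT is linear, multiplying by 0.5[I8 I8] in the DCT domain should match averaging each pair of adjacent frame lines in the pixel domain and then transforming. The horizontal step is not shown, since the odd and even matrices are not defined in this passage.

```python
import math
import random

BLOCK = 8  # size of one DCT block; a macroblock is 16 x 16 luma pixels


def dct_matrix(n):
    """Orthonormal DCT-II basis C, so that coeffs = C * pixels * C^T."""
    rows = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        rows.append([scale * math.cos((2 * i + 1) * k * math.pi / (2 * n))
                     for i in range(n)])
    return rows


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def dct_blocks(pixels):
    """DCT of an 8 x 16 pixel strip, taken as two side-by-side 8 x 8 blocks."""
    c = dct_matrix(BLOCK)
    ct = transpose(c)
    left = matmul(matmul(c, [row[:BLOCK] for row in pixels]), ct)
    right = matmul(matmul(c, [row[BLOCK:] for row in pixels]), ct)
    return [l + r for l, r in zip(left, right)]


def field_mode_coefficients(macroblock):
    """Stack top-field blocks over bottom-field blocks (assumed layout)."""
    top_field = macroblock[0::2]     # even frame lines
    bottom_field = macroblock[1::2]  # odd frame lines
    return dct_blocks(top_field) + dct_blocks(bottom_field)


def vertical_downsample(field_coeffs):
    """Left-multiply by 0.5 * [I8 I8]: averages the two fields' coefficients."""
    v = [[0.5 if j in (i, i + BLOCK) else 0.0 for j in range(2 * BLOCK)]
         for i in range(BLOCK)]
    return matmul(v, field_coeffs)


def max_error_vs_pixel_averaging(seed=0):
    """Largest difference between DCT-domain and pixel-domain averaging."""
    rng = random.Random(seed)
    mb = [[rng.randint(0, 255) for _ in range(2 * BLOCK)]
          for _ in range(2 * BLOCK)]
    dct_domain = vertical_downsample(field_mode_coefficients(mb))
    averaged = [[(a + b) / 2.0 for a, b in zip(mb[2 * i], mb[2 * i + 1])]
                for i in range(BLOCK)]
    reference = dct_blocks(averaged)
    return max(abs(x - y) for r1, r2 in zip(dct_domain, reference)
               for x, y in zip(r1, r2))


if __name__ == "__main__":
    # Prints a value at floating-point rounding level if the two paths agree.
    print(max_error_vs_pixel_averaging())
```

The result of `vertical_downsample` is an 8×16 matrix, i.e., the two 8×8 frame-mode DCT blocks that the horizontal step then reduces to one.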
Corresponding apparatuses are also presented.