Digital video communication systems may employ one, and sometimes more, digital video coding formats for the encoding, storage, and transport of video. For example, in traditional videoconferencing systems both H.261 and H.263 video coding standards are used, whereas in digital television systems MPEG-2/H.262 video coding is utilized. More recent systems use H.264 video both in videoconferencing and broadcast settings.
The need for using different coding formats in video communication systems is a direct result of the different operating assumptions that exist within different application domains. It is noted that even if the same coding format is used in different application domains, as is the case with the use of H.264 in videoconferencing, mobile, and broadcast TV applications, the specific way that the format is used in each application domain may be very different. As a result, content created in one application domain may not be directly decodable by systems of another domain, and signal modifications are required. Even in instances where exactly the same coding format is used, it is often the case that the bit rate of the coded video signal is higher than what can be used in a particular application context, and must therefore be reduced. Transcoding can also be employed when a rate-distortion improvement in the overall system can be obtained.
In practical applications, the need for content interoperability creates several instances where it is desirable to efficiently convert between different video coding formats as well as between different parameter settings (such as bit rate) of the same format. These techniques are collectively referred to as ‘transcoding’ techniques.
One example of the need for bitstream format conversion is support for legacy codecs in an application that uses a new codec. Transcoding can be employed for format conversion, e.g., when a participant connects to a video conference using a bitstream format that cannot be decoded by another participant.
Transcoding within a given coding format may be desirable to perform a change in spatial resolution, in order to accommodate the available display size, bit rate, processing power, or power consumption of a receiver, or general coding efficiency considerations. Similarly, it may also be desirable to change the temporal resolution to accommodate available bit rate, processing power, and power consumption of a receiver, or for general coding efficiency considerations. Another conversion typically desired is to change the bitstream size, or bit rate, to accommodate receiver capabilities in terms of bit rate, processing power, or power consumption.
Yet another example of the need for transcoding is the modification of bitstream characteristics, e.g., for error resilience or coding efficiency. Modifications of the bitstream may include, for example, changes to coding decisions such as the use of intra macroblocks.
Transcoding may also involve a combination of one or more of the aforementioned changes.
Transcoding techniques for standard video codecs have been developed to cater to specific application scenarios, for example, between MPEG-2 and H.264 to allow conversion of broadcast TV video to a format suitable for IP-TV and mobile TV applications. These transcoding techniques are directed to video coded using existing single-layer coding techniques.
In addition to traditional, single-layer codecs, layered or scalable coding is available for video coding. Scalable coding is used to generate two or more “scaled” bitstreams collectively representing a given video signal in a bit rate efficient manner. Scalability can be provided in a number of different dimensions, namely temporal, spatial, and quality (the latter also referred to as “Signal-to-Noise Ratio” (SNR) scalability or fidelity scalability). Depending on the codec's structure, any combination of spatial resolutions and frame rates may be obtainable from the codec bitstream. For example, a video signal may be scalably coded in different layers at CIF and QCIF resolutions, and at frame rates of 7.5, 15, and 30 frames per second (fps). The bits corresponding to the different layers can be transmitted as separate bitstreams (i.e., one stream per layer) or they can be multiplexed together in one or more bitstreams. For convenience in the description herein, the coded bits corresponding to a given layer may be referred to as that layer's bitstream, even if the various layers are multiplexed and transmitted in a single bitstream.
Codecs specifically designed to offer scalability features include, for example, MPEG-2 (ISO/IEC 13818-2, also known as ITU-T H.262) and the currently developed SVC (known as ITU-T H.264 Annex G or MPEG-4 Part 10 SVC). Scalable coding techniques specifically designed for video communication are described in commonly assigned international patent application No. PCT/US06/028365, “SYSTEM AND METHOD FOR SCALABLE AND LOW-DELAY VIDEOCONFERENCING USING SCALABLE VIDEO CODING”. It is noted that even codecs that are not specifically designed to be scalable can exhibit scalability characteristics in the temporal dimension. For example, consider an MPEG-2 Main Profile codec, a non-scalable codec, which is used in DVDs and digital TV environments. Further, assume that the codec is operated at 30 fps and that a group of pictures (GOP) structure of IBBPBBPBBPBBPBB (period N=15 frames) is used. By sequential elimination of the B pictures, followed by elimination of the P pictures, it is possible to derive a total of three temporal resolutions: 30 fps (all picture types included), 10 fps (I and P only), and 2 fps (I only). The sequential elimination process results in a decodable bitstream because the MPEG-2 Main Profile codec is designed so that coding of the P pictures does not rely on the B pictures, and similarly coding of the I pictures does not rely on other P or B pictures. In the following, single-layer codecs with temporal scalability features are considered to be a special case of scalable video coding, and are thus included in the term scalable video coding, unless explicitly indicated otherwise.
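The frame-rate arithmetic in the GOP example above can be checked with a short sketch. The function name `temporal_resolutions` and its dictionary labels are hypothetical and chosen only for illustration; the sketch simply counts the pictures of each type that remain after each elimination step and scales the full frame rate accordingly. It assumes the GOP pattern repeats unchanged.

```python
from fractions import Fraction


def temporal_resolutions(gop: str, fps: int) -> dict:
    """Average frame rates left after dropping B pictures, then P pictures.

    gop is one period of the picture-type pattern, e.g. "IBBPBBPBBPBBPBB".
    fps is the frame rate when all pictures are decoded.
    """
    period = len(gop)
    # Picture types retained at each step of the sequential elimination.
    steps = {"I+P+B": "IPB", "I+P": "IP", "I": "I"}
    rates = {}
    for label, kept_types in steps.items():
        kept = sum(1 for picture in gop if picture in kept_types)
        rates[label] = Fraction(fps * kept, period)
    return rates


if __name__ == "__main__":
    for label, rate in temporal_resolutions("IBBPBBPBBPBBPBB", 30).items():
        print(f"{label}: {float(rate)} fps")
```

For the 15-picture pattern above, one I picture and four P pictures remain after the B pictures are dropped, which gives 5/15 of 30 fps; keeping only the I picture gives 1/15 of 30 fps.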
Scalable codecs typically have a pyramidal bitstream structure in which one of the constituent bitstreams (called the “base layer”) is essential in recovering the original medium at some basic quality. Use of one or more of the remaining bitstream(s) (called “the enhancement layer(s)”) along with the base layer increases the quality of the recovered medium.
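The pyramidal structure can be illustrated with a minimal sketch built on the CIF/QCIF and 7.5/15/30 fps example given earlier. The `Layer` type, the `layers_needed` function, and the dependency rule (each layer depends on every layer with a lower or equal spatial and temporal index) are assumptions made purely for illustration; actual scalable codecs define their own layer dependencies.

```python
from dataclasses import dataclass

SPATIAL = ["QCIF", "CIF"]      # spatial index 0 is the lowest resolution
TEMPORAL = [7.5, 15.0, 30.0]   # temporal index 0 is the lowest frame rate


@dataclass(frozen=True)
class Layer:
    spatial: int    # index into SPATIAL
    temporal: int   # index into TEMPORAL


BASE_LAYER = Layer(0, 0)  # QCIF at 7.5 fps in this hypothetical layout


def layers_needed(target: Layer) -> list:
    """Layers required to decode the target operating point.

    Assumes a simple pyramid: a layer needs every layer whose spatial and
    temporal indices are both lower than or equal to its own.
    """
    return [
        Layer(s, t)
        for s in range(target.spatial + 1)
        for t in range(target.temporal + 1)
    ]


if __name__ == "__main__":
    for layer in layers_needed(Layer(1, 2)):
        print(SPATIAL[layer.spatial], TEMPORAL[layer.temporal], "fps")
```

Under this assumed rule the base layer appears in every selection, while a receiver that only needs QCIF at 15 fps can discard all CIF layers and the 30 fps temporal layer.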
Scalable video coding is a particularly effective coding technique for interactive video communication applications such as multipoint videoconferencing. Commonly assigned International Patent Applications No. PCT/US06/28366 and No. PCT/US06/62569 describe a “Scalable Video Communication Server” (SVCS) and “Compositing Scalable Video Communication Server” (CSVCS) architecture, respectively, that serve the same purpose as that of a traditional Multipoint Control Unit (MCU), but with significantly reduced complexity and improved functionality. Similarly, commonly assigned International Patent Applications No. PCT/US06/061815 and PCT/US07/63335 describe mechanisms for improving error resilience, random access, and rate control in such systems.
Transcoding in scalable video shares several of the characteristics of single-layer transcoding, but has additional characteristics that are unique to scalable video applications or needs. Scenarios where transcoding between scalable video coding formats in a videoconferencing setting is needed may include:
A participant in a multi-party videoconference requests a video signal characteristic that cannot be efficiently represented in the particular scalable video coding format used together with the video signal characteristics of the bitstreams that the other conference participants request. An example is the use of the CSVCS in conjunction with a request for a video resolution that is slightly different than other requested video resolutions. Transcoding is needed to provide the slightly different video resolution.
A participant has a transmission channel that is much more error prone than the transmission channels of other participants. Transcoding is needed to insert more slices and intra macroblocks to compensate for the increased error rate.
Transcoding to a related single-layer format to support a legacy endpoint (e.g., H.264 SVC to AVC).
Transcoding to an unrelated single-layer format to support a legacy endpoint (e.g., H.264 SVC to any of H.263, MPEG-2, H.261, MPEG-4, or any other video bitstream format except H.264 and SVC).
With reference to the latter two scenarios mentioned above, it is noted that the distinction between single-layer coding formats that are related to the scalable video coding format and those that are unrelated to it is significant. For example, SVC is an extension of H.264 AVC and therefore shares many common elements with it, such as high-level syntax, motion compensation, transform coding, and deblocking filter. Consequently, conversion between these two formats is easier to perform in a computationally efficient way.
Consideration is now being given to transcoding in video communications systems that use scalable video coding. Attention is directed to techniques for transcoding between scalable and non-scalable bitstreams, in both directions. The desired transcoding techniques will have minimal impact on the quality of the video signal and have high computational efficiency.