Digital video content is typically generated to target a specific data format. A video data format generally conforms to a specific video coding standard or a proprietary coding algorithm, with a specific bit rate, spatial resolution, frame rate, etc. Such coding standards include MPEG-2 and WINDOWS Media Video (WMV). Most existing digital video content is coded according to the MPEG-2 data format. WMV is widely accepted as a qualified codec for streaming: it is widely deployed throughout the Internet, has been adopted by the HD-DVD consortium, and is currently being considered as a SMPTE standard. Different video coding standards provide varying compression capabilities and visual quality.
Transcoding refers to the general process of converting one compressed bitstream into another compressed bitstream. To match the capabilities of a target device and the constraints of a distribution network, it is often desirable to convert a bitstream from one coding format to another, such as from MPEG-2 to WMV, to H.264, or even to a scalable format. Transcoding may also be utilized to achieve specific functionality such as VCR-like functionality, logo insertion, or enhanced error resilience of the bitstream for transmission over wireless channels.
FIG. 1 shows a conventional Cascaded Pixel-Domain Transcoder (CPDT) system, which cascades a front-end decoder that decodes an input bitstream with an encoder that generates a new bitstream with a different coding parameter set or in a new format. One shortcoming of this conventional transcoding architecture is that its complexity typically presents an obstacle to practical deployment. As a result, the CPDT transcoding architecture of FIG. 1 is typically used as a performance benchmark for improved schemes.
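The cascade described for the CPDT can be illustrated with a minimal sketch, assuming a single intra-coded 8x8 block and a uniform quantizer. This is not the system of FIG. 1: motion compensation, entropy coding, and format conversion are all omitted, and the names `decode_block`, `encode_block`, `cpdt_transcode_block`, `q_in`, and `q_out` are hypothetical and chosen only for illustration. The point is that every block is brought all the way back to pixels before being re-encoded.

```python
import math

N = 8  # 8x8 blocks, as used by MPEG-2 style block transforms


def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


# Orthonormal DCT-II basis: coeffs = C * pixels * C^T, pixels = C^T * coeffs * C
C = [[math.sqrt((1 if k == 0 else 2) / N) * math.cos((2 * j + 1) * k * math.pi / (2 * N))
      for j in range(N)] for k in range(N)]
CT = transpose(C)


def decode_block(levels, q_in):
    """Front-end decoder (toy): dequantize, then inverse DCT back to pixels."""
    coeffs = [[v * q_in for v in row] for row in levels]
    return matmul(matmul(CT, coeffs), C)


def encode_block(pixels, q_out):
    """Back-end encoder (toy): forward DCT, then quantize with the new step."""
    coeffs = matmul(matmul(C, pixels), CT)
    return [[round(v / q_out) for v in row] for row in coeffs]


def cpdt_transcode_block(levels, q_in, q_out):
    """Cascade: full decode to the pixel domain, then full re-encode."""
    return encode_block(decode_block(levels, q_in), q_out)
```

Even in this reduced form, each block costs one inverse and one forward transform; a real CPDT additionally repeats motion estimation and compensation in the encoder, which is where most of the complexity noted above arises.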
FIG. 2 shows a conventional cascaded DCT-domain transcoder (CDDT) architecture, which simplifies the CPDT architecture of FIG. 1. The system of FIG. 2 limits functionality to spatial/temporal resolution downscaling and coding parameter changes. CDDT eliminates the DCT/IDCT processes implemented by the CPDT transcoder of FIG. 1. However, CDDT performs motion compensation (MC) in the DCT domain, which is typically a time-consuming and computationally expensive operation. This is because MC blocks often overlap multiple DCT blocks rather than aligning with the DCT block grid. As a result, the CDDT architecture typically needs to apply complex and computationally expensive floating-point matrix operations in order to perform MC in the DCT domain. Additionally, motion vector (MV) refinement is typically infeasible utilizing the CDDT architecture.
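The cost of motion compensation (MC) in the DCT domain can be illustrated with a short sketch. It uses the standard textbook formulation in which a predicted block that straddles four neighboring 8x8 blocks is written as a sum of row-selection and column-selection matrix products, each carried into the DCT domain; this is an assumption made for illustration and is not necessarily the exact procedure of the architecture in FIG. 2. All function names (`pickers`, `mc_pixel`, `mc_dct`) are hypothetical.

```python
import math

N = 8  # 8x8 blocks


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


# Orthonormal DCT-II basis matrix
C = [[math.sqrt((1 if k == 0 else 2) / N) * math.cos((2 * j + 1) * k * math.pi / (2 * N))
      for j in range(N)] for k in range(N)]
CT = transpose(C)


def dct2(x):
    """2-D DCT of an 8x8 matrix: C * x * C^T."""
    return matmul(matmul(C, x), CT)


def pickers(d):
    """Row-selection matrices for an offset d in 0..7.

    upper * B keeps rows d..7 of B (moved to the top);
    lower * B keeps rows 0..d-1 of B (moved to the bottom).
    """
    upper = [[1.0 if k == r + d else 0.0 for k in range(N)] for r in range(N)]
    lower = [[1.0 if k == r + d - N else 0.0 for k in range(N)] for r in range(N)]
    return upper, lower


def mc_pixel(blocks, dy, dx):
    """Pixel-domain MC: copy an 8x8 window at (dy, dx) from a 2x2 block group."""
    b1, b2, b3, b4 = blocks  # top-left, top-right, bottom-left, bottom-right
    big = [r1 + r2 for r1, r2 in zip(b1, b2)] + [r3 + r4 for r3, r4 in zip(b3, b4)]
    return [[big[dy + r][dx + c] for c in range(N)] for r in range(N)]


def mc_dct(dct_blocks, dy, dx):
    """DCT-domain MC: same prediction, computed on DCT coefficients only."""
    uy, ly = pickers(dy)
    ux, lx = pickers(dx)
    row_ops = [dct2(uy), dct2(ly)]                        # vertical selection
    col_ops = [dct2(transpose(ux)), dct2(transpose(lx))]  # horizontal selection
    out = [[0.0] * N for _ in range(N)]
    for idx, blk in enumerate(dct_blocks):
        # Two dense floating-point 8x8 products per contributing block
        term = matmul(matmul(row_ops[idx // 2], blk), col_ops[idx % 2])
        out = [[o + t for o, t in zip(orow, trow)] for orow, trow in zip(out, term)]
    return out
```

In the pixel domain the prediction is a plain window copy, whereas the DCT-domain version needs up to eight dense floating-point matrix products per predicted block, which is consistent with the expense described above. Because the basis matrix is orthonormal, `mc_dct` applied to the DCT of the four blocks agrees with `dct2(mc_pixel(...))` up to floating-point rounding.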