Video transcoding converts a video bit stream from one coding format to another. The transcoding can involve syntax, bit rate, and resolution conversions. Transcoders can be used at the source or destination of videos, or in between, e.g., in video servers, network routers, and video receivers. Transcoders enable the delivery of videos to a variety of devices having different network connections or display capabilities. See U.S. Pat. No. 6,483,851, “System for network transcoding of multimedia data flow,” issued to Neogi on Nov. 19, 2002, U.S. Pat. No. 6,490,320, “Adaptable bitstream video delivery system,” issued to Vetro, et al. on Dec. 3, 2002, and U.S. Pat. No. 6,345,279, “Methods and apparatus for adapting multimedia content for client devices,” issued to Li, et al. on Feb. 5, 2002.
The above patents focus on higher-level system design issues. However, detailed information describing the transcoding of video is not provided. In particular, those patents do not disclose how quantization parameters and conversion modes for macroblocks are determined.
Recently, there has been an increased demand for video transcoding with spatial resolution reduction. This demand comes from applications such as high-definition TV (HDTV) broadcasting and DVD recording. In order to display HDTV programs on a standard definition TV (SDTV), or to record HDTV programs on a DVD recorder, it is necessary to convert a high resolution HDTV video to a low resolution SDTV video. In addition, hand-held devices with small video displays and low bit rate wireless connections require video transcoding.
The reduction of spatial resolution has been described by Xin, et al., “An HDTV-to-SDTV spatial transcoder,” IEEE Transactions on Circuits and Systems for Video Technology, Vol. 12, No. 11, November 2002, Yin, et al., “Drift compensation for reduced spatial resolution transcoding,” IEEE Transactions on Circuits and Systems for Video Technology, Vol. 12, No. 11, November 2002, Shanableh, et al., “Heterogeneous video transcoding to lower spatio-temporal resolutions and different encoding formats,” IEEE Transactions on Multimedia, Vol. 2, No. 2, June 2000, and Shen, et al., “Transcoder with arbitrarily resizing capability,” IEEE proc. ISCAS 2001.
FIG. 1 shows the basic structure and operation of a typical prior art video transcoder 100. The transcoder 100 includes a decoder 110, a downscale filter 120, and an encoder 130 connected serially to each other. A macroblock (MB) mapper 140 is connected between the decoder and the encoder. An input video bitstream 101, with bit rate R1, is decoded 110 into YUV video frames. The decoded frames are then spatially downscaled 120 to lower resolution YUV frames. Concurrently, motion vectors and coding modes are extracted from the input bitstream by the MB mapper 140. The encoder 130 uses the extracted macroblock information to encode the filtered YUV frames into an output video stream 102 with a lower bit rate R2 and lower spatial resolution.
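The spatial downscaling step 120 can be illustrated with a minimal sketch. The text above does not specify the filter, so the 2:1 reduction by averaging each 2×2 block of samples is an assumption chosen for simplicity, and the function name `downscale_2x` is hypothetical.

```python
def downscale_2x(plane):
    """Halve the width and height of a 2-D list of samples.

    Assumption: a plain 2x2 box average; the actual downscale filter
    of the prior art transcoder is not specified in the text.
    """
    height, width = len(plane), len(plane[0])
    if height % 2 or width % 2:
        raise ValueError("plane dimensions must be even")
    out = []
    for y in range(0, height, 2):
        row = []
        for x in range(0, width, 2):
            total = (plane[y][x] + plane[y][x + 1]
                     + plane[y + 1][x] + plane[y + 1][x + 1])
            row.append((total + 2) // 4)  # rounded integer average
        out.append(row)
    return out


# A 4x4 luma plane is reduced to 2x2.
luma = [
    [10, 12, 200, 202],
    [14, 16, 204, 206],
    [50, 50, 90, 94],
    [50, 50, 98, 102],
]
small = downscale_2x(luma)  # [[13, 203], [50, 96]]
```

With a 2:1 reduction in each dimension, four input macroblocks cover the area of one output macroblock, which is why the extracted motion vectors and coding modes have to be mapped rather than reused directly.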
At the macroblock level, a variety of modes can be used to encode a video, depending on the coding standard. For example, in order to support interlaced video sequences, the MPEG-2 standard has several different macroblock coding modes, including intra mode, no motion compensation (MC) mode, frame/field motion compensation inter mode, forward/backward/interpolate inter mode, and frame/field DCT mode. As an advantage, the multiple modes provide better coding efficiencies due to their inherent adaptability.
However, the prior art focuses on either motion vector re-sampling or motion re-estimation for spatial resolution reduction, without considering the best coding mode. For efficiency, the encoding modes for the output video stream are usually derived from the coding modes of the input video stream, using majority voting. The resulting modes are certainly sub-optimal. Other criteria for making mode decisions have also been described, but those decisions are limited to choosing between intra and inter modes, with similar disadvantages.
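Majority voting can be sketched as follows. The sketch assumes a 2:1 reduction in each dimension, so that four input macroblocks map to one output macroblock; the mode labels, the tie-breaking rule, and the function name `majority_mode` are hypothetical and not taken from the cited prior art.

```python
from collections import Counter


def majority_mode(input_modes):
    """Return the most frequent coding mode among the input macroblocks.

    Assumption: ties are broken in favor of the mode that appears first.
    """
    if not input_modes:
        raise ValueError("at least one input mode is required")
    counts = Counter(input_modes)
    best_count = max(counts.values())
    for mode in input_modes:
        if counts[mode] == best_count:
            return mode


# Modes of the four input macroblocks covering one output macroblock.
modes = ["mc_frame", "intra", "mc_frame", "no_mc"]
chosen = majority_mode(modes)  # "mc_frame"
```

The vote looks only at the input modes. It never measures the rate or distortion of the downscaled macroblock under each candidate mode, so the selected mode is not guaranteed to be the best one for the output video.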
Systems and methods for optimally selecting a macroblock coding mode based on a quantization scale selected for the macroblock are described in U.S. Pat. No. 6,037,987, “Apparatus and method for selecting a rate and distortion based coding mode for a coding system,” issued to Sethuraman on Mar. 14, 2000, U.S. Pat. No. 6,192,081, “Apparatus and method for selecting a coding mode in a block-based coding system,” issued to Chiang, et al. on Feb. 20, 2001, and Sun, et al., “MPEG coding performance improvement by jointly optimizing coding mode decisions and rate control,” IEEE Transactions on Circuits and Systems for Video Technology, Vol. 7, No. 3, June 1997.
FIG. 2 shows a typical prior art system and method 200 for jointly optimizing the coding mode and the quantizer. That system 200 basically uses a brute force, trial-and-error method. The system 200 includes a quantization selector 210, a mode selector 220, a MB predictor 230, a discrete cosine transform (DCT) 240, a quantizer 250, a variable length coder (VLC) 260, and a cost function 270 to select an optimal quantization and mode 280. The optimal quantization and mode 280 are obtained by an iterative procedure that searches through a trellis to find the path with the lowest cost. As the quantization selector 210 changes its step size, e.g., from 1 to 31, the mode selector 220 responds by selecting each mode for each macroblock, e.g., intra 221, no MC 222, MC frame 223, and MC field 224.
At the macroblock level, a prediction 230 is formed according to the decoded picture type. Then, the forward DCT 240 is applied to each macroblock of a predictive residual signal to produce DCT coefficients. The DCT coefficients are quantized 250 with each step size in the quantization parameter set. The quantized DCT coefficients are entropy encoded using the VLC 260, and a bit rate 261 is recorded for later use. In parallel, a distortion calculation by means of mean-square-error (MSE) is performed over the pixels in the macroblock, resulting in a distortion value 251.
Next, the resulting bit rate 261 and distortion 251 are passed to the rate-distortion module for cost evaluation 270. The rate-distortion function is constrained by a target frame budget imposed by a rate constraint Rpicture 271. The cost evaluation 270 is performed for each value q in the quantization parameter set. For each macroblock, the quantization scale and coding mode with the lowest cost are selected.
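The double loop over quantizer step sizes and coding modes can be sketched for a single macroblock as follows. This is a simplified illustration under several stated assumptions, not the method of the cited references: the DCT is omitted, the bit count is a crude proxy rather than a real VLC, the frame budget Rpicture is replaced by a fixed Lagrange multiplier `lam` in the cost J = D + lam·R, and macroblocks are treated independently rather than through a trellis. All names are hypothetical.

```python
MODES = ("intra", "no_mc", "mc_frame", "mc_field")
Q_STEPS = range(1, 32)  # step sizes 1 to 31, as in the example above


def trial_encode(residual, q):
    """Quantize a residual with step q; return (rate, distortion).

    Assumptions: no DCT is applied, and the rate is a proxy of 1 bit
    per coefficient plus 2 bits per magnitude bit of each level.
    """
    levels = [round(x / q) for x in residual]
    recon = [level * q for level in levels]
    # Mean-square-error between the residual and its reconstruction.
    distortion = sum((a - b) ** 2 for a, b in zip(residual, recon)) / len(residual)
    rate = sum(1 + 2 * abs(level).bit_length() for level in levels)
    return rate, distortion


def joint_search(residuals, lam):
    """Try every (q, mode) pair and keep the pair with the lowest cost."""
    best = None
    evaluations = 0
    for q in Q_STEPS:
        for mode in MODES:
            rate, distortion = trial_encode(residuals[mode], q)
            cost = distortion + lam * rate  # assumed Lagrangian cost
            evaluations += 1
            if best is None or cost < best[0]:
                best = (cost, q, mode)
    return best, evaluations


# Hypothetical prediction residuals of one macroblock under each mode.
residuals = {
    "intra": [40, -35, 20, 8],
    "no_mc": [6, -4, 3, 1],
    "mc_frame": [0, 0, 0, 0],
    "mc_field": [2, -1, 0, 1],
}
(best_cost, best_q, best_mode), evaluations = joint_search(residuals, lam=0.5)
```

Even in this toy form, every macroblock requires one trial encoding per (q, mode) pair, here 31 × 4 = 124, before a decision can be made.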
In the prior art system, if Q denotes the set of all admissible quantizers, and M denotes the set of all admissible coding modes, then the complexity of the prior art system is proportional to |Q|×|M|, i.e., the number of quantizers multiplied by the number of coding modes. Because each pass of the loop over quantizer values involves DCT transformation, quantization, and distortion and bit count calculations for each macroblock, the double loop for joint mode decision and quantizer selection in the prior art makes the complexity extremely high.
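As a rough worked example of this cost, the number of trial encodings per frame can be counted. The 720×480 frame size is an assumption chosen to represent an SDTV picture; the 31 quantizer values and 4 modes are the example values given for FIG. 2, and the macroblock size is 16×16 as in MPEG-2. The function name is hypothetical.

```python
def trial_encodes_per_frame(width, height, num_quantizers, num_modes, mb_size=16):
    """Count trial encodings when every (quantizer, mode) pair is tried per macroblock."""
    macroblocks = (width // mb_size) * (height // mb_size)
    return macroblocks * num_quantizers * num_modes


# 45 x 30 = 1350 macroblocks, each tried with 31 x 4 = 124 combinations.
count = trial_encodes_per_frame(720, 480, 31, 4)  # 167400
```

Each of these trial encodings includes its own transform, quantization, bit count, and distortion calculation, which is what makes the exhaustive approach expensive.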
Given the above prior art, there is a need to provide a new system and method for video transcoding with spatial resolution reduction, which achieves the optimal solution for coding mode decision and motion vector selection with less complexity.