A/V content is increasingly being transmitted over optical, wireless, and wired networks. Since these networks are characterized by different bandwidth constraints, there is a need to represent A/V content at different bit rates, resulting in varying subjective visual quality. Additional requirements on the compressed representation of A/V content are imposed by the screen size, computational capabilities, and memory constraints of an A/V terminal.
Therefore, A/V content stored in a compressed format, e.g., as defined by the Moving Picture Experts Group (“MPEG”), must be converted to, e.g., different bit rates, frame rates, and screen sizes, in accordance with the varying decoding complexities and memory constraints of different A/V terminals.
To avoid the need for storing multiple compressed representations of the same A/V content for different network bandwidths and different A/V terminals, A/V content stored in a compressed MPEG format may be transcoded to a different MPEG format.
With respect to video transcoding, reference is made to the following:
    WO09838800A1: O. H. Werner, N. D. Wells, M. J. Knee: Digital Compression Encoding with improved quantization, 1999, proposes an adaptive quantization scheme;
    U.S. Pat. No. 5,870,146: Zhu; Qin-Fan: Device and method for digital video transcoding, 1999;
    WO09929113A1: Nilsson, Michael, Erling; Ghanbari, Mohammed: Transcoding, 1999;
    U.S. Pat. No. 5,805,224: Keesman; Gerrit J, Van Otterloo; Petrus J.: Method and Device for Transcoding Video Signal, 1998;
    WO09943162A1: Golin, Stuart, Jay: Motion vector extrapolation for transcoding video sequences, 1999;
    U.S. Pat. No. 5,838,664: Polomski; Mark D.: Video teleconferencing system with digital transcoding, 1998;
    WO09957673A2: Balliol, Nicolas: Transcoding of a data stream, 1999;
    U.S. Pat. No. 5,808,570: Bakhmutsky; Michael: Device and Method for pair-matching Huffman-Transcoding and high performance variable length decoder with two-word bitstream segmentation which utilizes the same, 1998;
    WO09905870A2: Lemaguet, Yann: Method of Switching between Video Sequences and corresponding Device, 1999; and
    WO09923560A1: LUDWIG, Lester; BROWN, William; YUL, Inn, J.; VUONG, Anh, T.; VANDERLIPPE, Richard; BURNETT, Gerald; LAUWERS, Chris; LUI, Richard; APPLEBAUM, Daniel: Scalable networked multimedia system and application, 1999.
However, none of these patents on video transcoding disclose or suggest using transcoding hints metadata information to facilitate A/V transcoding.
The Society of Motion Picture and Television Engineers (“SMPTE”) proposed a standard for Television on MPEG-2 Video Recoding Data Set (327M-2000), which provides for re-encoding metadata using 256 bits for every macroblock of the source format. However, this extraction and representation of transcoding hints metadata has several disadvantages. For example, according to the proposed standard, transcoding hints metadata (such as GOP structure, quantizer settings, motion vectors, etc.) is extracted for every single frame and macroblock of the A/V source content. This method has the advantage of providing detailed, content-adaptive transcoding hints, and it facilitates transcoding while largely preserving the subjective A/V quality. However, the size of the transcoding hints metadata is very large. In one specific implementation of the proposed standard, 256 bits of transcoding hints metadata are stored per macroblock of MPEG video. Such a large amount of transcoding hints metadata is not feasible for, say, broadcast distribution to a local (e.g., home) A/V content server. Consequently, the proposed standard on transcoding hints metadata is limited to broadcast studio applications.
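To give a sense of scale for the 256 bits per macroblock mentioned above, the following minimal sketch estimates the resulting metadata bit rate. The frame size (720x576) and frame rate (25 frames per second) are assumptions chosen for illustration only and are not taken from the proposed standard; the function name is likewise hypothetical. The 16x16 macroblock size is that of MPEG-2 video.

```python
MACROBLOCK_SIZE = 16            # MPEG-2 macroblocks cover 16x16 luma samples
HINT_BITS_PER_MACROBLOCK = 256  # figure quoted in the text above


def hint_metadata_bitrate(width, height, frames_per_second):
    """Return the per-macroblock hint metadata rate in bits per second.

    Hypothetical helper for illustration; width, height, and frame rate
    are assumed inputs, not values from the proposed standard.
    """
    # Ceiling division so that partial macroblocks at the edges are counted.
    mb_cols = -(-width // MACROBLOCK_SIZE)
    mb_rows = -(-height // MACROBLOCK_SIZE)
    bits_per_frame = mb_cols * mb_rows * HINT_BITS_PER_MACROBLOCK
    return bits_per_frame * frames_per_second


# Assumed example format: 720x576 at 25 frames per second.
rate = hint_metadata_bitrate(720, 576, 25)
print(f"{rate / 1e6:.2f} Mbit/s")  # prints "10.37 Mbit/s"
```

Under these assumptions, a frame holds 45 x 36 = 1620 macroblocks, so the metadata alone amounts to roughly 10.4 Mbit/s, which is on the order of the bit rate of the compressed video itself. This illustrates why such metadata is difficult to distribute outside a studio environment.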
Another technique for transcoding hints metadata extraction and representation includes collecting general transcoding hints metadata for the transcoding of compressed A/V source content with a specific bit rate to another compressed format and bit rate. However, this technique has the disadvantage of not taking the characteristic properties of the transcoded content into account. For example, in the source content, the A/V characteristics may change from an A/V segment with a limited amount of motion and few details (e.g., a news anchor scene) to another A/V segment depicting fast motion and numerous details (e.g., a sports event scene). According to this technique, misleading transcoding hints metadata, which would not suitably represent the different characteristics of both video segments, would be selected and would therefore result in poor A/V quality and faulty bit rate allocation.