In video communication, e.g., video conferencing, Multipoint Control Units ("MCUs") serve as switches and conference builders for the network. The MCUs receive multiple audio/video streams from the various users' terminals, or codecs, and transmit to the various users' terminals audio/video streams that correspond to the desired signal at the users' stations. In some cases, where the MCU serves as a switchboard, the stream transmitted to the end terminal is a simple stream from a single other user. In other cases, it is a combined "conference" stream composed of several users' streams.
An important function of the MCU is to translate or manipulate the input streams from all codecs into the desired output streams to all codecs. One aspect of this "translation" is a modification of the bit rate between the original stream and the output stream. This rate matching can be achieved, for example, by changing the frame rate, the spatial resolution, or the quantization accuracy of the corresponding video. The output bit rate, and thus the modification factor used to achieve it, can be different for different users, even for the same input stream. For instance, in a four-party conference, one of the parties may be operating at 128 Kbps, another at 256 Kbps, and two others at T1. Each party needs to receive the transmission at the appropriate bit rate. The same principles apply to "translation," or transcoding, between parameters that vary between codecs, e.g., different coding standards such as H.261 and H.263; different input resolutions; and different maximal frame rates in the input streams.
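The four-party example above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the party labels, the 256 Kbps input rate, and the use of the nominal 1544 Kbps T1 line rate as the full video budget are all hypothetical, and a real MCU would also reserve part of each link for audio and control data.

```python
# Minimal sketch: per-party rate-matching factor for one input stream.
# All names and figures here are illustrative assumptions.

T1_KBPS = 1544  # nominal T1 line rate in Kbps


def rate_reduction_factors(input_kbps, link_kbps_by_party):
    """Return, for each party, the fraction of the input bit rate to keep.

    A factor of 1.0 means the stream can be forwarded at its original
    rate; a smaller factor means the MCU has to reduce the bit rate,
    e.g. by lowering frame rate, resolution, or quantization accuracy.
    """
    factors = {}
    for party, link_kbps in link_kbps_by_party.items():
        factors[party] = min(1.0, link_kbps / input_kbps)
    return factors


links = {"A": 128, "B": 256, "C": T1_KBPS, "D": T1_KBPS}
factors = rate_reduction_factors(256, links)
```

Under these assumptions only party A needs a reduced stream; the same input stream is forwarded with different factors to different parties.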
Another use of the MCU can be to construct an output stream that combines several input streams. This option, sometimes called "compositing" or "continuous presence," allows a user at a remote terminal to observe, simultaneously, several other participants in the conference. The choice of these participants can vary among different users at different remote terminals of the conference. In this situation, the number of bits allocated to each participant can also vary, and may depend on the on-screen activity of the users, on the specific resolution given to the participant, or on some other criterion.
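One simple way to picture such an allocation is a proportional split of the output bit budget. The sketch below assumes a hypothetical per-participant "activity" weight and a proportional rule; the text does not prescribe any particular allocation rule, so this is an illustration only.

```python
# Minimal sketch: split an output bit budget among the participants shown
# in a continuous-presence layout, in proportion to an assumed weight.
# The weights and the proportional rule are illustrative assumptions.

def allocate_bits(total_kbps, weight_by_participant):
    """Divide total_kbps among participants proportionally to their weights."""
    weight_sum = sum(weight_by_participant.values())
    if weight_sum <= 0:
        raise ValueError("at least one participant needs a positive weight")
    return {
        participant: total_kbps * weight / weight_sum
        for participant, weight in weight_by_participant.items()
    }


# The active speaker "A" is weighted twice as heavily as "B" and "C".
shares = allocate_bits(256, {"A": 2, "B": 1, "C": 1})
```

The weight could equally stand for the resolution given to each participant rather than on-screen activity.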
All of this elaborate processing, e.g., transcoding and continuous presence processing, must be done under the constraint that the input streams are already compressed by a known compression method, usually based on a standard like ITU's H.261 or H.263. These standards, as well as other video compression standards like MPEG, are generally based on a Discrete Cosine Transform ("DCT") process wherein the blocks of the image (video frame) are transformed, and the resulting transform coefficients are quantized and coded.
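The transform-and-quantize step described above can be sketched as follows. This is a simplified illustration, not the exact procedure of any standard: it assumes an 8x8 block, an orthonormal two-dimensional DCT-II, and a single uniform quantizer step, whereas real codecs use more elaborate quantization and entropy coding. The function names are hypothetical.

```python
import math

N = 8  # block size assumed for this sketch


def dct_2d(block):
    """Orthonormal 2-D DCT-II of an N x N block (list of lists)."""
    def scale(k):
        return 1.0 / math.sqrt(2.0) if k == 0 else 1.0

    coeffs = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for i in range(N):
                for j in range(N):
                    total += (block[i][j]
                              * math.cos((2 * i + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * j + 1) * v * math.pi / (2 * N)))
            coeffs[u][v] = (2.0 / N) * scale(u) * scale(v) * total
    return coeffs


def quantize(coeffs, step):
    """Uniform quantization: a larger step gives fewer bits and lower accuracy."""
    return [[round(c / step) for c in row] for row in coeffs]


# A flat gray block has all its energy in the DC coefficient.
flat_block = [[128] * N for _ in range(N)]
levels = quantize(dct_2d(flat_block), step=16)
```

For the flat block, only the top-left (DC) level is non-zero, which is the spatial redundancy the transform is meant to expose. Increasing `step` is one way to lower the bit rate at the cost of quantization accuracy.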
One prior art method first decompresses the video streams; performs the required combination, bridging and image construction; and finally re-compresses the video streams for transmission. This method requires high computation power, degrades the resulting video quality, and suffers from large propagation delay. One of the most computation-intensive portions of the prior art methods is the encoding portion of the operation, where such things as motion vectors and DCT coefficients have to be generated so as to take advantage of spatial and temporal redundancies. For instance, to take advantage of spatial redundancies in the video picture, the DCT function can be performed. To generate DCT coefficients, each frame of the picture is broken into blocks and the discrete cosine transform function is performed upon each block. In order to take advantage of temporal redundancies, motion vectors can be generated. To generate motion vectors, consecutive frames are compared to each other in an attempt to discern pattern movement from one frame to the next. As would be expected, these computations require a great deal of computing power.
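To make the cost of motion vector generation concrete, the sketch below shows an exhaustive block-matching search using the sum of absolute differences (SAD). This is a common textbook technique chosen for illustration; the text does not say which search method the prior art encoders use. The block size, search range, frame contents, and function names are assumptions.

```python
# Minimal sketch: exhaustive block matching between two consecutive frames.
# Frames are lists of rows of integer pixel values.

def sad(prev, cur, bx, by, dx, dy, size):
    """Sum of absolute differences between a block in cur and a shifted block in prev."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(cur[by + y][bx + x] - prev[by + dy + y][bx + dx + x])
    return total


def estimate_motion_vector(prev, cur, bx, by, size=4, search=2):
    """Return (dx, dy, cost): the offset into prev that best matches the block at (bx, by) in cur."""
    height, width = len(prev), len(prev[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidates that would read outside the previous frame.
            if not (0 <= bx + dx <= width - size and 0 <= by + dy <= height - size):
                continue
            cost = sad(prev, cur, bx, by, dx, dy, size)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best[1], best[2], best[0]


def frame_with_patch(px, py, width=8, height=8):
    """Black frame with a bright 2 x 2 patch whose top-left corner is (px, py)."""
    frame = [[0] * width for _ in range(height)]
    for y in range(py, py + 2):
        for x in range(px, px + 2):
            frame[y][x] = 255
    return frame


prev_frame = frame_with_patch(2, 3)
cur_frame = frame_with_patch(3, 3)  # the patch moved one pixel to the right
vector = estimate_motion_vector(prev_frame, cur_frame, bx=2, by=2)
```

Even this tiny search evaluates up to 25 candidate offsets of 16 pixels each for a single block, which indicates why the encoding portion dominates the computation.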
In order to reduce computational complexity and increase quality, others have searched for methods of performing such operations more efficiently. Proposals have included operating in the transform domain on motion-compensated, DCT-compressed video signals by removing the motion compensation portion and compositing in the DCT transform domain.
Therefore, a method is needed for performing the "translation" operations of an MCU, such as modifying bit rates, frame rates, and compression algorithms, in an efficient manner that reduces propagation delays, degradation in signal quality, video bandwidth use within the MCU, and computational complexity.
The present invention relates to an improved method of processing multimedia/video data in an MCU or other digital video processing device (VPD). By reusing information embedded in a compressed video stream received from a video source, the VPD can improve the quality and reduce the total computations needed to process the video data before sending it to the destination. More specifically, the present invention operates to manipulate compressed digital video from several compressed digital video sources. A video input module receives compressed video input data from a video source. A generalized decoder within the video input module decodes the compressed video input data and generates a primary video data stream. The generalized decoder also processes the compressed video input data and the primary video data stream to generate a secondary data stream. A video output module, which includes a rate control unit and a generalized encoder, receives the primary video data stream and the secondary data stream from at least one input module. The generalized encoder, in communication with the rate control unit, receives the primary video data from one or more input modules and encodes the primary video data into combined compressed video output data. The use of the secondary data stream by the output module improves the speed of encoding and the quality of the compressed video data.
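The data flow described above can be sketched roughly as follows. This passage does not state what the secondary data stream contains, so the sketch assumes, purely for illustration, that it carries per-block motion vectors recovered by the generalized decoder. All class and function names are hypothetical and are not taken from the invention.

```python
from dataclasses import dataclass, field


@dataclass
class SecondaryData:
    """Assumed side information reused from the compressed input stream."""
    motion_vectors: dict = field(default_factory=dict)  # (bx, by) -> (dx, dy)


@dataclass
class InputModuleOutput:
    """What a video input module hands to the video output module."""
    primary: list             # decoded frame (rows of pixel values)
    secondary: SecondaryData  # side information for the encoder


def candidate_offsets(hint, full_search=2, refine=1):
    """Offsets the encoder would test for one block.

    Without a hint the encoder falls back to a full search window; with a
    hint from the secondary stream it only refines around that vector.
    """
    if hint is None:
        center, radius = (0, 0), full_search
    else:
        center, radius = hint, refine
    return [(center[0] + dx, center[1] + dy)
            for dy in range(-radius, radius + 1)
            for dx in range(-radius, radius + 1)]


decoded = InputModuleOutput(
    primary=[[0] * 8 for _ in range(8)],
    secondary=SecondaryData(motion_vectors={(2, 2): (-1, 0)}),
)
hinted = candidate_offsets(decoded.secondary.motion_vectors.get((2, 2)))
unhinted = candidate_offsets(decoded.secondary.motion_vectors.get((4, 4)))
```

In this sketch the hinted block needs 9 candidate evaluations instead of 25, which is one way reused information could speed up encoding.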