1. Field of Invention
This invention relates to the field of video communication and, more particularly, to providing real-time video conferencing while minimizing transmission and processing delays.
2. Description of Background Art
As the geographical domain in which companies conduct business continues to expand, video teleconferencing technology attempts to bring the world closer together. But, as with most user-based technologies, users can be very critical and demanding with regard to the quality of the technology and the comfort of the user interface. One of the main complaints with regard to video communication technology is the delay that occurs between video streams sent between participants of the video conference. The delay tends to decrease the quality of the video communication experience because participants inadvertently start talking at the same time, yet are several words into their sentences before the conflict is noticed. The greater the delay resulting from the processing or the transmission of data, the more difficult communication is between the participants.
There are several factors that contribute to the delay in the video stream. One such factor arises when the encoding of the video streams is performed at user or participant terminals where the video stream is created. Another such factor arises simply due to transmission delays that accrue when transmitting the encoded video stream through a network to a Video Communication Control Unit (VCCU), such as, but not limited to, a Multipoint Control Unit (MCU) or a Multimedia Gateway. Typically, a VCCU serves as a switchboard and/or conference builder for the network. In operation, the VCCU receives coded video streams from, and transmits coded video streams to, various user terminals or codecs.
Another contribution to the delay of the video stream is due to the processing performed on the encoded video stream within the VCCU. Once the VCCU completes its processing of the video stream, additional delays are incurred while the processed video stream is transmitted through the network to target user or participant terminals. After the participant terminals receive the video stream, additional delays are caused by the decoding of the encoded video streams back to normal video.
The VCCU may be used in several modes, such as video switching, transcoding, and continuous presence. In video switching, the VCCU serves as a switchboard, and the video stream is directly transmitted from a source terminal to a target terminal. In video switching operation, the input stream is passed through the VCCU, and the VCCU is not required to perform any video processing. Although video switching achieves a reduction in the delay of the video stream, it is not adequate to solve the present problem because, in many situations, video switching cannot be utilized. Two such situations arise where transcoding is required or a continuous presence mode of operation is provided.
Transcoding of the video stream is required when the input stream does not match requirements of the target user terminal (such as bit rate, frame rate, frame resolution, compression algorithm, etc.). Transmission of the video streams in this mode requires video processing, which may result in delays due to the required processing time, and thus, a less than optimal video conference.
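The condition described above can be illustrated with a minimal sketch. The names StreamParams and mismatched_params are hypothetical and do not come from any VCCU product; the sketch assumes, for simplicity, that any difference in a listed parameter calls for transcoding, whereas a real system might tolerate some differences.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class StreamParams:
    """Hypothetical description of a video stream or of a terminal's requirements."""
    bit_rate_kbps: int
    frame_rate_fps: int
    resolution: str
    compression: str


def mismatched_params(source: StreamParams, target: StreamParams) -> list:
    """Return the names of the parameters on which source and target differ."""
    return [
        f.name
        for f in fields(source)
        if getattr(source, f.name) != getattr(target, f.name)
    ]


def needs_transcoding(source: StreamParams, target: StreamParams) -> bool:
    """Assumption: any mismatch means video switching alone is not enough."""
    return bool(mismatched_params(source, target))


incoming = StreamParams(384, 30, "CIF", "H.263")
required = StreamParams(128, 15, "QCIF", "H.261")
print(mismatched_params(incoming, required))
```

When the list is empty, the stream could in principle be passed through in video switching mode without any video processing.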
Continuous presence (“CP”) involves mixing the video data from various source user terminals, and thus requires video processing. Both transcoding and the continuous presence mode cause delays in the delivery of video streams within a video communication system. Thus, it is evident that there is a need in the art for a technique to eliminate or alleviate the delays in the video stream to improve the video communication experience.
The elaborate processing required in transcoding and continuous presence operation must be done under the constraint that the input streams are already compressed by a known compression method based on dividing the video stream into smaller units, such as GOPs (Groups Of Pictures), pictures, frames, slices, GOBs (Groups Of Blocks), macroblocks (MBs), and blocks, as described in standards such as the H.261, H.263, and MPEG standards.
A typical VCCU, like the MGC-100 manufactured by Polycom Networks Systems, contains a number of decoders and encoders. Each decoder receives a compressed stream of a known compression format, and decodes or uncompresses the compressed stream. The uncompressed frames are then scaled and rearranged to form a desired output layout. The resulting frame is then appropriately compressed by the encoders and transmitted to the desired target terminals.
FIG. 1 is a block diagram illustrating a typical embodiment of video ports within a VCCU 100. Two video ports are illustrated by way of example and for convenience of presentation; however, those skilled in the art will realize that the VCCU 100 can have many such video ports. The VCCU 100 receives compressed video streams from various terminals and places the compressed video streams onto a backplane bus 140. Each video port 130 within the VCCU 100 is dedicated to one end terminal. Uncompressed video is shared through a dedicated video bus 150, capable of transferring high bandwidth video streams at a given maximum resolution and at the maximum frame rate.
The description of the present invention refers to a terminal by several names, such as end terminal, terminal, end-point, endpoint, and end user terminal. In general, a terminal is an endpoint on the network that provides for real-time, two-way communications with another terminal, Gateway, or Multipoint Control Unit. This communication consists of control, indications, audio, moving color video pictures, and/or data between the two terminals. A terminal may provide speech only; speech and data; speech and video; or speech, data, and video.
Once a compressed video stream from an end user terminal is placed onto the backplane bus 140, the video stream begins to accumulate in an input buffer 125 before being provided to a decoder 120. The decoder 120 converts the compressed video stream into uncompressed frames, and the uncompressed frames are placed into input triple frame memory 123. The input triple frame memory 123 consists of three frame buffers. Working in a cyclic mode, one buffer is needed for the frame constructed by the decoder 120. The second buffer is used for transmission over the video bus 150. When the decoder 120 yields a full frame in the middle of a frame cycle (i.e., the transmitted frame buffer has not completed the transmission of its current frame), an additional buffer is needed to prevent stalling of the decoder 120.
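The cyclic use of the three frame buffers can be sketched as follows. This is only an illustration under stated assumptions: the class and method names are hypothetical, and the policy that a newer completed frame replaces an older one still waiting for the bus is an assumption, since the text says only that the third buffer prevents the decoder from stalling.

```python
class TripleFrameMemory:
    """Hypothetical model of a three-buffer frame memory used in a cyclic mode."""

    def __init__(self):
        self.buffers = [None, None, None]
        self.write_idx = 0     # buffer the decoder is currently filling
        self.read_idx = 1      # buffer being transmitted over the video bus
        self.ready_idx = None  # completed frame waiting for the next bus cycle

    def decoder_finished_frame(self, frame):
        """Called when the decoder completes a frame; the decoder never waits."""
        self.buffers[self.write_idx] = frame
        if self.ready_idx is None:
            # The third buffer is idle: decode the next frame into it.
            free = ({0, 1, 2} - {self.write_idx, self.read_idx}).pop()
            self.ready_idx, self.write_idx = self.write_idx, free
        else:
            # Assumed policy: the newest frame replaces the one still pending,
            # and the pending frame's buffer is reused for decoding.
            self.ready_idx, self.write_idx = self.write_idx, self.ready_idx

    def start_bus_cycle(self):
        """Called at the start of each video bus frame cycle."""
        if self.ready_idx is not None:
            self.read_idx = self.ready_idx
            self.ready_idx = None
        # If no new frame is ready, the previous frame is transmitted again.
        return self.buffers[self.read_idx]


memory = TripleFrameMemory()
memory.decoder_finished_frame("frame 1")
print(memory.start_bus_cycle())
```

With only two buffers, a decoder that finishes a frame in the middle of a bus cycle would have nowhere to write the next frame until the transmission ends; the third buffer removes that wait.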
The uncompressed frame transmitted from the input triple frame memory 123 is scaled according to the desired output layout by input scaler 127, and then placed onto the video bus 150. The appropriate video ports 130 then retrieve the scaled frame from the video bus 150, using builder 112, based on the layout needed to be generated. The builder 112 collects one or more frames from at least one video port as needed by the layout, and arranges the frames to create a composite output frame. An output scaler 117 then scales the composite frame to a desired resolution and stores the scaled composite frame in an output triple frame memory 115.
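The scaling and building steps can be sketched as follows. This is a simplified illustration, not the implementation of any actual VCCU: frames are modeled as lists of pixel rows, the function names input_scale and build_composite are hypothetical, nearest-neighbor sampling is an assumed scaling method, and a 2x2 layout is chosen only as an example.

```python
def input_scale(frame, out_h, out_w):
    """Scale a frame (list of pixel rows) by nearest-neighbor sampling."""
    in_h, in_w = len(frame), len(frame[0])
    return [
        [frame[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def build_composite(frames, cell_h, cell_w):
    """Arrange four frames in a 2x2 grid, each scaled to cell_h x cell_w."""
    if len(frames) != 4:
        raise ValueError("this sketch handles exactly four frames")
    scaled = [input_scale(f, cell_h, cell_w) for f in frames]
    # Concatenate rows side by side for the top and bottom halves.
    top = [scaled[0][r] + scaled[1][r] for r in range(cell_h)]
    bottom = [scaled[2][r] + scaled[3][r] for r in range(cell_h)]
    return top + bottom


# Four 4x4 source frames, each filled with its own participant index.
sources = [[[i] * 4 for _ in range(4)] for i in range(4)]
for row in build_composite(sources, 2, 2):
    print(row)
```

An output scaler could then resize the composite frame with the same input_scale helper before the frame is stored for encoding.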
The output triple frame memory 115 consists of three frame buffers. Working in a cyclic mode, one buffer is needed for the frame received from the video bus 150. The second buffer is used for the frame being encoded by an encoder 110. When a new frame arrives from the video bus 150 in the middle of a frame cycle (i.e., the encode frame buffer is still busy), the frame is stored in the third buffer to prevent loss of the frame. The encoder 110 then encodes the frame from the output triple frame memory 115, and stores the compressed data in an output buffer 113. The data residing in the output buffer 113 is then transferred to the backplane bus 140, and ultimately to the end user terminal.
In the above description there is a total separation between the encoders and the decoders. The reason for this separation can typically be attributed to using off-the-shelf encoders/decoders, such as an 8×8 VCP processor, which were originally designed for use within end-points. The use of such off-the-shelf components forces the designer to design the video bus 150 as a video screen for the decoder and as a video camera for the encoder, with video signals such as horizontal sync and vertical sync. The decoders output a newly uncompressed frame only after it has been completely decoded. The encoders use the scaled frames only after the frame is fully loaded into their memory.
FIG. 2 illustrates resulting delays in a typical video conference where two endpoints are connected to each other through a common VCCU. End-user A 25 transmits at a maximum frame rate of 30 frames per second (“fps”) while end-user B 21 transmits at a maximum of 15 fps. The first decoded MB of each incoming frame waits in the input triple frame memory 123 (FIG. 1) until the whole frame received from the backplane bus 140 (FIG. 1) is fully decoded. This process results in a delay of one input frame. Additionally, the decoded frame is delayed until the start of the next video bus frame cycle before being transmitted along with the rest of the frame to the video bus 150 (FIG. 1), contributing an average delay of half a bus frame. The resulting delay to the decoder can be calculated by the following equation:
DecoderDelay = 1/InputFrameRate + 1/(2*VideoBusFrameRate)
Next, the encoder path of the video port 130 (FIG. 1) retrieves the entire frame residing on the video bus 150, and stores the frame into the output triple frame memory 115 (FIG. 1). Additionally, the first MB is delayed until the encoder 110 (FIG. 1) is ready to start encoding a new frame (27 for user A and 29 for user B). The average delay is therefore:
EncoderDelay = 1/VideoBusFrameRate + 1/(2*OutputFrameRate)
The total delay resulting from video sharing over a dedicated video bus simulating a screen on one side (decoder side) and a camera on the other side (encoder side) is:
Delay = 1/InputFrameRate + 3/(2*VideoBusFrameRate) + 1/(2*OutputFrameRate)
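The delay equations can be evaluated with a short sketch. The function names are illustrative, and the frame rates in the example (15 fps input and output, 30 fps video bus) are assumed values chosen for the arithmetic, not figures taken from FIG. 2.

```python
def decoder_delay(input_fps, bus_fps):
    """One full input frame plus, on average, half a video bus frame (seconds)."""
    return 1.0 / input_fps + 1.0 / (2.0 * bus_fps)


def encoder_delay(bus_fps, output_fps):
    """One full video bus frame plus, on average, half an output frame (seconds)."""
    return 1.0 / bus_fps + 1.0 / (2.0 * output_fps)


def total_delay(input_fps, bus_fps, output_fps):
    """Sum of the decoder-side and encoder-side delays (seconds)."""
    return decoder_delay(input_fps, bus_fps) + encoder_delay(bus_fps, output_fps)


# Assumed example: 15 fps input, 30 fps video bus, 15 fps output.
delay = total_delay(15, 30, 15)
print(f"{delay * 1000:.1f} ms")  # prints "150.0 ms"
```

For these assumed rates the terms are 1/15 s, 3/60 s, and 1/30 s, which sum to 0.15 s. With all three rates at 30 fps the same functions give 0.1 s.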
Thus, it is evident that current technology utilized in multimedia video conferences results in significant video delays. In low frame rate connections, this delay can amount to a few hundred milliseconds. This creates a significant degradation in the quality of the conference when considering that the delay is caused at both endpoints, and is typically of the same magnitude. Therefore, there is a need in the art for a method and system for reducing the delay in video transcoding and continuous presence for video conferencing technology.
Prior art systems offer an approach to reduce the delay; however, communication is limited. For example, prior art terminals have to use the same standard, or a single layout (e.g., Hollywood Square, in which the screen is divided into four pictures of the same size) with QCIF to CIF resolution. The present invention overcomes these limitations in that it can operate in several resolutions simultaneously, with any number of participants, as well as with other layouts and standards.