Real-time transmission of moving pictures is used in several applications, e.g. video conferencing, net meetings and video telephony.
Video conferencing systems allow for simultaneous exchange of audio, video and data information among multiple conferencing sites. Systems known as Multipoint Control Units (MCUs) perform switching functions to allow the endpoints of multiple sites to intercommunicate in a conference. An endpoint conventionally refers to a video conference terminal: either a stand-alone terminal equipped with at least a camera, a display, a loudspeaker or headphones, and a processor, or a video conferencing software client installed on a general purpose computer with corresponding capabilities.
The MCU links the sites together by receiving frames of conference signals from the sites, processing the received signals, and retransmitting the processed signals to appropriate sites. The conference signals include audio, video, data and control information. In a switched conference, the video signal from one of the endpoints, typically that of the loudest speaker, is broadcast to each of the participants. In a continuous presence conference, by contrast, video streams from several endpoints are mixed into a single composite stream. When the different video streams have been mixed together into one single video stream, the composed video stream is transmitted to the different parties of the video conference, where each transmitted video stream preferably follows a set of schemes indicating who will receive which video stream. In general, the different users prefer to receive different video streams. The continuous presence or composite image is a combined picture that may include live video streams, still images, menus or other visual images from participants in the conference. The combined picture may, e.g., be composed of several equally sized pictures, or of one main picture in addition to one or more smaller pictures in inset windows, commonly referred to as Picture-in-Picture (PIP). PIPs typically require a much lower resolution than the main picture because of their smaller size on the screen.
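The switched conference described above can be illustrated with a minimal sketch. It assumes the MCU already holds a measured audio level for every endpoint; the function names and the dictionary layout are hypothetical and chosen only for illustration.

```python
def select_switched_source(audio_levels):
    """Return the id of the endpoint with the highest audio level.

    audio_levels maps an endpoint id to its measured audio level
    (hypothetical representation, assumed non-empty).
    """
    return max(audio_levels, key=audio_levels.get)


def route_switched(audio_levels):
    """Map every endpoint to the single source stream it receives.

    In a switched conference the same source, here the loudest
    speaker, is sent to each participant.
    """
    source = select_switched_source(audio_levels)
    return {endpoint: source for endpoint in audio_levels}
```

With levels such as {"A": 0.2, "B": 0.9, "C": 0.4}, every endpoint is routed to the stream of endpoint "B". A continuous presence conference would instead compose several sources into one picture before transmission.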
Video MCUs based on a transcoding architecture use one dedicated encoder per connected participant (video terminal). The advantage is that each participant can have a personalized view of the conference. More importantly, from a network resilience point of view, a participant connecting over a poor network does not affect the video quality received by the other participants.
Existing video MCUs that use a shared encoder approach suffer from quality problems if one of the endpoints connected to the shared encoder has a bad network. The affected endpoint may ask the MCU to (1) continuously send complete intra frames (I-frames) in order to “clean up” any received video errors, or (2) reduce its transmission rate and send a lower resolution or lower frame rate video stream, in order to reduce the bandwidth used in the hope of reducing the number of packets lost. Since the encoder in the MCU is shared among several endpoints, both responses clearly degrade the experience for the other endpoints. Thus, there is a need for a method for sharing encoder resources in an MCU without compromising image quality or bandwidth adaptation.
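The difference between dedicated and shared encoders can be made concrete with a small model. This is only a sketch under a stated assumption: a shared encoder is taken to follow the lowest bitrate requested by any endpoint attached to it. The function names, the grouping structure and the bitrate figures are hypothetical and do not describe any particular MCU.

```python
def dedicated_rates(requested_kbps):
    """One encoder per endpoint: each endpoint gets the rate it asked for."""
    return dict(requested_kbps)


def shared_rates(requested_kbps, groups):
    """Shared encoders: each group of endpoints is served by one encoder.

    Assumption for illustration: the shared encoder drops to the lowest
    rate requested within its group, so one endpoint on a poor network
    pulls down every other endpoint in the same group.
    """
    rates = {}
    for group in groups:
        group_rate = min(requested_kbps[endpoint] for endpoint in group)
        for endpoint in group:
            rates[endpoint] = group_rate
    return rates
```

If endpoints "A" and "B" request 2000 kbit/s and endpoint "C", on a poor network, requests 500 kbit/s, a single shared encoder serves all three at 500 kbit/s, whereas dedicated encoders leave "A" and "B" at 2000 kbit/s.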