Videoconferencing entails exchange of audio, video, and other information between at least two participants. Generally, a videoconferencing endpoint at each participant location will include a camera for capturing images of the local participant and a display device for displaying images of remote participants. The videoconferencing endpoint can also include additional display devices for displaying digital content. In scenarios where more than two endpoints participate in a videoconferencing session, a multipoint control unit (MCU) can be used as a conference controlling entity. The MCU and endpoints typically communicate over a communication network, the MCU receiving and transmitting video, audio, and data channels from and to the endpoints.
Telepresence technologies provide an enhanced videoconferencing experience to participants so that the near end participants feel as if they are present in the same room as the far end participants. Telepresence videoconferencing can be provided for various conferencing systems, ranging from two-person point-to-point videoconferencing systems to multi-participant multipoint videoconferencing systems. Typically, telepresence utilizes multiple cameras to capture images of near end participants and multiple displays to display images of far end participants. Multiple video streams are transmitted from multiple endpoints to the MCU to be combined into one or more combined video streams that are sent back to the endpoints to be displayed on multiple display devices. For example, in a telepresence system involving three endpoints, each endpoint having three cameras, the MCU will receive nine video streams. The MCU will have to combine the nine received video streams into one or more combined video streams, which are sent back to be displayed on the display devices at each endpoint. These nine video streams will have to be laid out for each endpoint based on the number and type of displays at each endpoint. Furthermore, although the MCU may receive information from an endpoint indicating that the current speaker is located at that endpoint, when more than one video stream is received from each endpoint, the MCU may not be able to determine which one of the multiple video streams includes the current speaker. Thus, dynamically selecting one of many video streams received from an endpoint for prominent display may be difficult.
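The stream count and the speaker ambiguity described above can be illustrated with a minimal sketch. The names `Endpoint`, `incoming_streams`, and `speaker_candidates` are hypothetical and chosen only for illustration; they do not correspond to any particular MCU implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    """Hypothetical model of a telepresence endpoint (illustrative only)."""
    name: str
    cameras: int


def incoming_streams(endpoints):
    """List one (endpoint name, camera index) pair per stream the MCU receives."""
    return [(ep.name, cam) for ep in endpoints for cam in range(ep.cameras)]


def speaker_candidates(streams, speaking_endpoint):
    """Streams that could contain the current speaker when the MCU
    knows only which endpoint the speaker is at, not which camera."""
    return [s for s in streams if s[0] == speaking_endpoint]


# Three endpoints with three cameras each, as in the example above.
endpoints = [Endpoint("A", 3), Endpoint("B", 3), Endpoint("C", 3)]
streams = incoming_streams(endpoints)

# Assume the MCU is told only that the current speaker is at endpoint "B".
candidates = speaker_candidates(streams, "B")

print(len(streams))     # 9 streams arrive at the MCU
print(len(candidates))  # 3 streams remain as possible speaker streams
```

Knowing the speaking endpoint narrows nine streams to three, but without further information the MCU cannot select a single stream for prominent display.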
Traditionally, for multipoint and multi-stream videoconferencing systems, arrangement of video streams to be sent to each endpoint is carried out manually. For example, video network operation centers, also known as VNOCs, offer manual management of telepresence videoconferencing that includes appropriate layout of incoming video streams into combined outgoing video streams. The person managing the videoconference at the VNOC monitors the video streams for current speakers, and then manually arranges the layout so that the video stream having the current speaker is prominently displayed on the display screens at each endpoint. Prominently displaying the current speaker's image may involve manipulating the scale and size of the displayed video stream. Again, the person managing the videoconference would manually carry out the scaling and sizing procedure. However, manual management by a VNOC can be plagued by human errors and delays. Additionally, employing a human operator and providing the specialized training required to operate the equipment can be very costly.
In summary, traditional approaches are plagued either by static layout arrangements of video streams or, if dynamic layout arrangement is desired, by the need for error-prone manual control.