Videoconferencing entails the exchange of audio, video, and other information between at least two participants. Generally, a videoconferencing endpoint at each participant location includes a camera for capturing images of the local participant and a display device for displaying images of remote participants. The videoconferencing endpoint can also include additional display devices for displaying digital content. When more than two endpoints participate in a videoconferencing session, a multipoint control unit (MCU) can be used as the conference controlling entity. The MCU and endpoints typically communicate over a communication network, with the MCU receiving video, audio, and data channels from the endpoints and transmitting them to the endpoints.
Telepresence technologies provide an enhanced videoconferencing experience so that near-end participants feel as if they are present in the same room as the far-end participants. Telepresence videoconferencing can be provided for various conferencing systems, ranging from two-person point-to-point videoconferencing systems to multi-participant multipoint videoconferencing systems. Typically, telepresence uses multiple cameras to capture images of near-end participants and multiple displays to display images of far-end participants. Multiple video streams are transmitted from multiple endpoints to the MCU, which combines them into one or more combined video streams that are sent back to the endpoints for display on multiple display devices. For example, in a telepresence system involving three endpoints, each having three cameras, the MCU receives nine video streams. The MCU must combine these nine streams into one or more combined video streams, which are sent back for display on the display devices at each endpoint, and the nine streams must be laid out for each endpoint according to the number and type of displays at that endpoint. Furthermore, although the MCU may receive information from an endpoint indicating that the current speaker is located at that endpoint, when more than one video stream is received from each endpoint, the MCU may not be able to determine which of those streams includes the current speaker. Thus, dynamically selecting one of the many video streams received from an endpoint for prominent display may be difficult.
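The stream arithmetic and the speaker-identification gap described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that each endpoint sends exactly one stream per camera and reports only a site-level "speaker is here" flag; the names `Endpoint`, `streams_at_mcu`, and `candidate_speaker_streams` are hypothetical and do not come from any MCU product or standard.

```python
from dataclasses import dataclass


# Hypothetical model for illustration only: one video stream per camera,
# plus a site-level flag saying the current speaker is at this endpoint.
@dataclass
class Endpoint:
    name: str
    cameras: int
    has_speaker: bool = False


def streams_at_mcu(endpoints):
    """Total number of video streams the MCU receives (one per camera)."""
    return sum(ep.cameras for ep in endpoints)


def candidate_speaker_streams(endpoints):
    """Streams that *might* show the speaker, given only the site-level flag.

    The flag identifies the endpoint, not the camera, so every stream from
    the flagged endpoint remains a candidate.
    """
    return [
        (ep.name, cam)
        for ep in endpoints
        if ep.has_speaker
        for cam in range(ep.cameras)
    ]


sites = [
    Endpoint("A", cameras=3, has_speaker=True),
    Endpoint("B", cameras=3),
    Endpoint("C", cameras=3),
]
print(streams_at_mcu(sites))                  # 9
print(len(candidate_speaker_streams(sites)))  # 3
```

With three three-camera endpoints the MCU receives nine streams, yet the site-level flag still leaves three equally plausible candidates for the speaker's stream.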
Commonly-owned U.S. Pat. No. 8,537,195, which is hereby incorporated by reference in its entirety, describes various techniques for assigning telepresence streams to a display layout. However, even some embodiments of such systems may not use all of the available screens to show the active speaker and other participants in a mixed interactive telepresence ("ITP") call environment. Additionally, with current layout management tools, administrators of multi-screen environments face a substantial upfront task in coordinating layouts for end-user scenarios, and the resulting layouts often fail to meet end users' speaker-switching needs. For example, many current active-speaker switching embodiments prioritize the sites in a call based on their number of camera streams, without necessarily taking into account the active speaker or other key meeting analytics when optimizing the user experience with automated layouts. This can lead to scenarios in which the active speaker is not shown at all on the screens at a particular location. Another undesirable scenario in multi-screen environments arises when the on-screen position of the active speaker moves around so much that users become disoriented and unsure of where to focus.
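The failure mode of ranking sites by camera-stream count can be shown with a small sketch. The policy below is a simplified, hypothetical stand-in for the conventional behavior described above (rank sites by stream count, then fill screens in order); `Site` and `layout_by_camera_count` are illustrative names, not an existing API.

```python
from dataclasses import dataclass


@dataclass
class Site:
    name: str
    streams: int
    active_speaker: bool = False  # deliberately ignored by the policy below


def layout_by_camera_count(sites, screens):
    """Assign streams to screens, taking sites with the most streams first.

    Hypothetical policy: sites are ranked purely by stream count and their
    streams fill the screens in order until no screens remain. Whether a
    site holds the active speaker plays no role in the ranking.
    """
    layout = []
    for site in sorted(sites, key=lambda s: s.streams, reverse=True):
        for i in range(site.streams):
            if len(layout) == screens:
                return layout
            layout.append(f"{site.name}-{i}")
    return layout


sites = [
    Site("HQ", streams=3),
    Site("Branch", streams=2),
    Site("Remote", streams=1, active_speaker=True),
]
shown = layout_by_camera_count(sites, screens=3)
print(shown)                                       # ['HQ-0', 'HQ-1', 'HQ-2']
print(any(s.startswith("Remote") for s in shown))  # False
```

In a three-screen room, the single-camera site holding the active speaker never reaches a screen, which matches the scenario in which the active speaker is not shown at all.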
In some currently available embodiments, conference rooms with multiple monitors may display the main speaker on a single monitor, usually the center one, with other participants shown in a filmstrip view along the bottom. Various embodiments of a filmstrip arrangement, including dynamic assignment of users to the various view positions, are described in Provisional U.S. Patent Application No. 62/002,561, filed May 23, 2014 and entitled "Method And System For New Layout Experience In Video Communication," which is hereby incorporated by reference in its entirety.
In some variations of such an arrangement, if the speaker is at a single-camera site and the conference room viewing the speaker has three monitors, the speaker might be shown full screen on the center monitor, while the other participants are shown as filmstrips along the bottom of the left and right monitors, leaving those screens mostly black.
Other conventional videoconferencing arrangements reposition video streams based on the location of the current speaker. These arrangements can be unnecessarily jarring to viewers, especially when endpoints use different numbers of cameras and therefore output different numbers of video streams.
Therefore, to overcome these problems arising in the realm of videoconferencing, there is a need for rule-based systems for controlling video layouts in multi-site, multi-camera videoconferencing.