Video conferencing permits people to meet without traveling. Unfortunately, the video conferencing experience can be unsatisfactory when compared to a real meeting room with the same people. In a multiparty video call there are multiple video images (usually at least one from each endpoint). If there are more images than video monitors, the images will be rendered at a small scale or viewed one at a time by some multiplexing technique. It is often difficult to view individual images at full scale and to switch quickly between multiple images.
It is also often difficult to tell what the people at the remote videoconference site are looking at. If multiple sites are connected in a single call (a multiparty video call) it is not clear who is talking to whom.
Multiparty videoconferences are typically hosted by a Multiparty Control Unit (MCU), also known as a Multipoint Conference Unit. The MCU consolidates the video camera and collaboration material video feeds from all the endpoints in the conference and presents appropriate video images to each of the endpoints.
MCUs typically permit limited independent control of what images are promoted to full-scale size. Common techniques for full-scale selection are to permit ‘paging’ between individual images or to enable automatic promotion of single images or image-combinations using audio or manual cues.
MCUs do not have a sense of the physical location or proximity of materials in the conference. No physical layout information is preserved or inferred in presentation. FIG. 3a illustrates a typical MCU screen layout. In the simplest case the screen 31 is simply divided into equal-sized panes 32 with no weight given to any remote endpoint video.
Alternatively, as shown in FIG. 3b, Video A of the current talker is rendered in the largest “pane” 33, and other participant Videos B, C and D are rendered in the smaller panes 34. A moment later a participant at location B picks up the conversation and the MCU switches the panes so that Video B is rendered in large pane 33 and Video A is rendered in a smaller pane 34. This is shown in FIG. 3c. Typical MCUs offer a number of layout designs with different numbers of panes, but all are characterized by the lack of an overall consistent positioning of video panes relative to one another. Video pane labels to some degree help participants follow the conversation, especially if video source switching is also employed at some distant endpoints. One pane, often the smallest, displays the local video as other endpoints will see it.
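The pane switching of FIGS. 3b and 3c can be illustrated with a minimal sketch. The function name `assign_panes` and the "large"/"small" keys are hypothetical and chosen only for illustration; they are not taken from any MCU product. The sketch shows how promoting the current talker changes the position of every video relative to the others.

```python
def assign_panes(videos, current_talker):
    """Place the current talker in the large pane and all others in small panes.

    Hypothetical illustration only: real MCUs offer many layouts and cues.
    """
    if current_talker not in videos:
        raise ValueError(f"unknown video source: {current_talker}")
    # The remaining videos keep their listed order in the small panes.
    others = [v for v in videos if v != current_talker]
    return {"large": current_talker, "small": others}


videos = ["A", "B", "C", "D"]
layout_3b = assign_panes(videos, "A")  # A is talking (FIG. 3b)
layout_3c = assign_panes(videos, "B")  # B picks up the conversation (FIG. 3c)
print(layout_3b)
print(layout_3c)
```

In this sketch Video A moves from the large pane to the first small pane when B starts talking, so no video holds a stable position relative to the others.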
The Radvision SCOPIA Elite 5000 MCU is typical of current technology. By default all endpoints receive the same video stream. The layout of this stream is set up by a moderator. Users at any endpoint may set up a “personal conference layout”, but doing so is a multistep procedure involving a number of dialog boxes.
The term “Telepresence” refers to a videoconference system having certain characteristics that address the general problem of making a videoconference more like a face-to-face meeting. High-definition video renders participants life-size on large screens, typically arranged in a row and often borderless, and wideband stereo or multichannel sound is used to create a lifelike impression of distant parties, with the objective of giving the illusion that they are actually sitting across the table. Various methods are used to address the problem that arises when there are more video streams to show than there are monitors to show them.
The Cisco TelePresence Multipoint Switch (CTMS) is a typical MCU for use in a telepresence environment. To satisfy the conflicting requirements of displaying all meeting participants life-size on a limited number of monitors, voice-activated switching is used. Switching may be at the site level, i.e. the site(s) with the current and most recent talkers are displayed. In the case of sites with more than one or two participants, multiple cameras may be “segment” switched, making the current or most recent talker visible at distant endpoints. As is typical in such systems, the local user interface used to control layout, e.g. to place a “presentation-in-picture” or, alternatively, to show the presentation on a separate monitor, is operated via dialog boxes on a control device, e.g. a laptop PC. The essence of the problem with multiparty conferences is that video cameras render a three-dimensional world on a two-dimensional video monitor. There has been some research in rendering spatially appropriate video images; see references 1 and 2 below.
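The site-level voice-activated switching described above can be sketched as a most-recent-talker list truncated to the number of monitors. This is a simplified illustration under stated assumptions, not the CTMS algorithm: the function name `displayed_sites`, the event list, and the monitor count are all hypothetical.

```python
def displayed_sites(talk_events, num_monitors):
    """Return the sites to display, most recent talker first.

    talk_events: site names in the order in which they spoke.
    num_monitors: how many sites can be shown life-size at once.
    """
    recent = []
    for site in talk_events:
        # Move a site that speaks again to the front of the list.
        if site in recent:
            recent.remove(site)
        recent.insert(0, site)
    return recent[:num_monitors]


# Four sites share three monitors; the site that spoke longest ago is dropped.
shown = displayed_sites(["A", "B", "C", "A", "D"], num_monitors=3)
print(shown)
```

Which monitor each site lands on depends only on who spoke most recently, which is why the on-screen arrangement carries no spatial meaning.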
In a multipoint conference employing an MCU, endpoints are interconnected in a star, or multi-star, topology. In an alternative configuration, known as a mesh configuration and illustrated in FIG. 2, each endpoint is connected directly to each other endpoint in the conference. In such an arrangement each endpoint has complete freedom to present video and other collaboration material uniquely and independently of the way it is presented at other endpoints.
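The difference between the two topologies can be made concrete by listing the connections each requires. The sketch below is illustrative only; the function names `star_links` and `mesh_links` and the hub label "MCU" are assumptions introduced here.

```python
from itertools import combinations


def star_links(endpoints, hub="MCU"):
    """Star topology: every endpoint has one connection, to the central hub."""
    return [(hub, e) for e in endpoints]


def mesh_links(endpoints):
    """Full mesh: every pair of endpoints is connected directly."""
    return list(combinations(endpoints, 2))


endpoints = ["A", "B", "C", "D"]
print(len(star_links(endpoints)))  # one link per endpoint
print(len(mesh_links(endpoints)))  # n * (n - 1) / 2 links for n endpoints
```

For four endpoints the star needs four connections and the mesh needs six. In the mesh each endpoint receives every other endpoint's feed directly, which is what leaves it free to lay out the material independently.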
In the past, mesh endpoints have been built on standard GUI frameworks. This allows users extreme flexibility in that they can move, shape, minimize, maximize, bring forward, send back, cascade, etc. windows, each representing video or collaboration material from other endpoints in a call. However, all this flexibility is both tedious and distracting in the context of a meeting.
Various approaches have been considered in the prior art. See, for example, “Multiview: improving trust in group video conferencing through spatial faithfulness”, Nguyen, D. T., Canny, J.; ACM Conference on Human Factors in Computer Systems, 2007, pp. 1465-1474; Berkeley Institute of Design; and “eyeView: focus+context views for large group video conferences”, Jenkin, T., McGeachie, J., Fono, D., Vertegaal; Proceedings of the ACM Conference on Human Factors in Computer Systems (CHI), 2006 (extended abstracts), pp. 1497-1500; Human Media Lab of Queen's University, Canada.
The Heinrich Hertz Institute of the Fraunhofer Institute for Telecommunications is also working on 3D video conferencing technologies. A few companies, such as Tixeosoft, are using emulations of 3D environments.
DARPA Technical Report “DDI/IT 83-4-314.73”, Linda B. Allardyce and L. Scott Randall, April 1983.
The DARPA report describes how “realism in conferee relationship is accomplished in two ways. First, at each station, the four conferees (one real and three surrogates) must maintain the same arrangement; that is A is always on B's left, B is always on C's left, and D is always on A's left. . . . The second key. . . . Instead of a single camera transmitting the same image of the real conferee to all other locations, there is an individual camera for each surrogate transmitting the image of the present conferee to the remote station from the surrogate's perspective . . . ”
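The first requirement quoted above, a seating relationship that is the same at every station, can be sketched as a fixed ring from which each station derives the order of its surrogates. The names `RING`, `left_neighbor` and `surrogate_order` are hypothetical, and the left-to-right reading of the result assumes a round-table geometry in which every conferee faces the center; neither is taken from the report.

```python
# Each conferee sits on the left of the next one in the list,
# and the ring wraps around so that D is on A's left.
RING = ["A", "B", "C", "D"]


def left_neighbor(conferee):
    """Return who sits on this conferee's left; the same answer at every station."""
    i = RING.index(conferee)
    return RING[i - 1]  # index -1 wraps around to the last entry for A


def surrogate_order(local):
    """List the three surrogates at a station, starting with the one on the
    local conferee's left and continuing around the table."""
    order = []
    current = left_neighbor(local)
    while current != local:
        order.append(current)
        current = left_neighbor(current)
    return order


for station in RING:
    print(station, surrogate_order(station))
```

Because every station derives its arrangement from the same ring, A is on B's left whether the scene is viewed at station B, C or D, in contrast to the MCU layouts described earlier.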