1. Field of the Invention
The present invention relates in general to the field of video conferencing and telepresence systems. More specifically, the invention relates to a method, a device, and a computer-readable medium for processing images during a conference between a plurality of video conferencing terminals.
2. Description of the Related Art
Conventional videoconferencing systems include a number of endpoints that communicate real-time video, audio, and/or data (often referred to as “duo video”) streams over and between various networks such as WAN, LAN, and circuit switched networks.
A number of videoconference systems residing at different sites may participate in the same conference, most often through one or more Multipoint Control Units (MCUs) performing, among other things, switching, rate conversion, and transcoding functions to allow the audiovisual terminals to intercommunicate properly. The MCU also allows for aggregate presentation on one display of several end users located at different endpoints.
Compression of multimedia data to be transmitted, as well as decompression of multimedia data that has been received, takes place in a processor unit conventionally referred to as a “codec” (coder/decoder).
Video conferencing systems presently provide communication between at least two locations for allowing a video conference among participants situated at endpoints at each location. Conventionally, the video conferencing arrangements are provided with one or more cameras. The outputs of those cameras are transmitted along with audio signals to a corresponding plurality of displays at a second location such that the participants at the first location are perceived to be present, or face-to-face, with participants at the second location.
Telepresence systems are enhanced video conference systems. Typically, terminals in telepresence systems have a plurality of large scale displays for life-sized video, often installed in rooms with interiors dedicated to video conferencing, all to create an environment as close to personal face-to-face meetings as possible. Video cameras are often arranged on top of the display screens in order to capture images of the local participants, and these images are transmitted to corresponding remote video conference sites. The images captured by the plurality of high-definition cameras are usually arranged and displayed so that they generate a non-overlapping and/or contiguous field of view. This is in contrast to traditional so-called “continuous presence,” where the video streams are mixed (e.g., as a mosaic) in an MCU from source images at endpoints and displayed together on one display in a split screen (N*M array).
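By way of illustration only, the N*M split-screen arrangement of continuous presence can be sketched as a computation of tile rectangles on a single display. The function name `mosaic_tiles`, the pixel dimensions, and the row-major ordering are assumptions made for this sketch and do not describe any particular MCU.

```python
def mosaic_tiles(width, height, rows, cols):
    """Return (x, y, w, h) rectangles for a rows x cols split screen.

    Hypothetical helper: tiles are listed row by row, left to right,
    and any remainder pixels from integer division are left unused.
    """
    tile_w, tile_h = width // cols, height // rows
    return [
        (c * tile_w, r * tile_h, tile_w, tile_h)
        for r in range(rows)
        for c in range(cols)
    ]


# Example: four source images mixed into a 2 x 2 array on one 1920x1080 display.
tiles = mosaic_tiles(1920, 1080, rows=2, cols=2)
```

In such an arrangement each remote image occupies only a fraction of one display, in contrast to the telepresence arrangement, where each image is shown on its own display.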
Key factors in achieving a feeling of presence are the ability to see at whom the remote participants are looking, that all the participants are displayed in real life size, and that all displayed participants appear equally sized relative to each other. Another provision for achieving high quality telepresence is that the images of the remote participants are presented to each local participant as undistorted as possible.
In order to obtain this feeling of presence, a set of rules, or a proprietary protocol, is used by the telepresence systems, such as that described in U.S. patent application Ser. No. 12/050,004. That set of rules (or protocol) defines, e.g., camera positions (pan, tilt, zoom), codec connection scheme (which local codec should call which remote codec), etc. In known telepresence systems, the user dials (or selects from a phonebook) the remote telepresence sites (and/or other video endpoints) he/she wishes to join in the conference. When the call is launched, the system decides how and where the different remote sites are displayed on the local displays. This may, for example, depend on call sequence (e.g., in a four-site multi-site call the first called site is displayed on the left screen, the second called site on the center screen, and the third called site on the right screen), or it may appear to be totally random.
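The call-sequence example given above may be sketched as follows. This is a minimal illustration assuming three main display screens. The names `SCREENS` and `assign_screens` and the site labels are hypothetical and are not taken from any actual telepresence protocol.

```python
# Main display screens in the order used by the call-sequence example.
SCREENS = ("left", "center", "right")


def assign_screens(called_sites):
    """Map remote sites to screens in the order in which they were called.

    Hypothetical helper: the first called site goes to the left screen,
    the second to the center screen, and the third to the right screen.
    """
    if len(called_sites) > len(SCREENS):
        raise ValueError("more remote sites than main display screens")
    return dict(zip(SCREENS, called_sites))


# Four-site call: the local site plus three remote sites, in call order.
layout = assign_screens(["site_b", "site_c", "site_d"])
```

In this sketch the placement depends only on the order of dialing, so the user has no direct control over which remote site appears on which screen.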
FIG. 1 is a schematic view illustrating some aspects of conventional telepresence video conferencing.
A display device of a video conferencing device, in particular a video conferencing terminal of the telepresence type, is arranged in front of a plurality (in this case, four) of local conference participants. The local participants are located along a table facing the display device, which includes a plurality of display screens. In the illustrated example, four display screens are included in the display device. First, second, and third display screens are arranged adjacent to each other, as shown in FIG. 1. The first, second, and third display screens are used for displaying images captured at one or more remote conference sites of a corresponding telepresence type.
A fourth display screen is arranged at a central position below the second display screen, as shown in FIG. 1. Generally, the fourth screen may be used for computer-generated presentations or other secondary conference information. Video cameras are arranged on top of the upper display screens in order to capture images of the local participants. The images are then transmitted to corresponding remote video conferencing sites.
A purpose of the setup shown in FIG. 1 is to give the local participants a feeling of actually being present in the same meeting room as the remote participants that are shown on the respective display screens. As mentioned above, in order to obtain the feeling of presence, a set of rules, or a proprietary protocol, is used by the telepresence systems. Therefore, a conventional telepresence system, such as the one shown in FIG. 1, will operate properly only with other telepresence systems supporting that set of rules or protocol. Further, since a standard protocol for telepresence systems has not been defined, only telepresence systems from the same manufacturer may interoperate in a satisfactory way. The present inventors have recognized this as a problem with the conventional system.
In many situations there is a need for a telepresence system to call, or receive a call from, a regular video conferencing terminal, even though the regular video conferencing terminal does not provide the same video and audio quality, real life size display, or eye contact capability. One way of solving this problem is to use a conventional video conferencing codec to handle calls with regular (non-telepresence) video conferencing terminals, and to include these calls in the conference by use of the fourth display. As shown in FIG. 1, this fourth screen may be positioned below the second screen. Alternatively, it may be positioned above or next to the collection of displays constituting the telepresence system.
One problem with displaying participants on such a fourth screen is that the feeling of presence is generally lost, or at least reduced. For instance, when the participants at one telepresence site are looking at the fourth screen, it will appear to the participants at the other telepresence sites as if those participants are looking at the table surface, the ceiling, or the wall. Thus, there is a need for improving the feeling of presence and eye contact in telepresence systems when regular (non-telepresence) video conferencing terminals are included in the conference.
As recognized by the present inventors, another disadvantage of the additional (e.g., fourth) screen is that the conferencing system occupies a significantly larger space in the conference room. Moreover, as the display size of the three main displays increases, there may not be much physical space for an additional screen.