1. Field of the Invention
The present invention relates to video conferencing systems, methods, and computer program products having telepresence features.
2. Description of the Related Art
Conventional videoconferencing systems include a number of end-points that communicate real-time video, audio and/or data (often referred to as “duo video”) streams over and between various networks such as WANs, LANs and circuit-switched networks.
A number of videoconference systems residing at different sites may participate in the same conference, most often through one or more MCUs (Multipoint Control Units) performing, among other things, switching and mixing functions to allow the audiovisual terminals to intercommunicate properly. The MCU also allows for aggregate presentation on one display of several end users located at different end-points.
However, representing moving pictures requires a large amount of information, as digital video typically is described by representing each pixel in a picture with 8 bits (1 Byte). Such uncompressed video data results in large bit volumes, and cannot practically be transferred over conventional communication networks and transmission lines in real time due to limited bandwidth.
Real-time video transmission therefore often requires a high degree of data compression, which may compromise picture quality. The compression of multimedia data to be transmitted, as well as the decompression of the multimedia data to be received, takes place in a processor unit conventionally referred to as a “codec” (coder/decoder).
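By way of illustration only, the bit volume described above can be estimated with a short calculation. The following is a minimal sketch, not part of any described system: the 8 bits per pixel follows the text above, while the CIF picture size (352 x 288), the frame rate of 30 pictures per second, and the 384 kbit/s link rate are assumptions chosen for the example, and the function name is hypothetical.

```python
def uncompressed_bitrate_bps(width, height, fps, bits_per_pixel=8):
    """Return the raw bit rate of uncompressed video in bits per second.

    bits_per_pixel defaults to 8, the per-pixel figure used in the text.
    """
    return width * height * bits_per_pixel * fps


# Assumed example values (not taken from the source text).
CIF_WIDTH, CIF_HEIGHT = 352, 288   # CIF picture size
FRAME_RATE = 30                    # pictures per second
LINK_RATE_BPS = 384_000            # hypothetical available link bandwidth

raw_bps = uncompressed_bitrate_bps(CIF_WIDTH, CIF_HEIGHT, FRAME_RATE)

# Ratio by which the raw stream exceeds the assumed link rate, i.e. the
# minimum compression ratio needed to fit the video alone on that link.
required_ratio = raw_bps / LINK_RATE_BPS

print(f"raw bit rate: {raw_bps / 1e6:.1f} Mbit/s")
print(f"compression needed for {LINK_RATE_BPS // 1000} kbit/s link: "
      f"about {required_ratio:.0f}:1")
```

Under these assumed values the raw stream is roughly 24 Mbit/s, so a compression ratio on the order of 60:1 would be needed before audio and data streams are even considered.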
As videoconferencing involves various resources and equipment simultaneously interoperating at different places with varying capabilities, there is also a need for the possibility to manage the resources involved both for scheduled and ad hoc videoconferences through a video conference manager tool.
Video conferencing systems presently provide communication between at least two locations for allowing a video conference among participants situated at each station. Conventionally, the video conferencing arrangements are provided with one or more cameras. The outputs of those cameras are transmitted along with audio signals to a corresponding plurality of displays at a second location such that the participants at the first location are perceived to be present, or face-to-face, with participants at the second location.
Further, the images captured by the plurality of cameras must be arranged and displayed so that they generate a non-overlapping and/or contiguous field of view, so-called “continuous presence”. Continuous presence is a mixed picture created from far-end sites in an MCU. For example, in case of a videoconference of five participants, each site will receive and display a picture divided into four quadrants with the picture components captured from each of the other sites inserted in the respective quadrants. Thus, in continuous presence, the area of the video screen gets further sub-divided as more participants are added to the conference. As such, the amount of screen area devoted to a particular participant becomes incrementally smaller as the number of participants increases.
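The shrinking screen area described above may be illustrated numerically. The sketch below assumes, purely for the example, that the mixed picture is divided into a square grid of equally sized cells just large enough to hold every remote site; actual MCU layouts vary, and the function name is hypothetical.

```python
import math


def continuous_presence_grid(participants):
    """Return (grid_side, screen_fraction_per_cell) for a square layout.

    Each site displays every other site, so participants - 1 pictures
    must fit. The square, equal-cell grid is an illustrative assumption.
    """
    remote_sites = participants - 1
    if remote_sites < 1:
        raise ValueError("a conference needs at least two participants")
    # Smallest integer side such that side * side >= remote_sites.
    side = math.isqrt(remote_sites - 1) + 1
    return side, 1.0 / (side * side)


# Five participants: four remote sites, one per quadrant of a 2 x 2 grid.
for n in (2, 5, 6, 10, 17):
    side, fraction = continuous_presence_grid(n)
    print(f"{n:2d} participants -> {side} x {side} grid, "
          f"{fraction:.1%} of the screen per remote site")
```

With five participants each remote site receives one quarter of the screen, matching the four-quadrant example above; adding a sixth participant forces a 3 x 3 grid under this assumed layout, reducing each site to one ninth of the screen.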
Continuous presence, or several displays with only one camera, prevents the feeling of eye-contact among participants in video conferencing systems. Typically, a camera is placed somewhere above the display on which a participant views the participant from the remote station. As recognized by the present inventors, the camera captures the participant from an angle above and to the side of the participant's viewing level or head. Thus, when an image of that participant is displayed at the remote station, it appears as if the participant is looking down or to the left or right. Previous solutions to this problem have required complex optical systems and methods using, for example, a plurality of lenses and mirrors. The solutions have usually been designed for use when the camera is capturing an image of a single participant, and they fall short when simultaneously capturing images of multiple participants.
In addition to the lack of sufficient eye-contact, other limitations in conventional videoconferencing also reduce the feeling of being in the same room. Continuous presence and small displays also limit the size of the displayed participants. Low capturing and display resolution and highly compressed data also contribute to a reduction of the experience of presence. Some solutions have tried to improve this by introducing so-called “telepresence systems” requiring dedicated high-bandwidth communication lines. However, these solutions are not well suited to be connected to a conventional LAN or WLAN, and are not interoperable with conventional videoconferencing systems.
The eye-contact issue, and the feeling of participants from different sites being present in the same room, are not fully resolved, as conventional systems capture a single picture and send that same picture to all the sites, making the movements of the participants look unnatural when they turn to a certain display to talk to the participants displayed therein. Furthermore, with these telepresence systems, there is no conventional mechanism for interconnecting different telepresence sites that are located on different networks. Moreover, firewall traversal limits the ability to seamlessly establish connections between different telepresence sites. Thus, conventional telepresence systems have been restricted to dedicated, high-bandwidth communication lines. Furthermore, conventional “telepresence” systems end up being stand-alone systems that are not well integrated with other computer resources and video conference resources within a particular company's facilities. Users of these telepresence systems are handicapped by having a relatively limited amount of flexibility in adding other non-telepresence endpoints, and in establishing calls between telepresence endpoints and other non-telepresence endpoints.