The present invention relates to a videoconferencing or teleconferencing system between participants located at remote and generally distant sites.
It applies to multipoint systems, i.e. to systems designed to produce an audiovisual link between participants at several sites. It also relates to point-to-point systems, i.e. systems that connect only two sites via an audiovisual link. It also applies to visiophones connected via a conference gateway.
The equipment of a videoconferencing system for a given site is located in a facility referred to hereafter as room or studio. In most cases, the participants are seated around a conference table facing a viewing screen.
Today, commercially available videoconferencing services and systems offer multipoint links of limited audiovisual quality, due either to the quality of the terminal equipment (sound and image retrieval) or, intrinsically, to an insufficient bitrate caused by the limited bandwidth of the particular network used.
Indeed, the conferencing systems are connected to digital networks, mainly the ISDN network, in several configurations, whether in point-to-point mode or in multipoint mode.
Consequently, the bitrate offered for such a service on the NUMERIS network ranges from 128 kbit/s for a bottom-of-the-range conferencing service to 384 kbit/s for a top-of-the-range conferencing service.
The terminals used generally comply with ITU standards such as the H320 family of standards.
In certain systems, only one distant room may be seen at a time. This inconveniences users, who are unable to see everyone at the same time. Manual or automatic switching selects the room which is projected on the screen. In general, this is the room that transmits the strongest audio signal (voice switching). This is the case with conferencing gateways, which switch the image according to voice detection, i.e. to the room with the most active sound.
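The voice switching described above can be illustrated by a minimal sketch. It assumes that the level of each room is measured as the root-mean-square value of a block of audio samples; the function names, the room names and the sample values are hypothetical and are not taken from any existing gateway.

```python
import math


def rms_level(samples):
    """Root-mean-square level of one block of audio samples (assumed measure)."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def select_active_room(audio_blocks):
    """Return the room whose current audio block has the highest RMS level.

    audio_blocks maps a room name to a list of samples (hypothetical layout).
    """
    return max(audio_blocks, key=lambda room: rms_level(audio_blocks[room]))


# Hypothetical audio blocks received from three distant rooms.
blocks = {
    "room_A": [0.01, -0.02, 0.01, 0.00],
    "room_B": [0.40, -0.35, 0.42, -0.38],
    "room_C": [0.05, -0.04, 0.06, -0.05],
}
active_room = select_active_room(blocks)  # the room shown on the screen
```

With such a scheme only the selected room is displayed, which is precisely the limitation noted above: the other rooms remain invisible until they become the loudest.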
Other systems compliant with ITU standards such as H320 or H323, or with other standards, use a central unit to receive images from all Multipoint Control Units (MCU), as laid down in standards H231 and H243. This unit retrieves the images by dividing the bandwidth available for transmission to an MCU by the number of images transmitted.
This is achieved by coding which substantially compresses the line bitrate, by a compression factor of between 40 and 50. This results in a definition loss equivalent to about three quarters of the image transmitted by each room (multipoint up to 5 rooms).
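One possible reading of these figures, offered here as an assumption rather than as a statement of the standards, is that with 5 rooms each site receives 4 distant images which share the available bitrate equally, so that each image keeps one quarter of the definition. The short sketch below works through that arithmetic; the function names are hypothetical.

```python
def per_image_share(total_kbit_s, rooms):
    """Bitrate left for each distant image when the link is shared equally.

    Assumes each site receives one image from every other room.
    """
    remote_images = rooms - 1
    if remote_images < 1:
        raise ValueError("at least two rooms are needed")
    return total_kbit_s / remote_images


def definition_loss(rooms):
    """Fraction of definition lost per image under the same equal-sharing assumption."""
    return 1 - 1 / (rooms - 1)


share = per_image_share(384, 5)  # top-of-the-range 384 kbit/s link, 5 rooms
loss = definition_loss(5)        # fraction of each image that is lost
```

Under this assumption, a 384 kbit/s link shared between 4 distant images leaves 96 kbit/s per image, and the loss per image is three quarters.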
The digital compression of the image may be of the H 320 type with H 261 coding, of the H 323 type with H 263 coding, of the Motion JPEG type (ISO standard), of the MPEG1 type or, finally, of the MPEG2 type.
In any one of the above profiles, the image sent may be in the form of a computer file.
In all cases, the images received are impaired and in no way reproduce the effect of multiconferencing.
For further information, refer to the closest state-of-the-art technology as described in document D1 WO 94 16517.
For audio transmission, coding systems are used, or else a bitrate compression of the G722 or G711 type, which does not respect the original quality of the voice signal, whether in terms of the bandwidth or in terms of the quality of the coding itself.
The videoconferencing system according to the present invention aims to offer videoconferencing between remote sites with a maximum number of participants by exceeding the limits of today's commercially available systems. It provides a view of participants at distant sites on screens (at a scale close to 1), in which textures and patterns of behaviour are clearly perceived and in which visual and sound images match thanks to the spatially distributed sound of distant participants dependent upon the imaging conditions.
The proposed system continuously shows the participants at distant sites, even in a multipoint configuration. Moreover, the system minimises eye contact defects by using n video cameras close to the images to be filmed (for example, housed in screen windows), as illustrated subsequently. It also enables high-fidelity sound reproduction.
More specifically, the present invention proposes a videoconferencing system according to claim 1.
The image of the distant participants is at a scale close to 1 on the studio's screen; this scale depends on the distance between the screen and the table at which the participants are seated. For example, this results in a screen with standard dimensions of approximately 5 × 2 m to view 4 remote rooms with about 4 people in each room.
The videoconferencing system comprises one or more cameras (CA1), (CA2) and sound recording systems (microphones or acoustical antennas). The sound recording data is matched with one or the other of the cameras depending on whether the video signals transmitted originate from camera (CA1) or camera (CA2).
More specifically, this matching provides a “subjective overlay”, the sound source being close to (along the axis of) the associated image.
According to another feature of the invention, the sound recording and retrieval equipment comprises:
a signal capturing and digitisation unit;
a signal retrieval and digital-to-analog conversion unit;
n microphones distributed in front of the participants at the said site;
p loudspeakers distributed along the length of the screen, where p is proportional to the size of the screen;
matching units which associate one or more microphones, and the signals issued from the said microphone(s), with the loudspeaker(s) of the remote sites intended to retrieve the said signals;
network adaptation devices featuring bitrate reduction;
echo control devices.
The sound is spatially distributed so as to match the sound and visual images. This layout not only strongly enhances the effect of teleconferencing but also enables several conversations to be conducted in parallel between the two remote rooms; monitoring of conversations is simplified by the system's ability to focus on the person one wants to listen to, just as in a normal meeting.
The devices that establish correspondence between the microphone, the signal originating from the said microphone and the loudspeakers of the remote sites intended to retrieve the said signal operate by programming the desired configuration. Such programming may involve memorising one or more pre-determined configuration(s).
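By way of illustration only, the programming and memorising of such a correspondence can be sketched as a routing table. The class name, the method names and the site and loudspeaker labels below are all hypothetical; the sketch merely assumes that a configuration maps a (site, microphone) pair to the loudspeakers of the remote sites that retrieve its signal.

```python
from dataclasses import dataclass, field


@dataclass
class AudioRouting:
    """Hypothetical store of microphone-to-loudspeaker configurations."""

    presets: dict = field(default_factory=dict)  # memorised configurations
    active: dict = field(default_factory=dict)   # configuration currently in use

    def memorise(self, name, mapping):
        """Store a pre-determined configuration under a name."""
        self.presets[name] = dict(mapping)

    def load(self, name):
        """Make a memorised configuration the active one."""
        self.active = dict(self.presets[name])

    def loudspeakers_for(self, site, microphone):
        """Remote (site, loudspeaker) pairs that retrieve this microphone's signal."""
        return self.active.get((site, microphone), [])


routing = AudioRouting()
# Assumed layout: microphone 1 of site A is retrieved on loudspeaker 2 of site B,
# microphone 2 of site A on loudspeakers 3 and 4 of site B.
routing.memorise("two_sites", {
    ("A", 1): [("B", 2)],
    ("A", 2): [("B", 3), ("B", 4)],
})
routing.load("two_sites")
targets = routing.loudspeakers_for("A", 2)
```

Keeping the memorised configurations separate from the active one allows a different layout to be loaded without reprogramming each correspondence.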
According to another feature, the image recorders comprise q cameras for each site, this number preferably exceeding or being equal to 2. The cameras are positioned in front of the participants of the said site so as to be laid out in distinct areas revealing the various remote participants on the screen or near these areas. In practice, the cameras may be laid out below, on top of or around the screen.
Each room may therefore comprise several cameras which film the participants at different angles. The images which are transmitted to a site are those recorded by the camera located near the image projected for this site. This makes it possible to reduce the eye contact defect, to differentiate viewpoints and to reconstitute the location of each participant according to his/her position at the site and within the overall configuration.
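The selection rule described above can be sketched as follows, assuming that the cameras and the projected images are each characterised by a horizontal position along the screen, in metres. The function names, the positions and the site names are hypothetical.

```python
def nearest_camera(camera_positions, image_centre):
    """Return the camera whose horizontal position is closest to an image centre."""
    return min(
        camera_positions,
        key=lambda cam: abs(camera_positions[cam] - image_centre),
    )


def cameras_for_sites(camera_positions, image_centres):
    """For each remote site, pick the camera nearest to that site's projected image."""
    return {
        site: nearest_camera(camera_positions, centre)
        for site, centre in image_centres.items()
    }


# Assumed layout on a 5 m wide screen: two cameras and two projected images.
cameras = {"CA1": 1.25, "CA2": 3.75}
images = {"site_B": 1.0, "site_C": 4.0}
selection = cameras_for_sites(cameras, images)
```

In this assumed layout the images sent to site_B come from CA1 and those sent to site_C come from CA2, so that each remote site is filmed from a point close to where its own image appears.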
In a point-to-point configuration where several images are juxtaposed across the width of the screen, each image provides a partial view of the facility. This makes it easier to know who is watching whom on the screen and therefore to follow the dynamics of the meeting.
In practice, the cameras are positioned below the images projected on the screen 10.
For a configuration with two concatenated images (one beside the other), the cameras are realigned and are located at a distance from the screen centre corresponding to about one third of the width of the image, i.e. one sixth of the total width of the screen. Such realignment minimises the problem of overlapping at the edges of the two images filmed by the two cameras.
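The arithmetic behind this placement can be checked with a short sketch. It assumes two images of equal width filling the screen, each half the screen width, so that one third of an image width equals one sixth of the screen width. The function name and the 6 m screen width are hypothetical and chosen only to give round numbers.

```python
def camera_positions(screen_width_m):
    """Horizontal positions of the two cameras, measured from the left screen edge.

    Assumes two side-by-side images of equal width filling the screen.
    """
    image_width = screen_width_m / 2   # each image covers half the screen
    offset = image_width / 3           # one third of the image width
    centre = screen_width_m / 2
    return centre - offset, centre + offset


left_cam, right_cam = camera_positions(6.0)
offset_from_centre = 3.0 - left_cam    # equals one sixth of the 6 m screen width
```

For this assumed 6 m screen, the cameras sit 1 m on either side of the screen centre, i.e. at 2 m and 4 m from the left edge.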
The sites are linked together via a high bitrate network in a point-to-point or multipoint configuration, or via a central MCU unit (videoconferencing gateway).