When one or more people take part in a local scene and one or more others in a remote scene, the arrangement that allows these people to meet remotely is referred to as videoconferencing.
Telepresence is an extension of videoconferencing.
As with videoconferencing, telepresence is achieved by transmitting images and sound, as well as data representing text, graphics, diagrams, etc.
Although a scene can take place outdoors, it normally takes place inside a building. For this reason, the place where the local scene takes place will be called the local room, and the place where the remote scene takes place will be called the remote room.
The transmission of images from a local room SL equipped with an image sensing device 1 to a remote room SD equipped with an image recovery (or restoring) device follows the path shown schematically in FIG. 1. This path includes an image sensing device 1 such as a camera, possibly an analogue-to-digital converter CAN, a coding system C, a transmission network R, a decoding system D, possibly a digital-to-analogue converter CNA, and an image recovery device 2 such as a projector P connected to a screen, for example a plasma, LCD, CRT or other screen.
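As an illustration only, the one-way image path of FIG. 1 can be listed as an ordered chain of stages. In this Python sketch the function name and its two flags are hypothetical; it assumes that the converters CAN and CNA are inserted only when the camera or the display handles analogue signals, which is one reading of "possibly" in the text.

```python
from typing import List


def image_path(analogue_capture: bool, analogue_display: bool) -> List[str]:
    """Ordered stages of the one-way image path of FIG. 1 (SL to SD).

    Assumption: CAN is only needed for an analogue camera and CNA only
    for an analogue display; all other stages are always present.
    """
    stages = ["1: image sensing device (camera)"]
    if analogue_capture:
        stages.append("CAN: analogue-to-digital converter")
    stages += ["C: coding system", "R: transmission network", "D: decoding system"]
    if analogue_display:
        stages.append("CNA: digital-to-analogue converter")
    stages.append("2: image recovery device (projector P and screen)")
    return stages


for stage in image_path(analogue_capture=True, analogue_display=False):
    print(stage)
```

For reciprocal communication, the same chain is simply run a second time in the opposite direction, from SD to SL.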
Similarly, there can be a sound pickup system in the local room and a sound recovery system in the remote room. This sound system includes microphones Mi and speakers Hp, as illustrated in FIG. 1.
For the communication to be reciprocal, the sound and image systems represented in FIG. 1 are duplicated in the opposite direction so that images and sound are also captured in the remote room SD and recovered in the local room SL.
Each room is therefore equipped with an image sensing device and a sound pickup device, as well as an image and sound recovery device.
Hereafter, the term audio-visual system for bringing together a local scene and a remote scene will designate a system located in a room, a public area or outdoors that includes at least one module comprising an image sensing device, a sound pickup device, and an image and sound recovery device, connected to a communications network (an internal corporate network, i.e. a local network, or a public network). Such a system is also referred to as an interactive audio-visual system.
Videoconference systems are one type of interactive audio-visual system. They are available in different formats: videoconference rooms, videophones, personal computers (PCs) using multimedia communications, interactive terminals, etc.
Other configurations are also of interest here, for example kiosks or telepresence walls placed in a hall or on the street and connected quasi-permanently to another, remote kiosk or telepresence wall. In this case, there is no longer any need to book the service in advance, as is often required with current videoconference systems.
A person passing in front of a telepresence wall located, for example, in Paris can communicate, whether sotto voce or informally, with a remote person passing in front of another telepresence wall located, for example, in London and connected to the wall in Paris, as if they had met on the street, in a hallway, etc. These remote persons can, for example, walk “side by side.”
To ensure co-presence, the following must be controlled:
Eye contact,
Person's height (scale 1),
Audio and video quality,
Screen distance,
Modularity, so as to have a configurable image and sound wall.
Before the invention is presented, the usage constraints of audio-visual systems are recalled below, in particular the phenomena related to environmental constraints, the lack of eye contact effect, and the concatenation of several devices.
In general, telepresence audio-visual systems are designed to be used at a specific distance from the scene, for both capture and recovery, depending on the image size and the service provided.
However, close-range viewing is a decisive factor in allowing videoconference or telepresence participants to observe the remote scene and use the system comfortably, which is what produces the telepresence effect. In particular, close-range viewing increases the sense of closeness between remote participants by favouring eye contact.
However, for a scene of given width, the closer the scene to be filmed is to the camera, the wider the camera's field angle has to be. This widening of the image sensing angle at close range raises a problem, illustrated in FIGS. 2a and 2b.
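The dependence of the field angle on distance can be made concrete with the ideal pinhole (conical projection) relation α = 2·arctan(W / 2D), where W is the width of the scene to be framed and D its distance from the camera. The sketch below is an illustration under that idealised model; the function name and the 3 m scene width are assumptions, not values from the text.

```python
import math


def field_angle_deg(scene_width_m: float, distance_m: float) -> float:
    """Horizontal field angle (degrees) an ideal pinhole camera needs to
    frame a scene of the given width at the given distance."""
    return math.degrees(2 * math.atan(scene_width_m / (2 * distance_m)))


# Hypothetical 3 m wide scene filmed from decreasing distances.
for d in (4.0, 2.0, 1.0):
    print(f"{d:.1f} m -> {field_angle_deg(3.0, d):.1f} deg")
```

With these assumed values the required angle grows from about 41° at 4 m to about 74° at 2 m and about 113° at 1 m, which is the wide-angle situation of FIG. 2a.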
FIG. 2a schematically represents, viewed from below, an image sensing device 1 located in a local room SL and filming a local scene, represented by local participants PL seated around a table located, for example, approximately 1 meter from the camera, which has a wide field angle α. The direction in which each participant is looking is indicated by a small dash representing the participant's nose. The image of the remote participants PD, in particular that of remote participant d, is formed on a screen E.
When a local participant a in the local room SL who is not on the camera's axis (represented by ray b1) speaks to a remote participant, he or she looks at the image d′ of that remote participant on screen E. Although a faces d′ along ray ad′, the camera receives ray a1 and therefore films participant a in profile.
It is this profile image that is transmitted to the recovery device in the remote room SD, which, as shown in FIG. 2b, presents to d the image a′ of a as if a were not looking at d. Eye contact is not restored. This is called the parallax (eye contact or eye gaze) effect, sometimes referred to as the “lack of eye contact effect.”
Recall that the image plane is the plane in which the image d′ lies. In this example it coincides with the screen, but this is not always the case, for instance when the image is reflected by a mirror.
Ray ad′ is a beam that comes from the local scene to be filmed, which lies in a plane called the target plane, and it is perpendicular to the image plane.
If, as indicated in FIG. 2c, the image were captured in cylindrical projection mode, as used in descriptive geometry (also called Monge geometry) and in industrial drawing, so that the camera captures all the rays parallel to ad′, rather than in conical projection mode with an angle α as represented in FIG. 2a, the lack of eye contact effect would be eliminated.
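The difference between the two projection modes can be sketched in a simplified 2-D top view. This is a minimal sketch, assuming a camera whose optical centre is at the origin with its axis along +z, and treating the normal to the image plane as the direction of ray ad′; the function name and the focal length f are illustrative. It models only the geometry, not how such a capture would be realised optically.

```python
import math
from typing import Tuple


def capture(x: float, z: float, mode: str, f: float = 1.0) -> Tuple[float, float]:
    """Image a scene point at lateral offset x and depth z.

    Returns (image coordinate, angle in degrees between the captured ray
    and the normal to the image plane, i.e. the direction of ray ad').
    """
    if z <= 0:
        raise ValueError("point must be in front of the camera")
    if mode == "conical":
        # Central (perspective) projection: every captured ray passes through
        # the optical centre, so off-axis points are seen obliquely.
        return f * x / z, math.degrees(math.atan2(abs(x), z))
    if mode == "cylindrical":
        # Parallel projection: only rays parallel to the axis are captured,
        # so every point is seen along the normal, whatever its offset.
        return x, 0.0
    raise ValueError(f"unknown mode: {mode}")


for mode in ("conical", "cylindrical"):
    print(mode, capture(0.5, 1.0, mode), capture(0.5, 2.0, mode))
```

In conical mode both the image position and the viewing angle of an off-axis point change with its depth; in cylindrical mode the angle stays at zero, which is why the lack of eye contact effect disappears.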
The visual parallax or lack of eye contact effect, more prosaically called the “hypocrite” effect, was presented in the context of videoconferencing, but it can be generalised to a local scene made up of standing persons, or of objects rather than persons. Consider, for example, a cube with blue and red sides placed at a tilt, so that it presents a red side and a blue side. Ray a1, however, comes only from a blue side. The image a′ of the cube will therefore show only the blue side instead of both the red and blue sides of the tilted cube.
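The cube example amounts to a back-face test: a side is seen only if its outward normal points back towards the viewing direction. The sketch below works in a 2-D top view and assumes a cube turned by 45° and a ray a1 inclined by a hypothetical 60° with respect to ray ad′; the angle convention (0° facing the image plane, positive towards the camera side) is also an assumption.

```python
import math
from typing import Dict, List


def visible_faces(face_normals_deg: Dict[str, float], view_deg: float) -> List[str]:
    """Return the faces seen along a viewing direction (top view, 2-D).

    Angles are measured from the normal to the image plane: 0 deg is the
    direction of ray ad', and beta deg is the direction of ray a1.
    A face is visible when the cosine between its outward normal and the
    direction towards the viewer is positive (edge-on faces are excluded).
    """
    return [name for name, normal in face_normals_deg.items()
            if math.cos(math.radians(normal - view_deg)) > 1e-9]


cube = {"red": -45.0, "blue": 45.0}  # hypothetical cube turned by 45 degrees
print(visible_faces(cube, 0.0))   # seen along ad'
print(visible_faces(cube, 60.0))  # seen along a1, assuming beta = 60 degrees
```

Along ad′ both sides are returned, whereas along the assumed a1 only the blue side is; for this particular tilt both sides stay visible as long as the angle between the two rays remains below 45°.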
Furthermore, the size of the participants or objects in the recovered image varies with the part of the field in which they are located and with the camera's angle α. If several participants or objects are to be filmed, the field has to be widened, but the recovered images then appear somewhat curved because perspective is distorted towards the edges of the image: the recovered images show a variable enlargement effect, illustrated in FIGS. 3a and 3b and well known to photographers who use wide-angle lenses.
Each of these figures shows two local participants PL, a and b, one set back from the other, filmed by an image sensing device whose field angle is α1 in FIG. 3a and α2 in FIG. 3b, with α1 smaller than α2. The recovered images a′1 and a′2 of a are almost identical in both cases, but the enlargement of b′1 relative to b (FIG. 3a) is greater than that of b′2 relative to b (FIG. 3b).
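One way to reproduce the comparison of FIGS. 3a and 3b is to assume an ideal pinhole camera that is moved so that the frame has the same half-width at a's depth whatever its field angle, which keeps a′1 and a′2 the same size. Since pinhole magnification is inversely proportional to depth, the size of b′ relative to a′ is then d_a / (d_a + setback). The function name, the 1 m half-width and the 1 m setback below are assumptions for illustration.

```python
import math


def relative_size_of_b(field_angle_deg: float, half_width_at_a_m: float,
                       setback_m: float) -> float:
    """Size of b's image relative to a's image for two equally sized persons,
    b being set back by setback_m behind a.

    The camera distance to a is chosen so that the frame half-width at a's
    depth equals half_width_at_a_m, keeping a' the same size for any angle.
    """
    d_a = half_width_at_a_m / math.tan(math.radians(field_angle_deg) / 2)
    return d_a / (d_a + setback_m)  # magnification is proportional to 1 / depth


for alpha in (40.0, 90.0):
    print(alpha, round(relative_size_of_b(alpha, 1.0, 1.0), 3))
```

With these assumed values b′ is about 0.73 of the size of a′ at 40° but only 0.5 at 90°: the wider angle forces the camera closer to a, which exaggerates the size difference between the two participants.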
One way of reducing the disturbing effects caused by wide-angle image capture (lack of eye contact and distorted perspective) is to limit the angle β formed at a point a of the scene by rays ad′ and a1, represented in FIG. 2a. A specification of the ETSI (European Telecommunications Standards Institute) states that this angle β should not exceed 5 degrees. This limit is obtained by restricting the filmed scene, either by truncating it or by placing the camera close to the viewing axis, which disturbs viewing.
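The 5-degree limit can be turned into a lateral tolerance. Assuming, as in FIG. 2a, that ray ad′ is perpendicular to the image plane and that the camera's optical centre lies in that plane, β = arctan(e / D), where e is the lateral offset between d′ and the camera and D the distance from a to the image plane. The function names below are illustrative.

```python
import math


def parallax_angle_deg(lateral_offset_m: float, distance_m: float) -> float:
    """Angle beta at scene point a between ray ad' (perpendicular to the
    image plane) and ray a1 towards the camera, assuming the camera lies in
    the image plane at the given lateral offset from d'."""
    return math.degrees(math.atan2(abs(lateral_offset_m), distance_m))


def max_offset_m(distance_m: float, beta_max_deg: float = 5.0) -> float:
    """Largest lateral offset between d' and the camera keeping beta <= beta_max."""
    return distance_m * math.tan(math.radians(beta_max_deg))


print(f"{max_offset_m(1.0):.3f} m")  # about 0.087 m at 1 m
```

At the 1 meter distance of FIG. 2a, the image d′ would have to stay within roughly 9 cm of the camera axis, which shows why meeting the limit means severely restricting the filmed scene.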
To compensate for this drawback, several image sensing devices 1 can be used, as represented in FIGS. 4a and 4b. To ensure continuity of the image capture, these devices, whose optical axes are radial and lie in the same plane, are placed side by side: several devices 1 are “concatenated”. The image sensing fields then share common areas, or overlap areas ZR, and the images recovered on one or several adjacent recovery devices (each corresponding to one image sensing device) show discontinuities caused by these duplicated or overlapping parts. Recall that image overlap is the multiple reproduction of certain parts of a scene captured by different cameras whose fields partially overlap to a greater or lesser extent.
With two image sensing devices 1, as represented in FIG. 4a, there will be an overlap area ZR covered by both devices; with three image sensing devices 1, as represented in FIG. 4b, there will be areas covered twice when the object is close to the image sensing devices, areas covered three times when it is further away, and so on.
This image overlap phenomenon becomes more pronounced as the field angle of the image sensing devices increases.
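The dependence of the overlap on distance can be checked with a simple 2-D coverage count. This is a minimal sketch, assuming cameras placed on a small arc around a common centre with diverging radial axes, an ideal horizontal field angle, and hypothetical values (three cameras 30° apart on a 0.1 m radius, 70° field); none of these values come from the text.

```python
import math
from typing import List, Tuple

Camera = Tuple[float, float, float]  # (x, y, heading in degrees from the +y axis)


def radial_rig(n: int, spacing_deg: float, radius_m: float) -> List[Camera]:
    """n cameras on an arc around the origin, optical axes radial (pointing
    away from the origin) and spaced by spacing_deg."""
    rig = []
    for i in range(n):
        phi = (i - (n - 1) / 2) * spacing_deg
        rig.append((radius_m * math.sin(math.radians(phi)),
                    radius_m * math.cos(math.radians(phi)), phi))
    return rig


def coverage(point: Tuple[float, float], rig: List[Camera], field_deg: float) -> int:
    """Number of cameras whose horizontal field contains the point."""
    px, py = point
    count = 0
    for cx, cy, heading in rig:
        bearing = math.degrees(math.atan2(px - cx, py - cy))  # angle from +y axis
        off = (bearing - heading + 180.0) % 360.0 - 180.0      # wrap to [-180, 180)
        if abs(off) <= field_deg / 2:
            count += 1
    return count


def max_overlap(rig: List[Camera], field_deg: float, depth_m: float,
                half_width_m: float = 5.0, steps: int = 2001) -> int:
    """Highest coverage found by sampling the line y = depth_m across the scene."""
    xs = (-half_width_m + 2 * half_width_m * k / (steps - 1) for k in range(steps))
    return max(coverage((x, depth_m), rig, field_deg) for x in xs)


rig = radial_rig(3, 30.0, 0.1)       # hypothetical FIG. 4b-like rig
print(max_overlap(rig, 70.0, 0.3))   # close to the cameras: 2
print(max_overlap(rig, 70.0, 10.0))  # further away: 3
```

With these assumed values, points close to the rig are covered by at most two cameras while distant points are covered by all three, matching the behaviour described for FIG. 4b; widening the field angle makes the triple overlap start closer to the cameras.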
Image processing software has been developed to solve this problem, but it still does not provide satisfactory results.