In conventional audio conference systems, there exist monophonic listening systems in which it is difficult to distinguish and to locate the various speakers in the conference. Other point-to-point audio conference systems transmit the real audio signals picked up by binaural microphones positioned on the head of a dummy. Those microphones serve to pick up the sounds coming from the various speakers in a conference room in a distinct manner. The locations of the microphones simulate the locations of the left and right ears of a person. The difference between the sounds picked up by the two microphones makes it possible to obtain location information about the speakers in the room. Such a conference system is described, for example, in U.S. Pat. No. 7,012,630.
After being digitized, and where necessary compressed, the audio signals as obtained in this way are forwarded to another conference station, which relays them to a pair of loudspeakers or to a plurality of headsets that may be worn by each of the participants.
Natural sound pickup presents the advantage of agreeable listening comfort and a high degree of realism.
Other audio conference systems provide a conference bridge, also known as a multipoint control unit (MCU), that serves to create artificial conference scenes so that the user of a virtual conference room has the impression of having all of the participants in the same room and at different positions therein.
To create this artificial scene, use is made of head-related transfer function (HRTF) filters in order to simulate a position in three dimensions for a speaker. This is described for example in application US 2005/0018039 in which an HRTF filter is defined for each speaker of the conference system. The conference bridge applies a respective HRTF filter for the left ear and for the right ear to a monophonic signal coming from a speaker. A dual-channel signal in binaural format is thus obtained for each speaker.
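By way of illustration only (this sketch is not taken from the cited documents), the per-speaker binaural rendering described above can be expressed as two convolutions of the monophonic signal with left-ear and right-ear head-related impulse responses (the time-domain form of HRTF filters). The impulse responses below are placeholder values; a real system would select measured HRTFs according to the desired three-dimensional position of the speaker.

```python
import numpy as np

def binauralize(mono, hrir_left, hrir_right):
    """Render a mono speaker signal as a two-channel binaural signal
    by convolving it with left- and right-ear head-related impulse
    responses (HRIRs)."""
    left = np.convolve(mono, hrir_left)
    right = np.convolve(mono, hrir_right)
    return np.stack([left, right])

# Toy example: delta-like placeholder HRIRs standing in for measured ones.
mono = np.random.randn(480)              # one 10 ms frame at 48 kHz
hrir_l = np.zeros(64); hrir_l[0] = 1.0   # direct path to the left ear
hrir_r = np.zeros(64); hrir_r[8] = 0.7   # delayed, attenuated right ear
binaural = binauralize(mono, hrir_l, hrir_r)
```

With these placeholder responses, the right channel is simply a delayed and attenuated copy of the left, which is enough to produce a crude lateral localization cue.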
It is often necessary for a user to intervene in order to propose determined positions for the speakers in three-dimensional space.
In U.S. Pat. No. 7,012,630, such an artificial scene can be created only when there is a single speaker present per conference room. One monophonic signal is then transmitted per conference site; the MCU subsequently creates from those signals an artificial scene representing the various speakers at the various sites, and forwards that scene to the various sites.
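As an illustrative sketch only (the function name and data layout are assumptions, not taken from the patent), an MCU of this kind can be modelled as summing the binaural renderings of the mono signals received from the various sites, each rendered with an impulse-response pair chosen for that site's virtual position:

```python
import numpy as np

def mix_scene(site_signals, hrir_pairs):
    """Sum per-site binaural renderings into one artificial scene.

    site_signals : list of mono numpy arrays, one per conference site
    hrir_pairs   : list of (left, right) impulse responses, one pair
                   per site, chosen for that site's virtual position
    """
    # Length of the longest full-mode convolution among all sites.
    n = max(len(s) + len(l) - 1 for s, (l, _) in zip(site_signals, hrir_pairs))
    scene = np.zeros((2, n))
    for mono, (hl, hr) in zip(site_signals, hrir_pairs):
        out_l = np.convolve(mono, hl)
        out_r = np.convolve(mono, hr)
        scene[0, :len(out_l)] += out_l
        scene[1, :len(out_r)] += out_r
    return scene
```

The resulting two-channel scene is what would then be forwarded to each site for binaural playback.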
Artificial scenes created in that way, even though they make it possible to benefit from spatialized listening, are nevertheless not as realistic as a natural scene of the kind that can be picked up in reality by binaural sound pickup.
Prior art methods make no provision for obtaining natural spatialized (or immersive) listening in a multisite and multiuser audio conference. Nor are the spatialized conference bridges that exist in the prior art capable of extending the number of participants or the number of sites in the audio conference system over time.
Furthermore, the proposed methods do not make it possible, while listening, both to locate a participant and to identify the conference site to which that participant belongs.
The present invention seeks to improve the situation.