In many of today's telecommunication systems, various kinds of teleconference services are offered. In a teleconference session, more than two parties can participate simultaneously and exchange information in any direction. A general aim is to provide a communication situation that is as close as possible to a real-world meeting.
In a real-world meeting, the participants are usually seated around a table, and when someone talks, the other participants usually turn their heads towards the talker, both to look at the talker and to maximize the correlation of the speech reaching the respective ears. This maximizes the signal-to-noise ratio. When more than one person talks at the same time, the human hearing system is able to use the spatial distribution of the sound to separate the speech from the different sources and, if desired, concentrate on a specific person. This phenomenon is known as the cocktail party effect.
In most commonly used teleconference systems, however, mono microphones capture the speech of the different parties at their different locations, and the signals are added together before being sent back to the participating parties and played through loudspeakers or headphones. A person listening to this mix has difficulty deciding which person is talking, and if several persons speak at the same time, it is hard to separate the speech of the different talkers. The different sounds all appear to originate from the same spatial position, i.e. the position of the loudspeakers.
Adding video to the teleconference makes it easier to see who is talking, but the problem when several persons talk simultaneously remains. The common prior-art solution to this is three-dimensional positional audio, which enables users to perceive sounds in a similar manner as in real meetings, i.e. to hear the direction of and distance to a sound source. When three-dimensional (3D) audio is used properly in a teleconference, a virtual room can be reproduced with all parties or participants located at different positions.
The straightforward solution for positioning the participants in a virtual 3D audio teleconference is to spread them evenly around a round table, as is common in the real world. The speech signal of each talker is then 3D rendered so as to simulate the relative positions of the talkers with respect to a listener. The relative position of a certain participant differs for every other participant, but the absolute position is the same, just as in a real-world meeting.
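The round-table geometry described above can be sketched in a few lines of code. The following is a minimal illustration only, not part of any described system; the function names and the convention that every listener faces the centre of the table are assumptions made for the example.

```python
import math

def table_positions(n, radius=1.0):
    """Place n participants evenly around a circular virtual table.

    Participant k sits at angle 2*pi*k/n on a circle of the given radius;
    returns a list of (x, y) coordinates.
    """
    return [(radius * math.cos(2 * math.pi * k / n),
             radius * math.sin(2 * math.pi * k / n)) for k in range(n)]

def relative_azimuth(listener, talker, positions):
    """Azimuth of the talker as heard by the listener, in degrees.

    Assumption: the listener faces the centre of the table. 0 degrees is
    straight ahead; positive angles are to the listener's left.
    """
    lx, ly = positions[listener]
    tx, ty = positions[talker]
    # Bearing of the talker as seen from the listener's position
    bearing = math.atan2(ty - ly, tx - lx)
    # The listener faces the table centre, i.e. opposite its own position
    facing = math.atan2(-ly, -lx)
    az = math.degrees(bearing - facing)
    # Normalise to the interval (-180, 180]
    return (az + 180.0) % 360.0 - 180.0
```

With four participants, the talker directly opposite is rendered straight ahead (0 degrees) and the two neighbours at plus and minus 45 degrees; as the number of participants grows, the nearest neighbours approach plus and minus 90 degrees, which is the far-to-the-side effect discussed below.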
Positioning virtual persons around a table reflects a real conference meeting well in most respects, except that a listener is usually not able to turn his or her head towards the talker in a virtual teleconference. As a result, the participants next to the listener will be heard far off to the side. Such a situation does not resemble a real conference and is therefore experienced as unpleasant.
An obvious solution is of course to let head turning of the listener influence the 3D rendering, as in real meetings. This, however, requires that head-turning parameters be sent to the teleconference renderer. To that end, the listener must be active and turn the virtual head whenever a new participant starts to talk. Concentrating on turning the virtual head would probably divert attention from what the persons are actually saying during the meeting. Another solution would be to measure the true head direction automatically and provide such coordinates to the teleconference renderer. However, such equipment then has to incorporate advanced positioning equipment.
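For illustration, the renderer-side use of such a head-turning parameter can be as simple as offsetting each source azimuth by the reported head yaw before rendering. The function below is a hypothetical sketch under that assumption, with azimuths in degrees and 0 meaning straight ahead.

```python
def apply_head_yaw(source_azimuth_deg, head_yaw_deg):
    """Compensate a source azimuth for the listener's head yaw.

    Turning the head towards a source (yaw equal to the source azimuth)
    brings that source to 0 degrees, i.e. straight ahead.
    """
    az = source_azimuth_deg - head_yaw_deg
    # Normalise to the interval (-180, 180]
    return (az + 180.0) % 360.0 - 180.0
```

The sketch makes the trade-off concrete: the rendering itself is trivial once a yaw value is available, so the difficulty lies entirely in obtaining that value, either through active user input or through dedicated head-tracking equipment.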