The subject matter discussed in the background section should not be assumed to be prior art merely as a result of its mention in the background section. Similarly, a problem mentioned in the background section or associated with the subject matter of the background section should not be assumed to have been previously recognized in the prior art. The subject matter in the background section merely represents different approaches, which in and of themselves may also be inventions.
Telephone conference (“teleconference”) systems allow multiple users to participate in telephone calls by providing integrated speaker and microphone arrays in desktop telephones. Such systems allow multiple users seated around a table to simultaneously listen and talk to listeners at the other end of the phone line, and can use standard telephone lines or Internet telephony for Voice over Internet Protocol (VoIP) applications. Present teleconference and VoIP phones typically contain multiple microphones so that people in different areas of the room have a microphone that is aimed at least somewhat toward them.
Multiple microphones or appropriate signal processing technology can be used to derive some measure of source location from the input sound signals. Some present teleconference systems may attempt to retain the positional context of sound sources to provide spatial information associated with a conference call in order to help listeners identify speakers based on spatial location cues. In such systems, techniques such as head-related transfer functions (HRTF) and other similar methods are used to recreate the source soundfield, such that sounds that would emanate from in front of, above, behind or next to the listener, if the listener were located within the room, are recreated in the same relative position upon playback. During face-to-face conversation, however, a listener normally turns to face a talker. Thus, conversational speech is normally received from the front of a listener. In conference call situations that utilize present spatial-aware devices, and in which a listener hears a binaural rendering of the soundfield over headphones or monitors, the listener may find it disturbing if talkers in the soundfield appear to be located to the side of or behind the listener, when the listener would more naturally expect the sound to come from the front.
Present teleconference systems also attempt to provide relatively high quality monophonic audio content through each microphone channel by reducing noise through various noise-reduction techniques. The multiple microphone channels are then compressed for transmission over standard telephone or IP (Internet Protocol) networks for playback through a regular telephone at the listening end. Such systems may be adequate for certain business and consumer applications in which voice content is most important and in which the presence of noise and excessive dynamic range may be annoying or distracting. However, such systems effectively limit or even eliminate the true ambient audio environment of the original soundfield and convey a limited, sterile representation of only a certain aspect of the entire audio content that may be available.
In summary, traditional phone systems collapse the talker's soundfield environment to a single omni-directional projection and do not allow listeners to focus on a particular talker or deduce context and other useful information based on the relative locations of talkers. Systems that attempt to convey spatial information of talkers can create a confusing listening experience by projecting sound at irregular angles to the listener, when he or she would more properly expect to be facing a talker. Such systems also often employ filtering, noise reduction and compression to accentuate spoken content and facilitate transmission over bandwidth-limited phone lines. By reducing noise and compressing the signal, these systems also fail to faithfully recreate the original soundfield of the talker, thereby resulting in the loss of potentially useful information.
These and other deficiencies are overcome by a soundfield telephony system in which an entire soundfield, potentially including multiple talkers and noise sources with associated directionality, is transmitted for rendering and playback to a listener; and by a telephony system that uses sound source and environmental heuristic information to guide the rotation of a soundfield so that the primary talkers in a conference will be rendered at a desired location in the listener's soundfield.
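By way of illustration only, the rotation of a soundfield toward a desired location can be sketched as follows. The sketch assumes a horizontal first-order representation with three channels (W, X, Y) and an azimuth convention of 0 radians at the front with positive angles to the left; neither the representation nor the function names below are specified in this section, and all are hypothetical.

```python
import math

# Assumption: a soundfield frame is a (W, X, Y) tuple, where W is the
# omnidirectional component and X, Y are the front and left components.
# This representation is chosen only for illustration.

def encode_source(sample, azimuth):
    """Place one mono sample at `azimuth` (radians, 0 = front, positive = left)."""
    return (sample, sample * math.cos(azimuth), sample * math.sin(azimuth))

def rotation_to_target(talker_azimuth, desired_azimuth=0.0):
    """Angle by which to rotate the soundfield so that a talker estimated at
    `talker_azimuth` is rendered at `desired_azimuth`."""
    return desired_azimuth - talker_azimuth

def rotate_soundfield(frame, angle):
    """Rotate a (W, X, Y) frame about the vertical axis by `angle` radians.
    W has no direction and is unchanged; X and Y undergo a 2-D rotation."""
    w, x, y = frame
    c, s = math.cos(angle), math.sin(angle)
    return (w, c * x - s * y, s * x + c * y)

# Usage: a talker directly to the listener's left is rotated to the front.
talker = math.radians(90)
frame = encode_source(1.0, talker)
rotated = rotate_soundfield(frame, rotation_to_target(talker))
```

In this sketch every source in the frame is rotated by the same angle, so the relative positions of talkers and noise sources are preserved while the primary talker is moved to the front.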