The present invention relates to speaker cluster design and rendering, with particular application to teleconferencing systems.
Unless otherwise indicated herein, the approaches described in this section are not prior art to the claims in this application and are not admitted to be prior art by inclusion in this section.
In a standard teleconferencing system, multiple participants (audio sources) are combined into a single audio output at the destination. For example, the audio from Location 1 and Location 2 is combined and output at Location 3, the audio from Location 1 and Location 3 is combined and output at Location 2, etc.
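The combination described above can be illustrated with a minimal sketch, assuming each location contributes an equal-length frame of numeric samples and that the combination is a plain sum. The function name `mix_for_destinations` and the location labels are hypothetical and are used only for illustration.

```python
def mix_for_destinations(frames):
    """Return, for each location, the sum of every OTHER location's frame.

    frames: dict mapping a location label to a list of samples.
    All frames are assumed to have the same length (an assumption of this sketch).
    """
    if not frames:
        return {}
    length = len(next(iter(frames.values())))
    if any(len(frame) != length for frame in frames.values()):
        raise ValueError("all frames must have the same length")

    mixed = {}
    for dest in frames:
        # Exclude the destination's own audio from its output mix.
        others = [frame for src, frame in frames.items() if src != dest]
        if others:
            mixed[dest] = [sum(samples) for samples in zip(*others)]
        else:
            mixed[dest] = [0] * length  # a lone participant hears silence
    return mixed


frames = {
    "location_1": [1, 2],
    "location_2": [10, 20],
    "location_3": [100, 200],
}
outputs = mix_for_destinations(frames)
```

Here `outputs["location_3"]` holds the combined audio of Location 1 and Location 2, matching the example in the text.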
The general goal of voice communications systems is to create a reasonable facsimile of voice over a distance with minimal latency and minimal introduction of unpleasant or distracting artefacts. Basic objective quality measures relate to the fidelity and reproduction of the voice signal across the system. One higher level objective measure is that of intelligibility and the extent to which the conversation can be understood.
Improvements in teleconferencing technology have been directed toward solving many problems. Development in the field addresses the capture, processing, coding, transport and reproduction of voice signals to achieve high intelligibility.
Another specific problem area relates to background noise. If two or more locations are generating background noise, a risk exists that the teleconferencing system combines the noise into a larger background noise output at the destination. Many developments in teleconferencing technology have been directed toward the removal of background noise from the various sources, as this prevents the signal combination from overamplifying similar background noises. An example of a specific type of solution to background noise is voice activity detection (VAD). The received audio signals from a location are analyzed (e.g., by a teleconferencing server) and classified into voice signals and non-voice signals; the non-voice signals are not provided to the other participants in the teleconference. Alternatively, different types of signal processing may be applied to voice and non-voice signals, e.g., to control the output signal level.
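The classification step described above can be sketched as follows. This is a simplified illustration only: it assumes a fixed energy threshold on the root-mean-square (RMS) level of each frame, which is one simple way to separate voice from non-voice and is not necessarily how any particular teleconferencing server performs VAD. The names `frame_rms`, `classify_frames`, `gate_non_voice` and the default threshold are hypothetical.

```python
import math


def frame_rms(frame):
    """Root-mean-square level of one frame of samples."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def classify_frames(frames, threshold=0.1):
    """Label each frame True (voice) or False (non-voice).

    The fixed RMS threshold is an assumption of this sketch.
    """
    return [frame_rms(frame) >= threshold for frame in frames]


def gate_non_voice(frames, threshold=0.1):
    """Keep only frames classified as voice; the rest are not forwarded."""
    labels = classify_frames(frames, threshold)
    return [frame for frame, is_voice in zip(frames, labels) if is_voice]


received = [[0.5, -0.5], [0.01, -0.01], [0.0, 0.0]]
labels = classify_frames(received)
forwarded = gate_non_voice(received)
```

In this sketch only the first frame exceeds the threshold, so only it is forwarded to the other participants. The alternative mentioned in the text, applying different processing to the two classes, could reuse the same labels instead of discarding frames.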
Another problem area relates to feedback. The teleconferencing system needs to ensure that audio from Location 1 that is output at Location 2 is not retransmitted back to Location 1 as part of Location 2's transmission.
Another problem area relates to consistency in volume among teleconferencing locations. Development in this area includes measuring the received audio volume from each location (e.g., by a teleconferencing server) and applying leveling to increase or decrease the audio signals to be output at each teleconferencing location.
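The measure-and-level approach can be sketched as follows, assuming volume is measured as the RMS level of a block of samples and that leveling is a single gain applied to that block. The target level, the gain cap and the function names are hypothetical values chosen for illustration.

```python
import math


def rms(samples):
    """Root-mean-square level of a block of samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def leveling_gain(samples, target_rms=0.1, max_gain=4.0):
    """Gain that moves the measured level toward target_rms.

    The gain is capped at max_gain so that very quiet input (often just
    background noise) is not boosted without limit. Silence is left as is.
    """
    measured = rms(samples)
    if measured == 0.0:
        return 1.0
    return min(target_rms / measured, max_gain)


def apply_leveling(samples, target_rms=0.1, max_gain=4.0):
    """Scale a block of samples by its leveling gain."""
    gain = leveling_gain(samples, target_rms, max_gain)
    return [s * gain for s in samples]


loud_location = [0.2, -0.2, 0.2, -0.2]
quiet_location = [0.001, -0.001, 0.001, -0.001]
loud_gain = leveling_gain(loud_location)
quiet_gain = leveling_gain(quiet_location)
leveled_loud = apply_leveling(loud_location)
```

The loud location is attenuated toward the target level, while the quiet location is boosted only up to the cap.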