Teleconferencing can facilitate group collaboration, and it has therefore become a widely used form of telecommunications over the last several years, particularly as businesses have had to deal with conferences and meetings involving participants from geographically diverse locations. In a typical teleconferencing environment, each of a plurality of physical locations involves one or more participants, most often using a single telecommunications device at each such location. Moreover, in locations (which will hereinafter be referred to as “rooms” or “conference rooms”) where there is more than one participant, as well as in some locations where there is a single participant, the telecommunications device is most commonly operated in a “speakerphone” mode, wherein a microphone is used as an “input” device for receiving the audio generated within the given room and a loudspeaker is used as an “output” device for broadcasting the audio received from other locations into the given room.
However, because each conference room allows multiple participants to join the conference, and because several participants may speak at the same time, speech acquisition and delivery become a difficult problem. If each conference room is equipped with a single microphone and loudspeaker, then whenever there are multiple active speakers the speech signals from these different speakers will be superimposed. In such a superimposed signal, one speaker's signal interferes with the signals from the other active speakers. This causes serious problems for listeners who are sitting in remote rooms and trying to understand the speech of some particular, desired speaker. In addition, even if there is only one active speaker at a time, the microphone signal can be corrupted by noise and reverberation, leading to a significant degradation in speech quality and intelligibility.
One way that has been suggested for improving speech acquisition in a teleconferencing environment is the use of microphone arrays, which are familiar to those of ordinary skill in the art. With the use of microphone arrays, a desired signal may be advantageously extracted from the cacophony of audio sounds using beamforming or, more generally, spatiotemporal filtering techniques. Many beamforming techniques have been developed and will be fully familiar to those of ordinary skill in the art, including the simpler delay-and-sum approaches and the more sophisticated filter-and-sum algorithms.
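By way of illustration only, the delay-and-sum approach mentioned above may be sketched as follows. The sketch assumes that the arrival delay of the desired speaker at each microphone is already known and is a whole number of samples; the names `delay_and_sum`, `delayed`, `target_delays`, and `interferer_delays` are hypothetical and are not drawn from any particular system. Each microphone signal is advanced by its assumed delay so that the desired speaker's contributions line up, and the aligned signals are then averaged.

```python
def delayed(signal, d):
    """Return `signal` delayed by d samples (zero-padded, same length)."""
    return [0.0] * d + signal[: len(signal) - d]


def delay_and_sum(channels, delays):
    """Advance each channel by its integer delay, then average the channels.

    channels: equal-length sample lists, one per microphone.
    delays:   delays[m] is the (assumed known, non-negative) number of
              samples by which the desired source arrives late at mic m.
    """
    if len(channels) != len(delays):
        raise ValueError("need exactly one delay per channel")
    n = len(channels[0])
    out = [0.0] * n
    for x, d in zip(channels, delays):
        for t in range(n - d):
            out[t] += x[t + d]  # undo the arrival delay of the desired source
    return [v / len(channels) for v in out]


# Toy scene (illustrative values): two sources each emit a unit impulse at
# sample 5 and reach four microphones with different delay patterns.
n = 16
impulse = [1.0 if t == 5 else 0.0 for t in range(n)]
target_delays = [0, 1, 2, 3]
interferer_delays = [3, 2, 1, 0]

channels = [
    [a + b for a, b in zip(delayed(impulse, dt), delayed(impulse, di))]
    for dt, di in zip(target_delays, interferer_delays)
]

# Steer toward the desired speaker by compensating the target delays.
output = delay_and_sum(channels, target_delays)
```

In this toy scene the desired speaker's impulse adds coherently and keeps its full amplitude at sample 5, whereas the interferer's impulse is spread over four different samples and appears at one quarter of its original amplitude. A filter-and-sum beamformer generalizes this by replacing each pure delay with a designed filter.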
As is fully familiar to those of ordinary skill in the art, the fundamental underlying idea of beamforming is to apply a filter to each microphone output and then sum the filtered microphone signals together to form one output. If each filter is properly designed, beamforming can significantly attenuate background noise, suppress interference from competing sources, and reduce reverberation. Therefore, with the use of microphone arrays and beamforming techniques, the signals of the individual active speakers can be separated from the mixed microphone observations. However, even though speech from multiple active speakers can, in theory, be separated, existing teleconferencing systems do not provide any method for selectively transmitting the separated signals. They either simply transmit the mixed signal (containing all active speakers) or arbitrarily pick one active speaker's signal (e.g., the strongest) and send it to the remote locations. This traditional way of handling speech has many drawbacks. First, if the mixed signal is sent to the receiving rooms, the speech will in general have very low quality and intelligibility because the multiple speakers will most certainly interfere with each other. Second, if the transmitting room arbitrarily isolates a signal from one active speaker (e.g., the loudest), this active speaker may not necessarily be the one that the remote participants want to hear. Moreover, participants located in a remote conference room may not in general be able to identify the current active speaker unless they are familiar with the speaker's voice.