1. Field of the Invention
The present invention relates generally to the field of telecommunications and, more specifically, to a method and apparatus for performing conferencing services and echo suppression.
2. Background Information
Audio conferencing techniques intelligently mix the input speech of multiple parties to produce an accurate output that is then played back to the parties. By way of background, it has been known that for a conference having a small number of participants, the input speech signals (from each participant) are summed to produce an output that is then transmitted as a conference output signal to each of the participants.
For larger conferences, the simple sum approach is not effective due to the noise involved. Specifically, when there are many conference participants, each with standard office noise in the background, the sum of this background noise can itself overwhelm the conference. Thus, it has been known to limit the number of participants whose input speech is summed to form the conference output signal. Typically, a small subset of the total participants, often three, is summed to produce the output. In one solution, the input speech signals of the actively speaking parties having the highest amplitude (the loudest) are selected and summed as the conference output signal. In other cases, the input speech signals having the greatest energies are selected for inclusion in the conference output signal.
These approaches partially alleviate the noise issue; however, noise continues to be a problem even when the larger amplitude conference participants are selected. This is because one participant may, for example, have the highest amplitude only because he is driving in a car on a noisy highway, and thus qualifies as one of the top contenders due to the level of the noise rather than that of his speech. This deteriorates the quality of the conference for all participants.
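By way of illustration only, the known energy-based selection described above can be sketched as follows. The function names, the frame representation (equal-length lists of samples) and the default of three talkers are assumptions made for this example and do not describe any particular prior system.

```python
from typing import List, Sequence


def frame_energy(frame: Sequence[float]) -> float:
    """Mean squared amplitude of one frame of samples (0.0 for an empty frame)."""
    return sum(s * s for s in frame) / len(frame) if frame else 0.0


def mix_loudest(frames: List[Sequence[float]], max_talkers: int = 3) -> List[float]:
    """Sum only the `max_talkers` highest-energy input frames.

    Assumes every participant's frame has the same length.
    """
    # Rank participant indices from highest to lowest frame energy.
    ranked = sorted(range(len(frames)),
                    key=lambda i: frame_energy(frames[i]),
                    reverse=True)
    selected = ranked[:max_talkers]
    length = len(frames[0]) if frames else 0
    # Sample-by-sample sum of the selected participants only.
    return [sum(frames[i][n] for i in selected) for n in range(length)]
```

Note that such a selector ranks on energy alone, so a participant with loud background noise is ranked exactly like a participant who is speaking.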
Furthermore, simply selecting the largest amplitude participants of the conference can be difficult when there are participants with greatly different volumes. This can occur due to poor line conditions, faulty telephones or quirks of personality in the sense that some people are softer speakers than others. If a participant with a lower amplitude voice input is attempting to establish himself within the top three contenders, this may not happen in a timely manner and the beginning of his words can be clipped. And, even when added to the conference, it may be difficult for the other participants to hear the soft-spoken participant.
A further issue arises with respect to DTMF (dual tone multi-frequency) signals. As will be understood by those skilled in the art, the familiar DTMF signals that are generated when keys are pressed on the traditional touch-tone phone set are each composed of two tones at distinct frequencies, a row frequency and a column frequency. It has been known to provide a DTMF detector, which operates such that, by determining the row frequency and the column frequency, the DTMF detector identifies the touch-tone key that was pressed.
In some conferencing applications, a participant can control his or her individual volume (or other parameter) using the touch-tone signals. For example, a participant may press “1” to increase volume and “2” to decrease volume. However, if a participant uses this feature and presses the keys, the DTMF tone thus produced enters the conference. That tone could then be sent back out to other participants as part of the conference output signal. If there is an echo, the tone will be reflected back, and the reflection could then again cause the result of the DTMF signal (such as an increase in volume). The volume is thus continuously increased, because the echoed DTMF tone is repeatedly amplified and sent back out to the conference, clearly disrupting the conference. In addition, the DTMF tones themselves can be quite loud and can be an annoyance to the participants.
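The row-and-column identification described above may be sketched, by way of example only, using the Goertzel algorithm to measure signal power at each of the eight standard DTMF frequencies. The choice of the Goertzel algorithm, the 8000 Hz sampling rate, the `min_power` threshold and all function names are assumptions made for this illustration.

```python
import math

ROW_FREQS = (697, 770, 852, 941)       # standard DTMF row frequencies, Hz
COL_FREQS = (1209, 1336, 1477, 1633)   # standard DTMF column frequencies, Hz
KEYPAD = ("123A", "456B", "789C", "*0#D")


def goertzel_power(samples, freq, rate):
    """Power of `samples` at `freq` Hz, computed with the Goertzel recurrence."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq / rate)
    s1 = s2 = 0.0
    for x in samples:
        s1, s2 = x + coeff * s1 - s2, s1
    return s1 * s1 + s2 * s2 - coeff * s1 * s2


def detect_key(samples, rate=8000, min_power=1.0):
    """Return the key whose row and column tones are strongest, or None."""
    row_power = [goertzel_power(samples, f, rate) for f in ROW_FREQS]
    col_power = [goertzel_power(samples, f, rate) for f in COL_FREQS]
    row = max(range(4), key=row_power.__getitem__)
    col = max(range(4), key=col_power.__getitem__)
    # Hypothetical threshold: report no key unless both tones are present.
    if row_power[row] < min_power or col_power[col] < min_power:
        return None
    return KEYPAD[row][col]


def dtmf_tone(key, rate=8000, n=400, amplitude=0.5):
    """Generate `n` samples of the two-tone signal for `key` (test helper)."""
    for r, keys in enumerate(KEYPAD):
        if key in keys:
            fr, fc = ROW_FREQS[r], COL_FREQS[keys.index(key)]
            return [amplitude * (math.sin(2 * math.pi * fr * t / rate)
                                 + math.sin(2 * math.pi * fc * t / rate))
                    for t in range(n)]
    raise ValueError(f"not a DTMF key: {key!r}")
```

For example, `detect_key(dtmf_tone("5"))` identifies the key from its 770 Hz row tone and 1336 Hz column tone. A practical detector would typically apply further checks, such as tone duration and the relative level of the two tones, which this sketch omits.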
Another problem that occurs in large conferences is that of line echo. In the conferencing setting, an echoed signal can be summed back into the conference output signal and sent back out onto the line. More specifically, an echo is generated whenever a telephone signal is converted from a four-wire connection to a two-wire connection (a standard PSTN connection). This echo is a delayed and attenuated version of the original signal. An echo can make conversation impossible and in a conference, echo can be tremendously disruptive. Most telecommunications networks incorporate echo cancellers to remove echo. However, as networks become more complex and elements such as cell phones and speakerphones are introduced, echo cancellation, in turn, becomes more complex.
A network echo cancellation component creates a model of the telephone line echo. Using the model, the circuit creates a synthetic echo, which is subtracted from the input speech thus canceling the echo signal. This process is continually monitored and adapted. The end result is a relatively echo-free signal being generated. In most robust echo cancellation systems, the echo canceller is followed by an echo suppression or non-linear process to remove or mask any remnants of the echoed speech that may have been missed. This works well, but the disadvantage of this type of echo cancellation technique is that it is computationally quite expensive to develop the synthetic echo that is then subtracted from the signal.
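The model-and-subtract process described above may be illustrated, by way of example only, with a normalized least-mean-squares (NLMS) adaptive filter. The use of NLMS, the parameter values and the function name are assumptions made for this sketch; the description above does not name a particular adaptation algorithm.

```python
def nlms_echo_cancel(far_end, near_end, taps=8, mu=0.5, eps=1e-6):
    """Subtract a synthetic echo of `far_end` from `near_end`.

    far_end  -- samples sent out onto the line (the source of the echo)
    near_end -- samples received from the line (speech plus echo)
    Returns (residual samples, final filter weights).
    """
    weights = [0.0] * taps   # adaptive model of the line echo path
    history = [0.0] * taps   # most recent far-end samples, newest first
    residual = []
    for x, d in zip(far_end, near_end):
        history = [x] + history[:-1]
        # Synthetic echo predicted by the current model.
        synthetic = sum(w * h for w, h in zip(weights, history))
        e = d - synthetic    # signal with the synthetic echo removed
        # Normalized LMS update: the model is adapted on every sample.
        norm = sum(h * h for h in history) + eps
        weights = [w + mu * e * h / norm for w, h in zip(weights, history)]
        residual.append(e)
    return residual, weights
```

Each input sample requires on the order of `taps` multiplications to form the synthetic echo and as many again to adapt the model, which illustrates the computational expense noted above, particularly when the echo path is long and the filter is run separately for every conference line.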
Echo suppression, on the other hand, is a somewhat simpler solution, and one that can also be a useful technique in areas other than conferencing. Echo suppression determines when a signal qualifies as echo (as opposed to voice) and, based upon this determination, mutes the input signal when it is expected to be an echo signal. Typically, a voice activity detector is used on both the inbound and outbound legs to determine when echo is present and, if it is present, the signal is muted. Known echo suppression techniques have not been effective in the large conference environment.
There remains, therefore, a need for a conferencing algorithm that results in input signal selection that includes participants who are actively speaking and not those who are simply loudest due to background noise. There remains a further need for a method and apparatus for performing conferencing for a large number of participants, which has improved noise reduction and is capable of producing an echo-free output signal, yet is computationally cost effective. There remains yet a further need for a method and apparatus for performing conferencing that removes DTMF tones from the input signals.
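A known suppressor of the kind described above may be sketched, by way of example only, as follows. The energy-based voice activity test, the `vad_threshold` and `echo_ratio` values and the function names are assumptions made for this illustration, and the delay between the outbound signal and its echo is ignored for simplicity.

```python
def frame_energy(frame):
    """Mean squared amplitude of one frame of samples (0.0 for an empty frame)."""
    return sum(s * s for s in frame) / len(frame) if frame else 0.0


def suppress_echo(inbound, outbound, vad_threshold=1e-4, echo_ratio=0.5):
    """Mute `inbound` when it is judged to be an echo of `outbound`.

    Both legs pass through a simple energy-based voice activity test.
    The inbound frame is treated as echo when both legs are active and
    the inbound energy is well below the outbound energy.
    """
    in_energy = frame_energy(inbound)
    out_energy = frame_energy(outbound)
    inbound_active = in_energy > vad_threshold
    outbound_active = out_energy > vad_threshold
    if inbound_active and outbound_active and in_energy < echo_ratio * out_energy:
        return [0.0] * len(inbound)   # echo expected: mute the frame
    return list(inbound)              # otherwise pass the frame unchanged
```

The suppressor needs only a pair of energy measurements per frame rather than a model of the echo path, which is why it is the simpler of the two approaches.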