The present invention relates to telephony, and in particular to an audio conferencing platform.
Audio conferencing platforms are known. For example, see U.S. Pat. Nos. 5,483,588 and 5,495,522. Audio conferencing platforms allow conference participants to easily schedule and conduct audio conferences with a large number of users. In addition, audio conference platforms are generally capable of simultaneously supporting many conferences.
A problem with existing audio conference platforms is that they employ a fixed threshold to determine whether a conference participant is speaking. Using such a fixed threshold may result in a conference participant being added to the summed conference audio, even though they are not speaking. Specifically, if the background audio noise is high (e.g., the user is on a factory floor), then the amount of digitized audio energy associated with that conference participant may be sufficient for the conference platform to falsely detect speech, and add the background noise to the conference sum under the mistaken belief that the energy is associated with speech.
Therefore, there is a need for a system that accounts for background noise in the detection of valid conference speakers.
One object of the present invention is to provide a method and system that advantageously accounts for background noise on lines participating in a conference call and prevents that noise from being added to the conference sum as a result of an erroneous determination that the energy is associated with speech. Another object is to provide this advantage dynamically, so that changing conditions on participating lines are taken into account.
A preferred embodiment of the invention comprises an audio conferencing platform that includes a time division multiplexing (TDM) data bus, a controller, and an interface circuit that receives audio signals from a plurality of conference participants and provides digitized audio signals in assigned time slots over the data bus. The audio conferencing platform also includes a plurality of digital signal processors (DSPs) adapted to communicate with the interface circuit over the TDM bus. At least one of the DSPs sums a plurality of the digitized audio signals associated with conference participants who are speaking to provide a summed conference signal. This DSP provides the summed conference signal to at least one other DSP of the plurality, which removes from the summed conference signal the digitized audio signal of each speaker whose voice it contains, thus providing a customized conference audio signal to each of the speakers.
Each of the digitized audio signals is processed to determine whether it includes speech. For each digitized audio signal, the energy of the signal is compared against a dynamic threshold value associated with the line over which the signal is received. The dynamic threshold value is set as a function of the background noise within the digitized audio signal.
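The dynamic threshold can be illustrated with a minimal Python sketch. It assumes frame energy is the mean of the squared linear samples, that background noise is tracked by asymmetric exponential smoothing (slow to rise, quick to fall), and that the threshold is a fixed multiple of that noise estimate with an absolute floor. The class name `LineSpeechDetector` and the parameters `margin`, `min_threshold`, `rise_rate`, and `fall_rate` are hypothetical and are not taken from the description above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LineSpeechDetector:
    """Hypothetical per-line speech detector with a dynamic threshold.

    The threshold is a multiple of a running estimate of the line's
    background-noise energy, never lower than a fixed floor.
    """
    margin: float = 4.0           # assumed: speech must exceed noise energy by 4x (about 6 dB)
    min_threshold: float = 100.0  # assumed absolute floor on the threshold
    rise_rate: float = 0.01       # assumed: noise estimate rises slowly (speech is bursty)
    fall_rate: float = 0.5        # assumed: noise estimate falls quickly when the line quiets
    noise_energy: Optional[float] = None

    def threshold(self) -> float:
        if self.noise_energy is None:
            return self.min_threshold
        return max(self.min_threshold, self.margin * self.noise_energy)

    def process_frame(self, samples: List[int]) -> bool:
        """Return True if this frame's energy exceeds the line's current threshold."""
        energy = sum(s * s for s in samples) / len(samples)
        is_speech = energy > self.threshold()
        # Track the background noise with asymmetric exponential smoothing.
        if self.noise_energy is None:
            self.noise_energy = energy
        else:
            rate = self.rise_rate if energy > self.noise_energy else self.fall_rate
            self.noise_energy += rate * (energy - self.noise_energy)
        return is_speech


# Usage: a noisy line (e.g., a factory floor) and a quiet line.
noisy = LineSpeechDetector()
quiet = LineSpeechDetector()
noise_frame = [300, -300] * 80      # energy 90,000
hum_frame = [5, -5] * 80            # energy 25
speech_frame = [2000, -2000] * 80   # energy 4,000,000
for _ in range(20):
    noisy.process_frame(noise_frame)
    quiet.process_frame(hum_frame)
```

After the warm-up frames, the noisy line's threshold has risen to four times its noise energy, so the factory noise is no longer treated as speech, while real speech on either line still exceeds its own threshold. A fixed threshold of 100 would have flagged every noise frame. The very first frame on a line is compared only against the floor, which is one reason a practical detector might seed its noise estimate before a conference starts.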
The audio conferencing platform preferably configures at least one of the DSPs as a centralized audio mixer and at least another one of the DSPs as an audio processor. The centralized audio mixer performs the step of summing a plurality of the digitized audio signals associated with conference participants who are speaking, to provide the summed conference signal. The centralized audio mixer provides the summed conference signal to the audio processor(s) for post processing and routing to the conference participants. The post processing includes removing the audio associated with a speaker from the conference signal to be sent to that speaker. For example, if there are forty conference participants and three of the participants are speaking, then the summed conference signal will include the audio from the three speakers. The summed conference signal is made available on the data bus to the thirty-seven non-speaking conference participants. However, the three speakers each receive an audio signal that is equal to the summed conference signal less the digitized audio signal associated with that speaker. Removing each speaker's own voice from the audio that speaker hears reduces echoes.
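The "sum, then subtract your own voice" step can be sketched in Python as follows. This is a minimal illustration, assuming 16-bit linear PCM frames keyed by a hypothetical port number and a wide (unclipped) accumulator. The function names `mix_conference` and `clip` are hypothetical.

```python
from typing import Dict, List, Set

PCM_MIN, PCM_MAX = -32768, 32767  # assumed 16-bit linear PCM


def clip(x: int) -> int:
    """Saturate a wide accumulator value to the 16-bit PCM range."""
    return max(PCM_MIN, min(PCM_MAX, x))


def mix_conference(frames: Dict[int, List[int]], speaking: Set[int]) -> Dict[int, List[int]]:
    """Return the audio frame to send to each port of one conference.

    frames maps port -> one frame of linear samples; speaking is the set of
    ports whose audio is included in the sum. Non-speakers receive the full
    sum; each speaker receives the sum minus that speaker's own samples.
    """
    length = len(next(iter(frames.values())))
    total = [0] * length  # unclipped accumulator
    for port in speaking:
        for i, s in enumerate(frames[port]):
            total[i] += s
    outputs = {}
    for port, frame in frames.items():
        if port in speaking:
            # Subtract before clipping so the speaker's own voice cancels exactly.
            outputs[port] = [clip(t - s) for t, s in zip(total, frame)]
        else:
            outputs[port] = [clip(t) for t in total]
    return outputs


# Usage: a four-port version of the "forty participants, three speakers" example.
out = mix_conference({1: [100, 200], 2: [10, 20], 3: [1, 2], 4: [5, 5]}, speaking={1, 2, 3})
```

Subtracting from the unclipped sum, rather than from an already saturated one, is a deliberate choice here: if the sum had been clipped first, a loud speaker's own voice would not cancel cleanly from what that speaker hears.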
The centralized audio mixer also preferably receives DTMF detect bits indicating which digitized audio signals include a DTMF tone. The DTMF detect bits may be provided by another of the DSPs that is programmed to detect DTMF tones. If a digitized audio signal is associated with a speaker but includes a DTMF tone, the centralized audio mixer does not include that signal in the summed conference signal while the associated DTMF detect bit is active. This ensures that conference participants do not hear annoying DTMF tones in the conference audio. When the DTMF tone is no longer present in the digitized audio signal, the centralized audio mixer may again include the signal in the summed conference signal.
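The DTMF gating rule reduces to a per-frame set computation. The following Python sketch assumes that speech bits and DTMF detect bits are available as per-port booleans for the current frame; the function name `contributing_ports` is hypothetical.

```python
from typing import Dict, Set


def contributing_ports(speech_bits: Dict[int, bool], dtmf_bits: Dict[int, bool]) -> Set[int]:
    """Ports whose audio enters the conference sum for the current frame.

    A port contributes only while its speech bit is set and its DTMF detect
    bit is clear; once the DTMF bit clears, the port is eligible again.
    """
    return {
        port
        for port, is_speech in speech_bits.items()
        if is_speech and not dtmf_bits.get(port, False)
    }


# Usage: port 2 presses a key for one frame, then resumes talking.
speech = {1: True, 2: True, 3: False}
during_tone = contributing_ports(speech, {2: True})
after_tone = contributing_ports(speech, {2: False})
```

Because a DTMF tone carries substantial energy, it will usually also set the port's speech bit. The DTMF bit therefore has to take precedence over the speech bit, which is what the `and not` condition expresses.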
The audio conferencing platform is preferably capable of supporting many simultaneous conferences (e.g., 384). Accordingly, the centralized audio mixer provides a separate summed conference signal for each conference.
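A single mixer serving many conferences can keep one accumulator per conference and route each contributing port into the accumulator of its own conference. The following Python sketch assumes a hypothetical `conference_of` table mapping each port to a conference identifier (shown here as a string) and a set of contributing ports computed as in the speech and DTMF checks. The function name `conference_sums` is also hypothetical.

```python
from typing import Dict, List, Set


def conference_sums(
    frames: Dict[int, List[int]],
    conference_of: Dict[int, str],
    contributing: Set[int],
) -> Dict[str, List[int]]:
    """Compute one summed frame per conference in a single pass over the ports."""
    length = len(next(iter(frames.values())))
    # Every conference gets a sum, even one in which nobody is speaking.
    sums = {conf: [0] * length for conf in set(conference_of.values())}
    for port in contributing:
        acc = sums[conference_of[port]]
        for i, s in enumerate(frames[port]):
            acc[i] += s
    return sums


# Usage: five ports in three conferences; ports 4 and 5 are silent.
sums = conference_sums(
    frames={1: [1, 1], 2: [2, 2], 3: [4, 4], 4: [8, 8], 5: [16, 16]},
    conference_of={1: "A", 2: "A", 3: "B", 4: "B", 5: "C"},
    contributing={1, 2, 3},
)
```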
Each of the digitized audio signals may be preprocessed. The preprocessing steps include decompressing the signal (e.g., using the well-known μ-law or A-law compression schemes), and determining whether the magnitude of the decompressed audio signal is greater than a detection threshold. If it is, then a speech bit associated with the digitized audio signal is set. Otherwise, the speech bit is cleared.
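The preprocessing step can be sketched in Python. The two decoders follow the widely used ITU-T G.711 reference decoding (as in the classic public-domain `g711.c`), so `ulaw_to_linear` and `alaw_to_linear` produce the standard linear values. The speech-bit rule, which compares the mean absolute sample value of a frame against a threshold, is an assumption made for illustration, as are the names `speech_bit`, `threshold`, and `law`.

```python
from typing import Iterable, List

BIAS = 0x84  # G.711 mu-law bias (132)


def ulaw_to_linear(code: int) -> int:
    """Decode one 8-bit G.711 mu-law code to a linear sample (range +/-32124)."""
    u = ~code & 0xFF                  # mu-law codes are transmitted bit-inverted
    t = ((u & 0x0F) << 3) + BIAS      # 4-bit mantissa, plus bias
    t <<= (u & 0x70) >> 4             # apply the 3-bit segment (exponent)
    return BIAS - t if u & 0x80 else t - BIAS


def alaw_to_linear(code: int) -> int:
    """Decode one 8-bit G.711 A-law code to a linear sample (range +/-32256)."""
    a = code ^ 0x55                   # A-law inverts the even-numbered bits
    t = (a & 0x0F) << 4               # 4-bit mantissa
    seg = (a & 0x70) >> 4             # 3-bit segment
    if seg == 0:
        t += 8
    else:
        t = (t + 0x108) << (seg - 1)
    return t if a & 0x80 else -t


def speech_bit(codes: Iterable[int], threshold: int, law: str = "ulaw") -> bool:
    """Decompress one frame and report whether its mean magnitude exceeds threshold."""
    decode = ulaw_to_linear if law == "ulaw" else alaw_to_linear
    samples: List[int] = [decode(c) for c in codes]
    mean_magnitude = sum(abs(s) for s in samples) / len(samples)
    return mean_magnitude > threshold
```

In μ-law, the code 0xFF decodes to silence (0), while 0x80 and 0x00 decode to the largest positive and negative values. A frame of 0xFF codes therefore clears the speech bit for any positive threshold.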
Centralizing conference mixing in one DSP reduces the repetitive tasks that would otherwise be distributed among the plurality of DSPs. In addition, centralized conference mixing provides a scalable system architecture that is easily expanded.
Advantageously, using a dynamic threshold value to determine whether there is speech on a line helps to ensure that background noise is not falsely detected as speech.
Thus, a method in accordance with a preferred embodiment of the present invention comprises receiving audio signals over a plurality of ports. For at least one port, the method comprises determining a dynamic threshold value based on one or more characteristics of signals received on the port; associating said dynamic threshold value with the port; and comparing one or more characteristics of signals subsequently received on the port to the dynamic threshold value. The method further comprises summing signals received over the plurality of ports, wherein signals received on the at least one port whose characteristics (such as energy level) have a specified relationship to the dynamic threshold value (for example, having an energy level less than the threshold value) are not contained in the sum. The method may further comprise preprocessing audio signals by decompressing them using either μ-law or A-law decompression.
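The method's three steps for a port (determining a threshold from that port's own signals, associating it with the port, and comparing later frames against it) can be combined with the summation in a short Python sketch. This sketch deliberately uses a different noise-floor technique from a smoothed estimate: it takes a sliding-window minimum of recent frame energies, in the style of minimum-statistics noise tracking. The class `PortThresholds`, the functions `frame_energy` and `summed_frame`, and the parameters `window`, `margin`, and `floor` are all hypothetical.

```python
from collections import deque
from typing import Deque, Dict, List, Set, Tuple


def frame_energy(frame: List[int]) -> float:
    """Mean squared sample value of one frame."""
    return sum(s * s for s in frame) / len(frame)


class PortThresholds:
    """Hypothetical per-port dynamic thresholds from a sliding-window energy minimum."""

    def __init__(self, window: int = 50, margin: float = 4.0, floor: float = 100.0):
        self.window = window    # assumed: frames of history kept per port
        self.margin = margin    # assumed: threshold = margin x minimum recent energy
        self.floor = floor      # assumed: absolute lower bound on any threshold
        self.history: Dict[int, Deque[float]] = {}

    def threshold(self, port: int) -> float:
        hist = self.history.get(port)
        if not hist:
            return self.floor
        return max(self.floor, self.margin * min(hist))

    def observe(self, port: int, energy: float) -> None:
        self.history.setdefault(port, deque(maxlen=self.window)).append(energy)


def summed_frame(frames: Dict[int, List[int]], thresholds: PortThresholds) -> Tuple[List[int], Set[int]]:
    """Sum the frames whose energy is at least their port's threshold, then update history."""
    length = len(next(iter(frames.values())))
    total = [0] * length
    included: Set[int] = set()
    for port, frame in frames.items():
        energy = frame_energy(frame)
        if energy >= thresholds.threshold(port):   # below threshold: left out of the sum
            included.add(port)
            for i, s in enumerate(frame):
                total[i] += s
        thresholds.observe(port, energy)
    return total, included


# Usage: port 1 is quiet, port 2 carries steady background noise.
thresholds = PortThresholds(window=10)
quiet, noisy, loud = [5, -5] * 4, [300, -300] * 4, [2000, -2000] * 4
for _ in range(5):
    summed_frame({1: quiet, 2: noisy}, thresholds)   # warm-up: learn each port's noise
total, included = summed_frame({1: loud, 2: noisy}, thresholds)
```

With a window-minimum estimate, the window must be longer than a typical talk spurt. Otherwise, sustained speech would become the "minimum" and the speaker would eventually be excluded from the sum.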
In one aspect, the method comprises identifying which ports are receiving audio signals that contain speech; and, on each such identified port, transmitting a summed signal, wherein said summed signal does not contain signals received on that port.
In another aspect, the method comprises identifying which ports are receiving audio signals that contain DTMF tones; and, on each such identified port, transmitting a summed signal, wherein said summed signal does not contain signals received on that port. Preferably, the step of identifying comprises setting a DTMF detect bit for a signal. The method may also comprise the step of including signals from previously identified ports in the sum after those ports are no longer identified as receiving signals containing one or more DTMF tones.
The invention further comprises software and systems for implementing methods described herein.
These and other objects, features, and advantages of the present invention will become apparent in light of the following detailed description of preferred embodiments thereof, as illustrated in the accompanying drawings.
Although the invention has been described in connection with an audio conferencing platform, it is not limited to such a platform and may be used, for example, in a video conferencing system.