1. Field of the Invention
The present invention relates generally to data transfer and particularly to a method for background noise reduction and performance improvement in voice conferencing over packetized networks.
2. Description of the Related Art
Conference calling, such as a conference conducted by telephone or a similar audio and/or visual device in which three or more persons in different locations participate by means of a central switching unit, enables participants in widely dispersed geographical areas to communicate efficiently in real time. Because of the great utility of conference calls, this method of communication has made its way into many aspects of modern life, connecting home users, wireless users, business personnel, and the like, and enabling multiple users to communicate with each other at the same time. In this way, a group of people may communicate directly without physically traveling to the same location. However, a conference call may encounter a large quantity of background noise, thereby reducing the quality and utility of the conference call.
Therefore, when mixing voice streams from multiple participants in a conference call, it is desirable to reduce background noise within the conference call as well as to reduce the computational resources required to provide the call. Previous methods utilized to correct for background noise included outputting to each participant the gain-corrected sum of all voices, outputting to each participant the gain-corrected sum of the voices of all other participants, and outputting only the loudest speaker to each participant.
While outputting to each participant the gain-corrected sum of all voices may be acceptable in circuit-switched networks, in which delays are low and participants cannot hear their own voice due to compensation by the human communication channel and brain of the participant, such a method is not feasible in a packetized network. For instance, in an environment where voice is transported over a packet network, the delay may be larger, so that participants may hear their own voice as a disturbing echo. Such an echo is typically too strong to be removed utilizing normal echo cancellation. Further, such removal may be computationally expensive and require extensive resources, as the echo tail may be quite long, such as greater than 60–160 ms.
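As a rough illustration of why a long echo tail is costly, the following sketch converts a tail length into the number of samples it spans. The 8 kHz sample rate and the function name `echo_tail_taps` are assumptions made for illustration only and are not taken from the description above. A canceller that models the echo path with one coefficient per sample would need about this many coefficients.

```python
def echo_tail_taps(tail_ms: int, sample_rate_hz: int = 8000) -> int:
    """Number of samples spanned by an echo tail of tail_ms milliseconds.

    sample_rate_hz defaults to 8000, an assumed narrowband telephony rate.
    """
    return tail_ms * sample_rate_hz // 1000


# Tail lengths taken from the range mentioned in the text (60-160 ms).
for tail_ms in (60, 160):
    print(tail_ms, "ms spans", echo_tail_taps(tail_ms), "samples")
```

Under these assumptions, every output sample would involve on the order of several hundred to over a thousand multiply-accumulate operations per participant.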
Outputting to each participant the gain-corrected sum of the voices of all other participants adds, in addition to the voices of active participants, the background noise of “silent” participants. Thus, as the number of participants increases, the background noise from “silent” participants also increases, thereby lowering the quality of the communication. Additionally, this technique is computationally expensive, since it may be necessary to perform a time add of (n−1) voices for each participant, n being the number of participants.
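The sum-of-all-other-voices approach may be sketched as follows. This is a minimal illustration, not a description of any particular prior system: the function name `mix_others`, the frame representation (lists of float samples), and the gain correction of 1/(n−1) are all assumptions. The sketch counts one frame-wide add per other participant, so the total grows as n(n−1).

```python
def mix_others(frames, gain=None):
    """Mix, for each listener, the frames of all other participants.

    frames: list of n equal-length lists of samples, one per participant.
    gain:   assumed gain correction; defaults to 1/(n-1) (hypothetical choice).
    Returns (outputs, additions), where additions counts frame-wide adds.
    """
    n = len(frames)
    length = len(frames[0])
    if gain is None:
        gain = 1.0 / max(n - 1, 1)
    outputs = []
    additions = 0
    for listener in range(n):
        mixed = [0.0] * length
        for talker in range(n):
            if talker == listener:
                continue  # a participant does not receive their own voice
            for i in range(length):
                mixed[i] += frames[talker][i]
            additions += 1  # one add per other participant, n-1 per listener
        outputs.append([gain * s for s in mixed])
    return outputs, additions


# Participant 0 speaks; participants 1 and 2 are "silent" but contribute noise.
frames = [[0.5, 0.5], [0.01, -0.01], [0.02, 0.0]]
outputs, additions = mix_others(frames)
print(additions)   # n * (n - 1) frame-wide adds
print(outputs[0])  # participant 0 receives only the others' background noise
```

In this example the active speaker receives a signal composed entirely of background noise from the two silent participants, which illustrates how noise accumulates as participants are added.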
Further, outputting only the loudest speaker to each participant generally suffers from insufficient voice quality. For example, in conference calls with high interactivity, switchovers between participants may be disturbing to the participants. During a switchover between loudest participants, information from one participant may be lost, thereby affecting the continuity of the call and the overall experience. Moreover, situations may be encountered within the call in which more than one speaker may wish to speak at the same time. In such a situation, one of the inputs would not be provided to the other participants, and the originating participant may not even know whether it was transmitted.
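The loudest-speaker approach may be sketched as follows. The use of per-frame energy as the loudness measure and the names `frame_energy` and `loudest_speaker` are assumptions for illustration; the description above does not specify how loudness is determined.

```python
def frame_energy(frame):
    """Sum of squared samples, used here as an assumed loudness measure."""
    return sum(s * s for s in frame)


def loudest_speaker(frames):
    """Index of the participant whose current frame has the highest energy."""
    return max(range(len(frames)), key=lambda i: frame_energy(frames[i]))


# Frame 1: participants 0 and 1 speak at the same time; 0 is slightly louder.
frame_1 = [[0.5, -0.5], [0.4, -0.4], [0.01, 0.0]]
# Frame 2: participant 0 pauses while participant 1 continues.
frame_2 = [[0.1, -0.1], [0.4, -0.4], [0.0, 0.0]]

print(loudest_speaker(frame_1))  # participant 1's speech is dropped here
print(loudest_speaker(frame_2))  # selection switches over to participant 1
```

In the first frame, the speech of participant 1 is discarded entirely even though it is nearly as loud as that of participant 0, and the selection then switches in the second frame, which illustrates both the lost input and the switchover described above.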