Voice over Internet Protocol (VoIP) enables use of the Internet as a transmission medium for telephone calls instead of the traditional Public Switched Telephone Network (PSTN). VoIP sends voice data in packets using the Internet Protocol (IP). Voice data for each call participant is contained in a voice stream. VoIP is quickly gaining popularity due to the proliferation of broadband connections to homes and the availability of low-cost hardware and software. Despite its rising popularity, however, VoIP must still provide the functionality offered by the PSTN, such as multi-party voice conferencing, if it is to compete with the PSTN.
Multi-party voice conferencing is a conference between multiple participants in which voice data is transmitted to each participant. The participants often are located at different sites. For a PSTN voice conference, each participant's telephone is connected to a central bridge, which sums all of the voice signals and transmits the sum back to the participants. When this central bridge-based architecture is migrated to VoIP, however, various problems arise. A VoIP voice conference transmits the voice data of each participant over a wide-area network (such as the Internet), and each participant is connected to the network using a client. The Internet, however, introduces variable delays and packet losses into the transmission process. Another problem is that the central bridge-based architecture places a high demand on the central bridge. In particular, the central bridge must decode the clients' voice packets, sum them, and then compress and send the summed voice packets back to each client. Because each client requires its own voice to be subtracted from the sum, the compression usually has to be done separately for each individual client. As a result, the load on the bridge increases linearly with the number of clients connected to it.
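The per-client subtraction described above is often called "mix-minus." The following is a minimal sketch, assuming each client's voice packet has already been decoded into a frame of 16-bit PCM samples of equal length; decoding, compression, and the function name `mix_minus` are illustrative assumptions, not taken from the original. It shows why the bridge produces one distinct output frame per client: every client hears a different sum, so each output must be compressed separately.

```python
from typing import Dict, List

INT16_MIN, INT16_MAX = -32768, 32767


def clip16(x: int) -> int:
    """Saturate a mixed sample to the 16-bit PCM range."""
    return max(INT16_MIN, min(INT16_MAX, x))


def mix_minus(frames: Dict[str, List[int]]) -> Dict[str, List[int]]:
    """For each client, return the sum of every OTHER client's frame.

    `frames` maps a (hypothetical) client id to one decoded PCM frame.
    """
    if not frames:
        return {}
    length = len(next(iter(frames.values())))
    if any(len(f) != length for f in frames.values()):
        raise ValueError("all frames must have the same length")

    # Sum all participants once, without clipping, so that subtraction
    # of a client's own voice below is exact.
    total = [0] * length
    for samples in frames.values():
        for i, s in enumerate(samples):
            total[i] += s

    # One distinct output per client: this is the part whose cost grows
    # linearly with the number of clients, since each output would then
    # be compressed separately before being sent back.
    return {
        client: [clip16(total[i] - samples[i]) for i in range(length)]
        for client, samples in frames.items()
    }
```

For three clients, `mix_minus({"a": [100, 200], "b": [10, 20], "c": [1, 2]})` gives client `a` the frame `[11, 22]`, that is, everyone's voice except its own.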
In order to reduce the load on the bridge, silence suppression is commonly used. Silence suppression reduces the bridge load by limiting the number of packets sent to the bridge. One way to accomplish this goal is to send packets only when actual speech is detected, so that the bridge receives and mixes only packets that contain actual voice. In theory, therefore, the load on the bridge is reduced substantially. In practice, however, the net savings from silence suppression techniques depend heavily on external factors such as microphone quality, the microphone's position relative to the user's mouth, the gain of the sound card, and the level and type of background noise. Since many of these factors are beyond the bridge's control, the bridge is forced to reserve a significant amount of resources to deal with the worst-case scenario of being flooded by incoming packets. This tends to negate at least some of the cost savings achieved by silence suppression.
In traditional silence suppression techniques, the client decides whether to send its own voice packet out on the network. This decision is based on the result of a speech/silence test of the client's own audio signal. The speech/silence test examines the client's voice packet and determines the level of voice activity it contains. The result of this test is then compared to a fixed threshold. Based on this examination of its own audio signal, the client determines whether to send its packet.
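The client-side decision can be sketched as a simple energy-based speech/silence test. This is a minimal illustration under stated assumptions: the original does not say how voice activity is measured, so root-mean-square (RMS) frame energy is used here as a stand-in, and both the threshold value and the function names are hypothetical.

```python
import math
from typing import List, Sequence

# Hypothetical fixed threshold, in 16-bit PCM amplitude units.
SPEECH_THRESHOLD_RMS = 500.0


def frame_rms(samples: Sequence[int]) -> float:
    """Assumed voice-activity measure: RMS energy of one frame."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def should_send(samples: Sequence[int],
                threshold: float = SPEECH_THRESHOLD_RMS) -> bool:
    """Speech/silence test: send the packet only if activity meets the threshold."""
    return frame_rms(samples) >= threshold


def packets_to_send(frames: Sequence[Sequence[int]],
                    threshold: float = SPEECH_THRESHOLD_RMS) -> List[int]:
    """Indices of the frames this client would actually transmit."""
    return [i for i, f in enumerate(frames) if should_send(f, threshold)]
```

Because the threshold is fixed and the test looks only at the client's own signal, steady background noise whose energy exceeds the threshold (for example a frame alternating between 600 and -600, with an RMS of 600) is sent exactly as speech would be. This is one way the external factors listed above can defeat silence suppression and leave the bridge to plan for its worst-case load.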