Voice over Internet Protocol (VoIP) enables use of the Internet as a transmission medium for telephone calls instead of using the traditional Public Switched Telephone Network (PSTN). VoIP sends voice data in packets using the Internet Protocol (IP). Voice data for each call participant is contained in a voice stream. VoIP is quickly gaining popularity due to the proliferation of broadband connections to homes and the availability of low-cost hardware and software. Despite the rise in popularity, in order to compete with PSTN, VoIP must provide the functionality offered by PSTN, such as multi-party voice conferencing.
Multi-party voice conferencing is a conference between multiple participants in which voice data is transmitted to each participant. The participants are often located at different sites. For a PSTN voice conference, each participant's telephone is connected to a central bridge, which mixes and sums all of the voice signals and transmits the voice sum back to the participants. When migrating to VoIP, it is natural to try to emulate this topology in the digital domain. However, various problems arise with this client-bridge topology. In a VoIP voice conference using the client-bridge topology, the voice data of each participant is transmitted over a wide-area network (such as the Internet), and each participant is connected to the network using a client. The Internet, however, introduces variable delays and packet losses into the transmission process. Another problem is that the client-bridge topology places a high demand on the bridge. In particular, the bridge must decode the clients' voice packets, sum them, compress the result, and send the summed and compressed voice packets back to each client. Because each client requires its own voice to be subtracted from the sum, the compression usually has to be done separately for each individual client. As a result, the load on the bridge increases linearly with the number of clients connected to it. This type of topology puts the scaling burden on the central bridge, while the CPU processing power and bandwidth required of each client remain low.
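The per-client work at the bridge can be illustrated with a minimal sketch. It assumes each client's packet has already been decoded into a frame of 16-bit PCM samples of equal length; the function name mix_minus and the dictionary layout are hypothetical, and the codec's decode and compress steps are omitted.

```python
from typing import Dict, List

PCM_MIN, PCM_MAX = -32768, 32767  # assumed 16-bit PCM sample range


def mix_minus(frames: Dict[str, List[int]]) -> Dict[str, List[int]]:
    """For each client, return the sum of every other client's decoded frame."""
    if not frames:
        return {}
    frame_len = len(next(iter(frames.values())))

    # Sum all decoded voice signals once.
    total = [0] * frame_len
    for samples in frames.values():
        for i, sample in enumerate(samples):
            total[i] += sample

    # Subtract each client's own voice from the sum and clip to the PCM range.
    # Every client gets a different signal, so each one would need its own
    # compression pass before being sent back.
    return {
        client: [max(PCM_MIN, min(PCM_MAX, t - s)) for t, s in zip(total, samples)]
        for client, samples in frames.items()
    }
```

With three clients the function produces three distinct output frames, which is why the compression step cannot be shared and the bridge's load grows with the number of connected clients.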
A simple way to avoid the above-mentioned problems with the client-bridge topology is to connect each client directly to every other client. This is known as the full-mesh topology. However, one of the main problems with the full-mesh topology is that it does not scale well: the number of connections grows as O(N^2), where N is the number of participants. Thus, although the full-mesh topology works well for a small number of participants, it scales poorly to larger conferences.
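The quadratic growth can be made concrete by counting connections. The sketch below assumes one bidirectional connection per pair of clients in the full mesh and one connection per client in the client-bridge topology; the function names are hypothetical.

```python
def full_mesh_links(n: int) -> int:
    """Connections when every client is linked to every other client."""
    return n * (n - 1) // 2


def client_bridge_links(n: int) -> int:
    """Connections when every client is linked only to the central bridge."""
    return n
```

For example, full_mesh_links(4) is 6 and full_mesh_links(10) is 45, whereas client_bridge_links(10) is 10. In the full mesh each client must also send its own stream to, and receive a stream from, each of the other N - 1 clients.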
To scale well while avoiding the degradation of the client-bridge topology, another recently introduced approach is tandem-free operation (TFO). "Tandem" refers to the double encoding that is performed on the packets, once at the sending client and again at the bridge after the signals are summed. In TFO, a client sends a packet to the bridge and, instead of decoding the packets, adding them, and sending the result back to the listeners, the bridge merely forwards the packets. Thus, the bridge becomes a simple forwarding device.
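The forwarding behavior can be sketched as follows. This is an illustration only: the text says just that packets are forwarded, so relaying to every client except the sender, along with the function name and the (recipient, payload) return shape, is an assumption.

```python
from typing import List, Tuple


def forward(sender: str, payload: bytes, clients: List[str]) -> List[Tuple[str, bytes]]:
    """Relay an encoded packet, untouched, to every client except its sender."""
    # No decoding, summing, or re-encoding happens here: the payload bytes
    # that arrive are the payload bytes that leave.
    return [(client, payload) for client in clients if client != sender]
```

Because the payload is never decoded at the bridge, the voice data is encoded only once, at the sending client, and the mixing is left to the receiving clients.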
One problem with the TFO topology, however, is that the bridge is forced to reserve a significant amount of resources to deal with the worst-case scenario of being flooded by incoming packets. This occurs because many external factors are out of the bridge's control, such as microphone quality, the microphone's position relative to the user's mouth, the gain of the sound card, the level and type of background noise, or simply many people starting to talk at the same time. The need to hold so much bandwidth in reserve to absorb fluctuations in the number of incoming packets tends to negate at least some of the cost savings and other advantages of adopting the TFO topology.