A multi-party audio communication system enables a group of people to engage in a real-time audio communication session. In addition, the system allows multiple people to be speaking at the same time. Besides the audio components of the two-party audio communication system (such as audio capture, acoustic echo cancellation (AEC), automatic gain control (AGC), and audio/speech compression), the multi-party audio communication system poses unique challenges in audio mixing and network delivery.
By way of example, assume that n peer computers (or peers) are engaged in a multi-party audio communication session, possibly with multiple concurrent speakers. Further assume that each stream of audio requires a bandwidth of bw. The multi-party audio communication system may be formed with a variety of topologies and mixing strategies. One popular topology is a star topology, as shown in FIG. 1A. A powerful central server, S, receives audio streams from all peers (t1, t2, t3, t4, and t5), mixes the audio streams, and sends the mixed and re-encoded audio back to all peers.
The advantage of the star topology is that each peer uses the same hardware as that of a two-party communication system, and thus needs no modification. Only the server needs to be redesigned to support a multi-party communication session. Consequently, the star topology is a popular choice for commercial multi-party communication solutions. One such system is set forth in a paper by K. Singh, G. Nair, and H. Schulzrinne entitled “Centralized Conferencing using SIP” in Proceedings of the 2nd IP Telephony Workshop, April 2001. The main shortcoming of the star topology is that a heavy computation and bandwidth burden is placed on the server, S. The server, S, needs to receive n streams of compressed audio (with n·bw download bandwidth), decode, mix, and re-encode them, and send the mixed audio back to n peers (with n·bw upload bandwidth).
A second common topology is a fully connected unicast network, as shown in FIG. 1B. In a fully connected network, every peer is connected to every other peer in the network. An example of this type of topology is discussed in a paper by J. Lennox and H. Schulzrinne entitled “A protocol for reliable decentralized conferencing” in Proceedings of the 13th international workshop on network and operating systems support for digital audio and video, (NOSS-DAV'2003), pp. 72-81, 2003, Monterey, Calif. In this topology, the peers (t1, t2, t3, t4, and t5) do not perform any audio mixing or redelivery. Instead, each speaker simply sends the compressed audio to every other peer. In such a topology, each peer needs (n−1)·bw upload bandwidth to send the audio to the rest of the peers, and a maximum of (n−1)·bw download bandwidth to receive the incoming audio. One disadvantage of this topology is the large increase in network traffic, which places a large burden on each peer and on the entire network.
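The bandwidth figures given above for the star topology and the fully connected topology can be compared with a minimal sketch. The function names, the example values n = 5 and bw = 16, and the unit (kbps) are assumptions made for illustration only. The per-peer figure of bw in the star topology is likewise an assumption, following from the statement that each peer behaves as in a two-party session.

```python
def star_bandwidth(n, bw):
    """Return (upload, download) per node for the star topology of FIG. 1A.

    The server receives n streams and sends the mix back to n peers.
    Each peer is assumed to send one stream and receive one mixed stream.
    """
    return {
        "server": (n * bw, n * bw),
        "peer": (bw, bw),
    }


def full_mesh_bandwidth(n, bw):
    """Return (upload, max download) per peer for the fully connected
    unicast network of FIG. 1B.

    A speaking peer sends its stream to the other n - 1 peers, and receives
    at most n - 1 streams when every other peer is speaking.
    """
    return {"peer": ((n - 1) * bw, (n - 1) * bw)}


if __name__ == "__main__":
    n, bw = 5, 16  # hypothetical: 5 peers, 16 kbps per stream
    print("star:", star_bandwidth(n, bw))
    print("full mesh:", full_mesh_bandwidth(n, bw))
```

Under these assumptions, the star topology concentrates the load on the server, whereas the fully connected topology spreads a load that grows with n across every peer.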
A third possible topology is a generic graph that uses end system mixing. An example of this type of topology is shown in FIG. 1C and in a paper by M. Radenkovic, C. Greenhalgh, and S. Benford entitled “Deployment issues for multi-user audio support in CVEs” in Proceedings ACM Symposium on virtual reality software and technology, pp. 179-185, 2002, Hong Kong, China. As shown in FIG. 1C, in this example peers a, b, f and g are leaf nodes, and do not perform any mixing operations. The peers c, d and e serve as gateway nodes, each of which mixes and redelivers the audio for its nearby peers. In general, a gateway node with m neighbors requires m·bw upload bandwidth and m·bw download bandwidth to receive and redeliver the audio. Since m is usually much smaller than n, this topology scales well to a large conferencing session. Nevertheless, one disadvantage of this topology is that the burden on the gateway nodes can be heavy. Another disadvantage is that as the chain of gateways becomes long, the latency in audio delivery increases. Yet another disadvantage is that the audio may lose synchronization along the chain of delivery.
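The m·bw requirement of a gateway node can be illustrated with a minimal sketch. The edge list below is hypothetical: the text identifies a, b, f and g as leaf nodes and c, d and e as gateway nodes, but does not state which peers are connected, so the connections shown are assumed for illustration. Applying the same formula to a leaf node (m = 1) is also an assumption.

```python
# Hypothetical connections for a graph like FIG. 1C; the actual edges
# are not given in the text and are assumed here for illustration.
EDGES = [
    ("a", "c"), ("b", "c"),
    ("c", "d"), ("d", "e"),
    ("e", "f"), ("e", "g"),
]


def neighbor_counts(edges):
    """Count the number of neighbors m of every node in an undirected graph."""
    counts = {}
    for u, v in edges:
        counts[u] = counts.get(u, 0) + 1
        counts[v] = counts.get(v, 0) + 1
    return counts


def node_bandwidth(edges, bw):
    """Return m * bw for each node, used as both its upload and its
    download requirement, where m is the node's neighbor count."""
    return {node: m * bw for node, m in neighbor_counts(edges).items()}


if __name__ == "__main__":
    bw = 16  # hypothetical per-stream bandwidth in kbps
    for node, need in sorted(node_bandwidth(EDGES, bw).items()):
        print(node, need)
```

With these assumed connections, no node needs more than 3·bw in either direction, even though seven peers take part in the session.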
A network-level solution to further reduce the traffic in an audio communication session is IP multicast. In IP multicast, a single packet that is transmitted from a source is duplicated at routers along a distribution tree rooted at the source. In this manner, content is delivered to an arbitrary number of receivers. For example, in the star topology shown in FIG. 1A, a peer may still send the compressed audio to the server via unicast. However, the server, S, can multicast the mixed and re-encoded audio back to the n peers. A sample implementation of such a system can be found in a paper entitled “ConferenceXP: wireless classrooms, collaboration and distance learning”. With multicast, the upload bandwidth of the server, S, is reduced from n·bw to bw.
One disadvantage of IP multicast, however, is that the requirement on the download bandwidth of the server remains unchanged at n·bw. In the fully connected network shown in FIG. 1B, each speaker may also multicast the compressed audio to every other peer in the network. Again, the disadvantage of IP multicast for the fully connected network is that while the upload bandwidth of the peer is reduced to bw, the download bandwidth of the peer remains unchanged at (n−1)·bw. Another disadvantage of IP multicast is that its deployment in the real world has been slow because of issues such as inter-domain routing protocols, ISP business models (charging models), congestion control along the distribution tree, and security, among other things. As a result, except for certain limited university or corporate subnets and network test beds (such as Internet2), native IP multicast support is not widely available. Because of these problems in deploying a network-level multicast service, the vast majority of traffic in the Internet today is unicast based, whereby two computers talk directly to each other.
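The effect of IP multicast on the bandwidth figures discussed above can be summarized in a minimal sketch. The function names and the example values n = 5 and bw = 16 (kbps) are assumptions made for illustration only.

```python
def star_server_bandwidth(n, bw, multicast=False):
    """Return (upload, download) for the server S of FIG. 1A.

    With multicast, the server sends the mixed audio once (bw) instead of
    n times; it still receives n unicast streams from the peers.
    """
    upload = bw if multicast else n * bw
    download = n * bw
    return upload, download


def mesh_peer_bandwidth(n, bw, multicast=False):
    """Return (upload, max download) for a peer of FIG. 1B.

    With multicast, a speaker sends its stream once (bw) instead of
    n - 1 times; it may still receive up to n - 1 incoming streams.
    """
    upload = bw if multicast else (n - 1) * bw
    download = (n - 1) * bw
    return upload, download


if __name__ == "__main__":
    n, bw = 5, 16  # hypothetical: 5 peers, 16 kbps per stream
    for use_multicast in (False, True):
        print("multicast" if use_multicast else "unicast",
              star_server_bandwidth(n, bw, use_multicast),
              mesh_peer_bandwidth(n, bw, use_multicast))
```

In both topologies only the upload figure changes; the download requirement is the same with or without multicast.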
One type of system and method for one-to-many content distribution for file transfer over a P2P network is described in co-pending patent application U.S. Ser. No. 10/887,406 entitled “Efficient One-to-Many Content Distribution in a Peer-to-Peer Computer Network” by J. Li, P. Chou, and C. Zhang, filed on Jul. 7, 2004. However, that work involved one-to-many file transfer and distribution, whereas an audio communication session involves many-to-many distribution. In addition, that work made extensive use of a TCP/IP queue. Using a queue is impractical for audio conferencing, because the packets must arrive in a timely manner. Moreover, audio from different sources may be mixed, which distinguishes audio delivery in audio communication applications from file delivery.
One disadvantage of existing multi-party audio communication systems is that the mixing and redelivery role played by each peer or server is fixed by the network topology. Another disadvantage of existing audio communication systems is that they perform mixing on entire audio streams. Therefore, what is needed is an audio communication system and method that make the most efficient use of network resources. Moreover, what is needed is a system and method that avoid the disadvantages of the above-described network topologies and are flexible in the mixing and redelivery roles played by the peers. Further, what is needed is an audio communication system and method that perform mixing on frames of audio streams rather than on entire audio streams. Moreover, what is needed is an audio communication system and method that avoid the use of a queue and overcome the delay problems of file transfer techniques.