FIG. 1 illustrates a full mesh peer-to-peer (P2P) videoconference that is achieved by setting up independent audio/video real-time RTP streams between each participant 102, 104, 106, 108 of the conference such that each participant 102, 104, 106, 108 transmits one audio/video (and possibly content) stream to each other participant 102, 104, 106, 108 and receives the same from each other participant 102, 104, 106, 108 as shown. A signaling server 110 coordinates the streams. The main advantage of a full mesh conference by way of comparison to the more traditional centralized bridge conference method is the lower latency of media and the elimination of bottlenecks in the form of centralized media servers. The main disadvantage of the full mesh approach is that more bandwidth is required to set up video streams to send video to, and receive video from, every other participant in the conference.
Let us assume that each participant in a full mesh P2P videoconference is sending video at ‘K’ kbps. Then for a conference with ‘N’ participants the amount of uplink bandwidth and downlink bandwidth required at each participant will be K*(N−1). For 512 kbps video and six participants, the bandwidth required will be upwards of 2.5 Mbps in each direction for each participant.
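By way of illustration only, the per-participant figure above can be checked with a short calculation. The following is a minimal sketch; the function name `full_mesh_bandwidth_kbps` and its parameter names are hypothetical and chosen purely for this example.

```python
def full_mesh_bandwidth_kbps(k_kbps: int, n_participants: int) -> int:
    """Uplink (and, equally, downlink) video bandwidth per participant.

    In a full mesh each participant sends one stream to, and receives one
    stream from, each of the other N-1 participants, giving K*(N-1).
    """
    if n_participants < 1:
        raise ValueError("a conference needs at least one participant")
    return k_kbps * (n_participants - 1)


# Example from the text: 512 kbps video, six participants.
per_direction = full_mesh_bandwidth_kbps(512, 6)
print(per_direction)  # 2560 kbps, i.e. a little over 2.5 Mbps in each direction
```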
FIG. 2 illustrates one way to mitigate the bandwidth problems of the full mesh, which is to limit the number of participants transmitting video (so it is no longer a “full” mesh) to a relatively small subset. For example, one possible subset could be that only video of the active speaker is visible to all, as shown in FIG. 2. In FIG. 2, Participant 1 202 is the active speaker and has streams to each of the other participants 204, 206, 208, with the signaling server 210 controlling the streams. This technique can be extended to include multiple videos from, say, the two most recent active speakers. Such a policy will limit the number of participants transmitting video and hence the amount of downlink bandwidth required. With video at ‘K’ kbps, ‘N’ total participants and ‘A’ active participants transmitting video, the amount of uplink bandwidth used for video at non-active participants will be zero. The downlink bandwidth will be K*A at each non-active participant and K*(A−1) at each active participant. But the uplink bandwidth at each active participant will still be K*(N−1). In cases where the number of active participants exceeds a certain threshold, a secondary selection algorithm can be employed. The obvious choice there is to limit the “active” set by picking the A loudest participants. This is similar to how audio bridges select only two or three audio streams to mix for inactive participants while all other streams are muted.
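The bandwidth relationships and the secondary selection step described above can be sketched as follows. This is an illustrative sketch only: the names `Bandwidth`, `subset_mesh_bandwidth` and `pick_loudest` are hypothetical, and the representation of loudness as one numeric audio level per participant is an assumption, since the manner of measuring loudness is not specified here.

```python
from typing import Dict, List, NamedTuple


class Bandwidth(NamedTuple):
    uplink_kbps: int
    downlink_kbps: int


def subset_mesh_bandwidth(k_kbps: int, n: int, a: int, is_active: bool) -> Bandwidth:
    """Video bandwidth at one participant when only A of N participants send video."""
    if not 1 <= a <= n:
        raise ValueError("require 1 <= A <= N")
    if is_active:
        # Sends to the N-1 others; receives from the other A-1 active senders.
        return Bandwidth(k_kbps * (n - 1), k_kbps * (a - 1))
    # Sends no video; receives from all A active senders.
    return Bandwidth(0, k_kbps * a)


def pick_loudest(levels: Dict[str, float], a: int) -> List[str]:
    """Secondary selection: keep the A participants with the highest audio level.

    `levels` maps a participant identifier to an assumed loudness value.
    """
    return sorted(levels, key=levels.get, reverse=True)[:a]


# Six participants at 512 kbps with two active senders.
print(subset_mesh_bandwidth(512, 6, 2, is_active=True))   # uplink 2560, downlink 512
print(subset_mesh_bandwidth(512, 6, 2, is_active=False))  # uplink 0, downlink 1024
print(pick_loudest({"p1": 0.9, "p2": 0.2, "p3": 0.7, "p4": 0.1}, 2))  # ['p1', 'p3']
```

In this example the downlink at a non-active participant falls from 2560 kbps in the full mesh to 1024 kbps, while the uplink at each active participant remains 2560 kbps.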
So, as demonstrated above, the mesh approach limits the size of a given conference to a small number of participants, depending on the uplink bandwidth available to peers.