Data communications are widespread and well known. One example of data communication includes real-time communication of digital audio data in discrete packets over data networks. By way of particular examples, real-time phone calls and audio/video conferences may communicate voice and other audio data between numerous participants in real time. Recent advances in remote collaboration have introduced new and ever higher levels of demand on audio data communication. Sessions of 20 or more participants are not uncommon, and conferences with hundreds of participants or more occur. Numbers of simultaneous participants are expected to continue to grow as long as technology can keep up.
In sessions of this size, bandwidth and processor demands to communicate all of the data between all of the participants can become problematic. Taking a conference with 20 attendees by way of example, each participant may receive multiple video and audio data streams from each other participant. If each participant in a video conference uses two cameras and two microphones, for example, then each participant will receive 76 individual real-time data streams (38 video and 38 audio) from the 19 other participants. Providing sufficient capacity, reliability, and control resources for all of this data can be a time-consuming and costly effort. As a result, there is a desire to consolidate communications as far as is practical and efficient to conserve bandwidth and processor resources.
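The stream count in this example follows from simple arithmetic, which can be sketched as below. This is an illustration only: the function name `streams_received` is hypothetical, and the sketch assumes that every camera and every microphone produces exactly one stream and that no consolidation takes place.

```python
def streams_received(participants: int, devices_per_participant: int) -> int:
    """Number of streams one participant receives when every other
    participant sends one stream per capture device (an assumption
    made for illustration, not a property of any particular system)."""
    if participants < 1:
        raise ValueError("participants must be at least 1")
    return (participants - 1) * devices_per_participant


if __name__ == "__main__":
    # 20 attendees, each with two cameras and two microphones.
    video = streams_received(20, 2)  # camera streams only
    audio = streams_received(20, 2)  # microphone streams only
    total = streams_received(20, 4)  # all four capture devices
    print(video, audio, total)
```

Under these assumptions the count grows in direct proportion to both the number of other participants and the number of capture devices each one uses.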
One known bandwidth consolidation practice is to combine audio streams into a single “master stream.” For example, all of the microphones from all of the attendees may be communicated to a central network bridge, where they are mixed into a single stream for communication out to all of the participants. While this can reduce required bandwidth, it comes at a cost of limiting or denying the ability to control individual audio data streams. If a particular bundled stream includes data from 20 participants, for example, it may not be convenient or possible to adjust the volume of only participant number 18. Often, only the single master stream can be adjusted. This can cause substantial difficulties, in that microphone placement, settings, ambient noise, and other factors can be very different from participant to participant. One participant may be sending an audio data stream that is low in volume and hard to discern, while another sends one that is quite loud. Also, with reference to video and audio conferences in particular, there is a desire to make the remote experience as close to in-person interaction as is possible. Individual volume controls might be useful to add a “spatial element” to known conferences and phone calls.
Proposals have been made to accomplish some degree of individual volume control. For example, it has been proposed for every conference participant to receive an individual audio stream from every other participant. Each participant could then mix the streams as they are received, adding gain or attenuation to each stream as set by the user. The bandwidth requirements associated with practice of this proposal, however, are quite high. The bandwidth for each participant scales linearly (with a high constant) as the number of streams increases. And if a centralized architecture is used for the conference, the bandwidth for the central hub scales roughly with the square of the number of participants, because the hub must forward the streams of each of N participants to each of the other N - 1 participants.
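The receiver-side mixing described in this proposal can be sketched as follows. This is a minimal illustration under stated assumptions, not an implementation of any particular system: the name `mix_with_gains` is hypothetical, each stream is assumed to arrive as a frame of floating-point samples in the range -1.0 to 1.0, all frames are assumed to be the same length and already time-aligned, and the hard clipping at the end is a simplification chosen for brevity.

```python
from typing import Dict, List


def mix_with_gains(frames: Dict[str, List[float]],
                   gains: Dict[str, float]) -> List[float]:
    """Mix one frame per sender into a single output frame.

    Each sender's samples are multiplied by that sender's gain
    (1.0 if the user has not set one), summed sample by sample,
    and hard-clipped to the range [-1.0, 1.0].
    """
    if not frames:
        return []
    length = len(next(iter(frames.values())))
    mixed = [0.0] * length
    for sender, frame in frames.items():
        if len(frame) != length:
            raise ValueError(f"frame length mismatch for sender {sender!r}")
        gain = gains.get(sender, 1.0)
        for i, sample in enumerate(frame):
            mixed[i] += gain * sample
    # Hard clipping keeps the sum inside the valid sample range.
    return [max(-1.0, min(1.0, s)) for s in mixed]


if __name__ == "__main__":
    frames = {"alice": [0.5, -0.5], "bob": [0.25, 0.25]}
    # The listener attenuates alice by half and leaves bob unchanged.
    print(mix_with_gains(frames, {"alice": 0.5}))
```

The sketch makes the trade-off visible: per-sender gain is trivial to apply because every stream arrives separately, but that is exactly why the listener must receive one stream per sender.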
As discussed above, to alleviate the bandwidth requirements, a mixer (also referred to as an “audio bridge”, “MCU”, or “MP”) has been proposed that receives every audio stream at a central network location so that the streams can be combined and transmitted to each participant as a single stream. However, adjusting the volume of each individual participant then becomes difficult, as a receiver of the mixed stream (e.g., a computer) cannot easily recognize who contributed what to the stream. Methods for breaking apart or decomposing audio streams are available, but they are extremely processor-intensive, they lack desirable accuracy, and in general they are not designed for audio conference signals.