Existing audio conferencing systems often encounter various problems, including bandwidth limitations, mixer capacity (i.e., inability to handle a large number of endpoints), and voice quality issues. For example, streams from geographically dispersed participants must be brought to a centralized mixer location before mixing, and the same mixed content may need to be sent back to each individual leg. This consumes significant bandwidth as the number of participants increases. It may also cause numerous voice quality problems related to long-distance transmission (e.g., level imbalances, long echo, etc.).
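The linear growth in bandwidth at a centralized mixer can be illustrated with a minimal back-of-the-envelope sketch. This is not from the text above; it assumes a hypothetical 64 kbit/s per-stream codec rate and one uplink plus one downlink stream per participant leg.

```python
BITRATE_KBPS = 64  # assumed per-stream codec rate (hypothetical, G.711-like)

def mixer_bandwidth_kbps(num_participants: int) -> int:
    """Total bandwidth at the centralized mixer: each participant leg
    sends one stream to the mixer and receives one mixed stream back."""
    return num_participants * BITRATE_KBPS * 2

# Bandwidth at the mixer scales linearly with the participant count.
for n in (10, 100, 1000):
    print(f"{n} participants -> {mixer_bandwidth_kbps(n)} kbit/s")
```

Under these assumptions, 1000 participants already require 128 Mbit/s at the mixer site, all of which must traverse potentially long-distance links.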
Moreover, not all participants may be talking at any particular point in time during the audio conference. Existing systems nevertheless mix unwanted streams: the mixed speech may get clipped, unwanted background noise may be mixed in, and so on. Mixing these unwanted streams also wastes processing power on the mixer, thereby limiting the maximum number of participants and/or endpoints in an audio conference.
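One common way to avoid mixing silent legs is to gate each stream on a simple energy measure before summation. The sketch below is illustrative only and not the approach of any particular system; the energy threshold and the hard-clip limit are assumed values.

```python
def frame_energy(frame):
    """Mean squared sample value of a PCM frame (sequence of ints)."""
    return sum(s * s for s in frame) / len(frame)

def select_active(frames_by_leg, threshold=1000.0):
    """Keep only legs whose current frame exceeds the energy threshold,
    so silent or noise-only streams are never passed to the mixer.
    The threshold value is an assumption for illustration."""
    return {leg: f for leg, f in frames_by_leg.items()
            if frame_energy(f) > threshold}

def mix(frames, limit=32767):
    """Sum the selected frames sample-by-sample, hard-clipping to the
    16-bit PCM range to model the clipping risk mentioned above."""
    if not frames:
        return []
    length = min(len(f) for f in frames)
    mixed = [sum(f[i] for f in frames) for i in range(length)]
    return [max(-limit, min(limit, s)) for s in mixed]

# A loud talker passes the gate; a near-silent leg is dropped before mixing.
frames = {"talker": [2000, -2000, 2000, -2000], "listener": [10, -10, 10, -10]}
active = select_active(frames)
out = mix(list(active.values()))
```

Gating before the sum both preserves mixer capacity (fewer streams to process) and keeps listener-side background noise out of the conference.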
Further, with regard to voice quality, background noise from listeners and active talkers may be mixed into the audio conference, which causes fatigue and prevents users from focusing on the conversation. Acoustic echo reflections from hybrid elements and endpoints may also make it difficult to follow the conversation. With devices such as mobile phones, desk phones, and computers serving as endpoints, speech levels from the various talkers may be unbalanced. The endpoints may or may not handle all of these scenarios, and those without any enhancement are bound to inject impairments.
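The level-imbalance problem is typically addressed by normalizing each talker toward a common loudness before mixing. A minimal sketch of RMS-based gain correction follows; the target level of 3000 (on a 16-bit PCM scale) is an assumed value, not one taken from the text.

```python
import math

def rms(frame):
    """Root-mean-square level of a PCM frame (sequence of ints)."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))

def normalize(frame, target_rms=3000.0):
    """Scale a frame toward a common target level so quiet and loud
    talkers arrive at comparable loudness. Target is a hypothetical
    value chosen for illustration; silent frames pass through unchanged."""
    level = rms(frame)
    if level == 0:
        return list(frame)
    gain = target_rms / level
    return [int(s * gain) for s in frame]

# A loud desk-phone talker and a quiet mobile talker end up at the same level.
loud = normalize([6000, 6000, 6000, 6000])
quiet = normalize([300, 300, 300, 300])
```

A real conferencing system would smooth the gain over time to avoid pumping artifacts; this per-frame version only shows the principle.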