This invention relates generally to audio processors and more particularly to audio processors for use in audio or audio/videoconferencing systems.
As is known in the art, one of the key components in an audio or audio/videoconferencing system is the audio processor. The audio processor is responsible for receiving audio from various sites connected to the conference system and for distributing the audio to the various sites.
There are two classic types of audio processors: an audio switch and an audio mixer. With the audio switch, time compressed audio data from one site is sent to none, one, or more of the other sites. These audio switches are used, for example, with sites that use a "push-to-talk" method and send time compressed audio when there is speech, or some other audible signal, which is to be communicated. In some audio switches, the "push-to-talk" at the site is automatic, thereby removing the requirement that the user actually "push" a button. In any event, the audio switch does not actually decode the time compressed audio signal; rather, it simply decides which audio source each site will receive and then routes the time compressed audio to the appropriate site or sites. The switch operation can be based on one, or a combination, of the following: a control protocol which allows the users to request to speak; a control protocol which allows the users to request to hear a particular site; and a decision mechanism which forwards the audio received from one site to the other sites. Usually, the audio switch is configured so that no site receives the audio that the site itself is transmitting.
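The routing behavior of an audio switch described above can be illustrated with a minimal sketch. It assumes that a single site currently holds the floor (for example, as granted by a request-to-speak control protocol) and that the compressed frame is treated as opaque bytes which are never decoded. The function name `route_compressed_frame` and its parameters are hypothetical and are used only for illustration.

```python
from typing import Dict, List, Optional


def route_compressed_frame(
    sender: str,
    frame: bytes,
    sites: List[str],
    active_speaker: Optional[str],
) -> Dict[str, bytes]:
    """Route one compressed frame without decoding it (hypothetical sketch).

    The frame is forwarded only if it comes from the site that currently
    holds the floor, and it is never sent back to the site that produced it.
    """
    if sender != active_speaker:
        # Frames from sites that do not hold the floor are sent to no one.
        return {}
    # The same opaque bytes go to every other site; no transcoding occurs.
    return {site: frame for site in sites if site != sender}
```

For example, with sites "A", "B", and "C" and site "A" holding the floor, a frame from "A" is forwarded unchanged to "B" and "C", while a frame arriving from "B" is dropped.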
The audio switch is very efficient to implement because it is not required to decode the time compressed audio signals. Thus, a single conferencing server, or bridge, can support a large number of sites. Further, the audio switch does not degrade the audio signal due to transcoding (that is, decompression followed by compression of the decompressed signals) or other signal processing losses because it simply routes the time compressed audio as the audio switch receives it. Still further, the audio switch has a relatively low time delay because transcoding is not required.
On the other hand, because the audio switch does not decode its received, time compressed audio, but merely passes it to a site, or sites, decisions about how to route the time compressed audio are limited. In particular, because the input to the switch is time compressed audio, the switch cannot use acoustic energy detection techniques in making routing decisions. Further, if there are speaking participants at more than one site, the audio switch must select only one of the sites as the source of the audio to be passed through the switch. To put it another way, the audio at the various sites cannot be mixed because the switch simply routes the time compressed audio from the sites.
An audio mixer operates with non-time compressed, that is, uncompressed, audio. For each site in the conferencing system, the audio mixer combines the audio from selected other sites and re-encodes (that is, time compresses) the combined audio so that it can output time compressed, mixed audio to a receiving site. With a large number of sites, a selector is used to select the audio from only a few of the sites, discarding the audio from unselected sites, to thereby reduce noise. Because uncompressed audio is available at the selector, the selection can be made based on the relative amount of acoustic energy in the audio received from the sites by the selector. Other signal processing techniques can also be used to make the decision. With an audio mixer, when participants at more than one site are speaking, they can all be heard by participants at the other sites. On the other hand, the audio mixer must decompress, mix, and then re-compress the received audio. This three-step process degrades the quality of the original audio, an effect which can become particularly objectionable if multiple conferencing servers are cascaded together to serve a single meeting. Further, the signal processing adds delay to the audio propagation. Still further, once the audio signals are mixed they cannot be unmixed, thereby limiting the topologies available for distributed conferencing servers.
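The energy-based selection and per-site mixing performed by an audio mixer can be sketched as follows. This is only an illustration under stated assumptions: each site's audio has already been decompressed into a frame of floating-point samples in the range -1.0 to 1.0, all frames have the same length, and mean squared amplitude stands in for "acoustic energy". The function names `frame_energy`, `select_loudest`, and `mix_for_site` are hypothetical, and the decompression and re-compression steps are omitted.

```python
from typing import Dict, List


def frame_energy(samples: List[float]) -> float:
    """Mean squared amplitude, used here as a simple measure of acoustic energy."""
    return sum(s * s for s in samples) / len(samples) if samples else 0.0


def select_loudest(frames: Dict[str, List[float]], max_sources: int) -> List[str]:
    """Keep only the few sites with the most energy; the rest are discarded."""
    ranked = sorted(frames, key=lambda site: frame_energy(frames[site]), reverse=True)
    return ranked[:max_sources]


def mix_for_site(
    receiver: str,
    frames: Dict[str, List[float]],
    selected: List[str],
) -> List[float]:
    """Sum the selected frames, excluding the receiver's own audio.

    Assumes all frames have equal length. The sum is clipped to [-1.0, 1.0].
    """
    length = len(frames[receiver])
    sources = [frames[site] for site in selected if site != receiver]
    mixed = [sum(src[i] for src in sources) for i in range(length)]
    return [max(-1.0, min(1.0, value)) for value in mixed]
```

In a full mixer, the output of `mix_for_site` would then be re-compressed separately for each receiving site, which is the step that introduces the transcoding loss and delay noted above.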